Player/Player interactions are always trouble in multiplayer games.
The way to get them “perfect” is to use the input-synchronous, delayed-command pattern. Every command has to go to the server first, and is only executed once it comes back from the server. That way, every client executes the same commands in the same order.
Ideally, you also do this with a deterministic, constant-step-size simulation engine.
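As a rough illustration of the delayed-command pattern, here is a minimal sketch. Everything in it is made up for the example: the `Server`, `Client`, and `Command` names, the `delay_steps` parameter, and the one-dimensional integer movement. It assumes the server stamps each command with a future step number, and that every command reaches every client before that step is simulated (a real game would have to stall the simulation if one didn't).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    player: int
    dx: int  # new velocity along one axis, in integer units per step


class Server:
    """Hypothetical relay: stamps each command with a future step, then echoes it."""

    def __init__(self, delay_steps):
        self.delay_steps = delay_steps
        self.log = []  # entries are (execute_step, sequence, command)

    def submit(self, current_step, command):
        # The sequence number gives all clients one agreed ordering within a step.
        self.log.append((current_step + self.delay_steps, len(self.log), command))


class Client:
    """Fixed-step simulation that only acts on commands echoed by the server."""

    def __init__(self, players):
        self.step = 0
        self.positions = {p: 0 for p in players}
        self.velocity = {p: 0 for p in players}
        self.pending = defaultdict(list)

    def receive(self, execute_step, sequence, command):
        self.pending[execute_step].append((sequence, command))

    def advance(self):
        # Apply this step's commands in server order, not in arrival order.
        due = sorted(self.pending.pop(self.step, []), key=lambda entry: entry[0])
        for _, cmd in due:
            self.velocity[cmd.player] = cmd.dx
        # Integer math keeps the step bit-identical on every machine.
        for p in self.positions:
            self.positions[p] += self.velocity[p]
        self.step += 1


def run_demo():
    server = Server(delay_steps=3)
    server.submit(0, Command(player=1, dx=2))
    server.submit(1, Command(player=2, dx=-1))
    server.submit(1, Command(player=1, dx=0))

    a, b = Client([1, 2]), Client([1, 2])
    for entry in server.log:
        a.receive(*entry)
    for entry in reversed(server.log):  # different arrival order on client b
        b.receive(*entry)

    for _ in range(10):
        a.advance()
        b.advance()
    return a.positions, b.positions
```

In `run_demo`, the two clients receive the commands in different orders but end up with identical positions, because nothing is executed until its stamped step and the ordering within a step comes from the server.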
The drawback of that model is that you have to live with a server round-trip of delay on every input. This is totally acceptable for RTS games, but not so good for twitch games.
In your game, you have to make up your mind about who is the “authority” for a “push” operation. Either the pushed or the pusher will be the “authority,” and their position will “win.” The other side will very likely see a correction, because contact doesn’t happen at the same time in the delayed frame as it does in the original frame. Note that two colliding characters may have different opinions about who is the “pusher!”
If you want it to look similar for everybody, one option is to allow pawn/pawn overlap (so physics doesn’t push), detect the overlap instead, and send a command that separates the two pawns with a pre-determined position animation/vector. Because this command is executed with the same amount of separation over the same amount of time on both clients, it will look more similar than a physics-based collision solution. You’ll typically want to manage this as bidirectional (“player A and B are separating”), with no difference in ordering, rather than trying to implement “A pushes B so A stands still,” because B will see something different on their side.
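A minimal sketch of what such a symmetric separation command could look like follows. The names (`SeparateCommand`, `make_separate_command`, `apply_separation_step`) and the constants `PAWN_RADIUS` and `SEPARATION_STEPS` are invented for illustration, and pawns are assumed to be equal-sized circles in 2D. The pair is always stored lowest id first, so “A and B” and “B and A” produce the same command.

```python
import math
from dataclasses import dataclass

PAWN_RADIUS = 0.5      # assumed: all pawns are circles of this radius
SEPARATION_STEPS = 10  # assumed: fixed number of simulation steps per separation


@dataclass(frozen=True)
class SeparateCommand:
    pawn_lo: int     # lower pawn id
    pawn_hi: int     # higher pawn id
    axis: tuple      # unit vector pointing from pawn_lo towards pawn_hi
    distance: float  # total distance EACH pawn moves along the axis
    steps: int       # number of steps the movement is spread over


def make_separate_command(id_a, pos_a, id_b, pos_b):
    """Return a SeparateCommand if the two pawns overlap, else None."""
    # Canonical ordering: the result does not depend on who "pushed" whom.
    if id_a > id_b:
        id_a, pos_a, id_b, pos_b = id_b, pos_b, id_a, pos_a
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    dist = math.hypot(dx, dy)
    overlap = 2 * PAWN_RADIUS - dist
    if overlap <= 0:
        return None
    # Fall back to a fixed axis if the pawns sit exactly on top of each other.
    axis = (dx / dist, dy / dist) if dist > 0 else (1.0, 0.0)
    return SeparateCommand(id_a, id_b, axis, overlap / 2, SEPARATION_STEPS)


def apply_separation_step(cmd, positions):
    """Move both pawns apart by one step's share of the total separation."""
    share = cmd.distance / cmd.steps
    lx, ly = positions[cmd.pawn_lo]
    hx, hy = positions[cmd.pawn_hi]
    positions[cmd.pawn_lo] = (lx - cmd.axis[0] * share, ly - cmd.axis[1] * share)
    positions[cmd.pawn_hi] = (hx + cmd.axis[0] * share, hy + cmd.axis[1] * share)
```

The idea would be for whoever detects the overlap to build the command once and send it to all clients, which then call `apply_separation_step` once per simulation step for `cmd.steps` steps. Each pawn moves half of the overlap, so neither side is treated as the pusher.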
Of course, you will still run into cases where player 1 runs ahead of player 2 on their own client, and sees themselves pushing player 2, whereas player 2 runs ahead of player 1 on their client, and thinks that player 1 misses. If player 2 then starts being pushed, they might be surprised at that outcome. This is not a “physics” problem; it is a “speed of light” problem. People will have separate world views because of latency, and separate (split) world views cannot possibly be reconciled.