Character Movement Component's Movement Mode is not always replicating (fails 1 in 10 times) in a dedicated server setting

This is in a project using Lyra as a template: a 3rd person shooter.
I haven’t touched anything related to the Character Movement Component.
I noticed that I could sometimes see a remote client’s character jump and never land.
I put a bunch of logs in the animation blueprint and found that sometimes the Movement Mode gets set to Falling but never back to Walking.
The client locally sees themselves jump and land normally, but other clients might see them jump and not land.
If there are 4 players in a game, 2 might see them jump and land normally while the other 2 might see them jump and never land. Which ‘2’ see them stay afloat varies from jump to jump.
Lyra doesn’t alter anything related to the Movement Mode and neither do I.
The only difference is that I changed the gravity I use. I don’t change it dynamically, I just changed it globally.
I made a level with different types of walkable surfaces and found that it doesn’t seem to happen on landscape, only on walkable actors.
Has anyone else experienced this?
Does anyone know of a good mitigation for this issue?

EDIT: Also wanted to mention that this issue doesn’t happen playing locally, even across computers in the same network. It seems to only happen with a public cloud dedicated server.

Enable Notify Apex in CMC and print string off the Event NotifyJumpApex.
Also do a print string via Event On Landed.

These two will help you debug further.

My suspicion is that the issue lies in the animgraph SM transition rules. Something’s not getting set.

I added all the debugging info and tracked it down to the MovementMode not changing. The transition in the SM simply fires when MovementMode != Falling. It used Property Access to check in a thread-safe manner, and I tried changing it to check a cached movement mode that I set in the animation blueprint event, to no avail. I also put some code in the animation blueprint event to signal when MovementMode != Falling at any point, and when the issue happens, that code never fires.

Can you get a vid of it happening? Would also be beneficial if the capsule component wasn’t hidden.

Unfortunately, I can’t really post our game at this point.
I put a check on the Character blueprint (which for Lyra is B_Hero_ShooterMannequin) to log any time MovementMode changes. Indeed, when this issue happens, the remote character never lands - MovementMode never changes from ‘Falling’ - so it’s not something in the animation blueprint. MovementMode is not replicating sometimes (about 1 in 10 times).
I guess for now I’ll replicate it by brute force. The next step would be to test with vanilla Lyra, but setting up a fully working dedicated server version of Lyra is a lot of work, so I might not get to it. If no one has heard of this, it could be something we changed, but it’s hard to tell why that would be, since Movement Mode is deep in the Engine and we haven’t touched the Engine at all. It seems like the Engine is not reliably replicating it. A lost packet is all it would take for any one-time event to fail on remote clients.

Another update here. I brute forced the server to reliably replicate MovementMode on clients and the problem went away.
I’m surprised I don’t see more complaints about this. I’m in Cali and the dedicated server is in Chicago. Perhaps that’s a bit unusual for the average use case? In any case, variables that have discrete states, like the MovementMode, should be replicated reliably, unlike continuously changing values such as an actor’s location.
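
To illustrate the lost-packet reasoning, here is a minimal sketch in plain C++ (not Unreal code). MoveMode, RemoteClient, sendOnce and sendUntilAcked are hypothetical names made up for this example, and it does not model how the Engine actually replicates MovementMode. It only contrasts an update that is sent once with one that is resent until it gets through.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for a movement mode; NOT Unreal's EMovementMode.
enum class MoveMode { Walking, Falling };

const char* toString(MoveMode m) {
    return m == MoveMode::Walking ? "Walking" : "Falling";
}

// Hypothetical remote client: it only knows what arrived over the network.
struct RemoteClient {
    MoveMode mode = MoveMode::Walking;
    void receive(MoveMode m) { mode = m; }
};

// Send-once: a dropped packet means the update is never seen.
// Returns true if the update was delivered.
bool sendOnce(RemoteClient& client, MoveMode m, bool dropped) {
    if (dropped) return false;
    client.receive(m);
    return true;
}

// Resend until one attempt gets through. dropPattern[i] is true when
// attempt i is lost. Returns the number of attempts used, or 0 if all were lost.
std::size_t sendUntilAcked(RemoteClient& client, MoveMode m,
                           const std::vector<bool>& dropPattern) {
    for (std::size_t i = 0; i < dropPattern.size(); ++i) {
        if (!dropPattern[i]) {
            client.receive(m);
            return i + 1;
        }
    }
    return 0;
}

// Demo: the jump arrives for both clients, but the first landing packet is lost.
void runLandingDemo() {
    RemoteClient onceClient, resendClient;
    sendOnce(onceClient, MoveMode::Falling, false);
    sendUntilAcked(resendClient, MoveMode::Falling, {false});

    sendOnce(onceClient, MoveMode::Walking, true);  // landing packet dropped
    std::size_t attempts =
        sendUntilAcked(resendClient, MoveMode::Walking, {true, false});

    std::cout << "send-once client: " << toString(onceClient.mode) << "\n";
    std::cout << "resend client: " << toString(resendClient.mode)
              << " after " << attempts << " attempts\n";
}
```

Calling runLandingDemo() from a main function leaves the send-once client stuck in Falling, while the resend client ends up in Walking on its second attempt. That matches what I saw: the remote character jumps and never lands unless the change is pushed again.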

Looking into this further. I turned on all engine logs and noticed this message on the client when the issue happens:

B_Hero_ShooterMannequin_C_2147467115’ is not resolved on client, skipping SimulateMovement

Does anyone have a clue as to why a client would intermittently get that message?

This isn’t a common issue. I’ve never seen it happen with CMC.

Net Cull Distance not being large enough? The default is roughly 150 m.
Bandwidth and Network Saturation?

After some more digging, it looks like even though the dedicated server does always receive the movements from the owning client, the server doesn’t always pass on the changes to the other clients. When the owning client is standing on Landscape terrain, the server always broadcasts the changes, but when the owning client is standing on a mesh actor, it doesn’t always do it.

Does it fail if you walk on a simple cube actor or geometry brush?