Why is this tick movement framerate dependent?

Context: This class takes in a TArray of structs that contain ISM component info and moves ISM instances along a flow field. This tick function will eventually have to handle 100 ISM components and 1000s of instances. I thought that maybe the calculations within the tick were just too heavy, but even with 1 instance I can see the lower-FPS client falling behind.

UnitMasterCPP.cpp
UnitMasterCPP.h

Any help would be appreciated.

Here are my grid class and my structs/enums as well, for testing.
Grid.h
MyCustomEnums.h
MyCustomStructs.h
Grid.cpp
MyCustomEnums.cpp
MyCustomStructs.cpp

Summoning the framerate independent wizard @Chatouille =D

@LotionMyHotBod

In UnitMasterCPP.cpp:

InstanceData.ISMComponent->UpdateInstanceTransform(InstanceData.Index, InstanceTransform, true, true, true);
InstanceData.ISMComponent->MarkRenderStateDirty();

is redundant: the second bool passed in (true) is bMarkRenderStateDirty, and when it is true, UpdateInstanceTransform already calls MarkRenderStateDirty() in a later if statement.

So the explicit call is at least some wasted CPU work. I'm looking further into optimizations.

Perhaps BatchUpdateInstancesTransforms would be a better fit for these operations?

bool UInstancedStaticMeshComponent::BatchUpdateInstancesTransforms(int32 StartInstanceIndex, const TArray<FTransform>& NewInstancesTransforms, const TArray<FTransform>& NewInstancesPrevTransforms, bool bWorldSpace, bool bMarkRenderStateDirty, bool bTeleport)

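Perhaps cache the transforms in a temporary TArray inside the loop, skip the per-instance update inside the loop (it's costly), and then, once the loop is done, pass the collected transforms to BatchUpdateInstancesTransforms. Note that it writes the array starting at StartInstanceIndex, so the collected transforms need to belong to consecutive instance indices.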
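
A minimal standalone sketch of the "collect, then batch" idea, assuming not every instance moves every tick. Because the batch call writes its array starting at StartInstanceIndex, this sorts the pending updates by index and splits them into contiguous runs. `Transform`, `PendingUpdate`, `InstanceRun` and `BuildContiguousRuns` are hypothetical names, and plain structs stand in for FTransform/TArray so the logic compiles and can be tested outside the engine.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for FTransform; only the location matters here.
struct Transform {
    double X = 0.0, Y = 0.0, Z = 0.0;
};

// One instance whose transform was recomputed this tick (hypothetical).
struct PendingUpdate {
    int InstanceIndex;
    Transform NewTransform;
};

// A block of consecutive instance indices that can go into one batch call.
struct InstanceRun {
    int StartIndex;
    std::vector<Transform> Transforms;
};

// Sorts the pending updates by instance index and splits them into runs of
// consecutive indices; each run maps to one batched transform update.
std::vector<InstanceRun> BuildContiguousRuns(std::vector<PendingUpdate> Updates) {
    std::sort(Updates.begin(), Updates.end(),
              [](const PendingUpdate& A, const PendingUpdate& B) {
                  return A.InstanceIndex < B.InstanceIndex;
              });

    std::vector<InstanceRun> Runs;
    for (const PendingUpdate& U : Updates) {
        assert(U.InstanceIndex >= 0);
        const bool bExtendsLastRun =
            !Runs.empty() &&
            Runs.back().StartIndex + static_cast<int>(Runs.back().Transforms.size()) == U.InstanceIndex;
        if (bExtendsLastRun) {
            Runs.back().Transforms.push_back(U.NewTransform);
        } else {
            Runs.push_back(InstanceRun{U.InstanceIndex, {U.NewTransform}});
        }
    }
    return Runs;
}
```

In the engine, each run would then become one BatchUpdateInstancesTransforms call with the run's StartIndex as StartInstanceIndex. If every instance of a component moves each tick, a single call covering the whole component is simpler.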

Hi,

You could subgroup it so that you have a map with ISMCs as keys and transforms (and any other data) as values, and then enumerate through each ISMC's instances at once, only dirtying at the end.
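
A rough standalone sketch of that subgrouping, assuming a hypothetical `FakeISMComponent` that only counts batches (in Unreal the map would be something like a TMap keyed by the component pointer). Updates are grouped by owning component first, so each component is flushed, and therefore dirtied, once per tick instead of once per instance. `MovedInstance` and `FlushGroupedUpdates` are also hypothetical.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an ISM component: counts how often its render
// state was dirtied and how many instance transforms it received.
struct FakeISMComponent {
    int DirtyCount = 0;
    int InstancesUpdated = 0;

    void ApplyBatch(const std::vector<int>& InstanceIndices) {
        InstancesUpdated += static_cast<int>(InstanceIndices.size());
        ++DirtyCount;  // one dirty per batch, not per instance
    }
};

// One instance that moved this tick, tagged with the component that owns it.
struct MovedInstance {
    FakeISMComponent* Component;
    int InstanceIndex;
};

// Groups moved instances by owning component (the map key), then flushes
// each component exactly once.
void FlushGroupedUpdates(const std::vector<MovedInstance>& Moved) {
    std::unordered_map<FakeISMComponent*, std::vector<int>> ByComponent;
    for (const MovedInstance& M : Moved) {
        assert(M.Component != nullptr);
        ByComponent[M.Component].push_back(M.InstanceIndex);
    }
    for (auto& Entry : ByComponent) {
        Entry.first->ApplyBatch(Entry.second);
    }
}
```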

Also, rather than calling GetInstanceTransform(), if you've already got the ISMC there, you can get a reference to its per-instance data (stored as matrices) and grab the location straight from the matrix, rather than converting to an FTransform and then calling GetTranslation().

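// Read each instance's location straight from its per-instance matrix
// (component/local space) instead of GetInstanceTransform() + GetTranslation().
const TArray<FInstancedStaticMeshInstanceData>& instData = ismc->PerInstanceSMData;
const int32 numInst = instData.Num();
FVector iloc;
for (int32 i = 0; i < numInst; i++) {
	const FMatrix& t = instData[i].Transform;
	// Unreal matrices store the translation in the fourth row.
	iloc.X = t.M[3][0];
	iloc.Y = t.M[3][1];
	iloc.Z = t.M[3][2];
	// ... use iloc ...
}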

Thanks for taking the time, guys, @3dRaven @RecourseDesign. I'll give these things a try! So it sounds like you're both saying that there are no specific functions I'm invoking that are innately framerate dependent? It's likely just really heavy code?

Hi, I have only looked at AUnitMasterCPP::Tick(), and the thing that caught my eye is that you are not handling the 'leftover' movement in the tick in which a waypoint is reached. That may explain your problem: on the faster machine, with smaller ticks, the waypoint is reached and the following tick is free to start moving on to the next one, whereas on the slower machine one tick didn't quite get the unit there, so it has to complete the move in the next tick (and the remaining tick time is thrown away).

So the faster machine has more granular ticks, and therefore has less of this wasted remaining time.
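
To illustrate, here is a small standalone sketch (hypothetical names, plain C++ instead of engine types) that moves a unit along waypoints spaced 1 unit apart at 15 units/s for one second. With `bCarryOver` set, distance left over after reaching a waypoint is spent on the following segment(s) within the same tick; without it, the rest of the tick is discarded, which is the behaviour described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 {
    double X = 0.0, Y = 0.0;
};

// Hypothetical unit state: position plus the waypoint it is heading to.
struct PathFollower {
    Vec2 Position;
    std::vector<Vec2> Waypoints;
    std::size_t NextWaypoint = 0;
};

// Moves the follower Speed * DeltaTime along its waypoints.
void Advance(PathFollower& F, double Speed, double DeltaTime, bool bCarryOver) {
    assert(DeltaTime >= 0.0);
    double Remaining = Speed * DeltaTime;
    while (Remaining > 0.0 && F.NextWaypoint < F.Waypoints.size()) {
        const Vec2 Target = F.Waypoints[F.NextWaypoint];
        const double DX = Target.X - F.Position.X;
        const double DY = Target.Y - F.Position.Y;
        const double Dist = std::sqrt(DX * DX + DY * DY);

        if (Dist <= Remaining) {
            // Waypoint reached this tick (zero-length segments land here too,
            // so there is never a division by zero).
            F.Position = Target;
            Remaining -= Dist;
            ++F.NextWaypoint;
            if (!bCarryOver) {
                break;  // naive version: the rest of this tick's movement is lost
            }
        } else {
            // Not enough budget to reach the waypoint: move part of the way.
            const double T = Remaining / Dist;
            F.Position.X += DX * T;
            F.Position.Y += DY * T;
            Remaining = 0.0;
        }
    }
}

// Simulates one second at the given tick rate and returns the final X.
double SimulateOneSecond(int TicksPerSecond, bool bCarryOver) {
    PathFollower F;
    for (int i = 1; i <= 100; ++i) {
        F.Waypoints.push_back(Vec2{static_cast<double>(i), 0.0});
    }
    const double DeltaTime = 1.0 / TicksPerSecond;
    for (int Tick = 0; Tick < TicksPerSecond; ++Tick) {
        Advance(F, 15.0, DeltaTime, bCarryOver);
    }
    return F.Position.X;
}
```

With carry-over, one second at 10 ticks/s and at 100 ticks/s both end at X ≈ 15. Without it, the 10 ticks/s run only reaches X = 10 and the 100 ticks/s run about X = 14.3, so the slower machine falls further behind. The loop also handles several waypoints in one tick and zero-length segments, two cases worth checking in any hand-written leftover logic.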

The UpdateInstanceTransform function is slow when you call it per instance like that. Check out BatchUpdateInstancesTransforms, where you pass in an array of transforms; it only recreates physics, navigation, etc. once.

@3dRaven @RecourseDesign @silnarm

Hey guys! I took all of your advice by adjusting for the 'leftover' time and batch updating instances. This has fixed the framerate dependency, so thank you so much!

However, this has introduced an issue that I cannot resolve: periodically, instances will freeze and stop moving. This happens more frequently on the higher-FPS client. I'm pretty sure it is because of my 'leftover' logic. With it in place, the framerate issue is fixed, but any adjustment I make to it introduces this freezing inconsistency. I've been messing with this for hours since we last spoke! Any suggestions?

I went through and added comments to make it easier to follow.
UnitMasterCPP.cpp
UnitMasterCPP.h

Check whether recording an Unreal Insights session helps you pin down the spikes causing the freezes.

I may be misunderstanding you: the issue isn't my FPS spiking or freezing. It's more that units randomly stop moving. I can't figure out a way to test it because it only occurs during stress testing with 1000s of instances, and I don't know how to tie a message to the random ones that stop.
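
One way to tie a message to the units that stop is a stall check after movement each tick: if a unit still has path left but its position hasn't changed for a few consecutive ticks, report it once (for example with UE_LOG, including its index, current waypoint and that tick's DeltaTime). A minimal standalone sketch; `UnitSnapshot`, `StallTracker` and `FindNewlyStalledUnits` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Position of one unit after this tick's movement, plus whether it still has
// waypoints left to reach (hypothetical snapshot type).
struct UnitSnapshot {
    double X;
    double Y;
    bool bHasPathLeft;
};

// Per-unit bookkeeping for the stall check (hypothetical).
struct StallTracker {
    double LastX = 0.0;
    double LastY = 0.0;
    int TicksWithoutMovement = 0;
    bool bReported = false;
};

// Call once per tick after movement. Returns indices of units that still have
// path left but have not moved for StallTicks consecutive ticks. Each unit is
// reported once, and its counter resets as soon as it moves again.
std::vector<std::size_t> FindNewlyStalledUnits(std::vector<StallTracker>& Trackers,
                                               const std::vector<UnitSnapshot>& Units,
                                               int StallTicks, double Epsilon = 1e-4) {
    assert(Trackers.size() == Units.size() && StallTicks > 0);
    std::vector<std::size_t> NewlyStalled;
    for (std::size_t i = 0; i < Units.size(); ++i) {
        StallTracker& T = Trackers[i];
        const UnitSnapshot& U = Units[i];
        const bool bMoved = std::fabs(U.X - T.LastX) > Epsilon || std::fabs(U.Y - T.LastY) > Epsilon;
        T.LastX = U.X;
        T.LastY = U.Y;

        if (bMoved || !U.bHasPathLeft) {
            // Moving, or legitimately finished: not a stall.
            T.TicksWithoutMovement = 0;
            T.bReported = false;
            continue;
        }
        if (++T.TicksWithoutMovement >= StallTicks && !T.bReported) {
            T.bReported = true;
            NewlyStalled.push_back(i);
        }
    }
    return NewlyStalled;
}
```

Once a stalled unit is flagged, dumping its waypoint index and distance to that waypoint usually narrows down which branch of the movement code it got stuck in.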

Perhaps the pure calculations could be moved to an FRunnable, so that the main thread only looks up the resulting transforms in Tick?
It would add an extra level of complexity, but perhaps it would reduce the stops.

How do your main CPU cores look during these stops? Are they stressed at 100%?
Offloading some calculations to an extra thread might prevent this if that is the case.
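
A sketch of the offloading idea, with std::async standing in for FRunnable so it runs outside the engine (inside Unreal, FRunnable or ParallelFor would be the native options). The per-unit computation is a pure function, each job writes only its own slice of the output array, and the game thread waits for all jobs before applying the results. `UnitState`, `ComputeNextX` and `ComputePositionsInParallel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-unit data needed by the movement calculation.
struct UnitState {
    double X;
    double VelX;
};

// Pure calculation: reads only its arguments, so it is safe off the game thread.
double ComputeNextX(const UnitState& U, double DeltaTime) {
    return U.X + U.VelX * DeltaTime;
}

// Splits the units into NumChunks ranges, computes each range on a worker via
// std::async, and waits for all of them. The caller (the game thread) then
// applies the returned positions, e.g. in one batched transform update.
std::vector<double> ComputePositionsInParallel(const std::vector<UnitState>& Units,
                                               double DeltaTime, std::size_t NumChunks) {
    assert(NumChunks > 0);
    std::vector<double> Out(Units.size());
    const std::size_t ChunkSize = (Units.size() + NumChunks - 1) / NumChunks;

    std::vector<std::future<void>> Jobs;
    for (std::size_t Begin = 0; Begin < Units.size(); Begin += ChunkSize) {
        const std::size_t End = std::min(Begin + ChunkSize, Units.size());
        // Each job writes a disjoint slice of Out, so no locking is needed.
        Jobs.push_back(std::async(std::launch::async, [&Units, &Out, Begin, End, DeltaTime] {
            for (std::size_t i = Begin; i < End; ++i) {
                Out[i] = ComputeNextX(Units[i], DeltaTime);
            }
        }));
    }
    for (std::future<void>& Job : Jobs) {
        Job.get();  // blocks until that worker has finished
    }
    return Out;
}
```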

If it's a result of culling, then calling SetCullDistances on the instance component with a higher end distance might fix it?

Sorry, let me clarify some things; I think I misled you. My "stress testing" is just having 1000s of instances moving at once, but the framerate never drops below 100 on my faster client. The stopping units occur more frequently on this client. The client running at 10 FPS actually sees fewer units stopping.

Are you suggesting with your responses that the logic looks good and I should explore memory/CPU issues?

Hmm, when you say client you mean multiplayer, correct? So the actor with the instance component is replicated in this case.
It might be a matter of network saturation if you are sending out the whole data of every instance: your network runs out of bandwidth to send that much data in such short intervals.

In RTSs and unit-heavy games you don't replicate the armies' movement unit by unit.
You just send out the specific commands to units or groups of units, and the clients and server apply them in what is called lockstep.

Each player sends their commands => they are gathered => the lockstep turn ends => all commands are executed at the same time.

Each unit advances deterministically on each client's machine.
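
A minimal sketch of that flow, assuming commands are tagged with the turn they execute on and positions use integers so every machine computes exactly the same result. Commands for a turn are applied in a fixed order (player, then sequence number) regardless of arrival order, and each commanded unit then moves a bounded amount per turn. `Command`, `Unit` and `LockstepSim` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// A player command scheduled for a specific lockstep turn (hypothetical).
struct Command {
    int Turn;
    int PlayerId;
    int Sequence;  // per-player ordering within a turn
    int UnitId;
    int TargetX;   // integer position, so every machine computes identical results
};

struct Unit {
    int X = 0;
};

class LockstepSim {
public:
    explicit LockstepSim(int NumUnits) : Units(NumUnits) {}

    // Commands may arrive in any order, as long as they arrive before their turn runs.
    void Receive(const Command& C) {
        assert(C.Turn >= CurrentTurn);
        assert(C.UnitId >= 0 && C.UnitId < static_cast<int>(Units.size()));
        Pending[C.Turn].push_back(C);
    }

    // Runs one turn: apply this turn's commands in a fixed order, then move
    // every commanded unit at most MaxStep toward its target.
    void StepTurn() {
        std::vector<Command>& Cmds = Pending[CurrentTurn];
        std::sort(Cmds.begin(), Cmds.end(), [](const Command& A, const Command& B) {
            return std::tie(A.PlayerId, A.Sequence) < std::tie(B.PlayerId, B.Sequence);
        });
        for (const Command& C : Cmds) {
            Targets[C.UnitId] = C.TargetX;
        }
        Pending.erase(CurrentTurn);

        const int MaxStep = 2;
        for (const auto& Entry : Targets) {
            Unit& U = Units[Entry.first];
            const int Delta = Entry.second - U.X;
            U.X += std::max(-MaxStep, std::min(MaxStep, Delta));
        }
        ++CurrentTurn;
    }

    std::vector<Unit> Units;

private:
    int CurrentTurn = 0;
    std::map<int, std::vector<Command>> Pending;  // turn -> commands
    std::map<int, int> Targets;                   // unit id -> target X
};
```

Two simulations that receive the same commands in different orders end up in the same state, which is the property lockstep relies on.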

@3dRaven Yes, it is multiplayer. No, I'm actually doing all of the work client side. The way it works is this:

  • Client presses the hotkey to build a unit
  • Server validates the request
  • Server creates a replicated actor with one replicated struct property
  • Server serializes and packages the necessary "Directions" into the struct
  • After replication, the client creates the ISM components and instances based on the directions
  • The actor is put into an unused pool
  • The client also handles all pathfinding and movement
    • I do have future plans to send periodic position updates to ensure clients stay completely in sync. (This is the next big problem to sort through. It doesn't have to be perfect; I will likely space out updates and break large updates up to send over time.)
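
The plan in the last bullet could be sketched as a round-robin schedule: every send interval, the server picks the next fixed-size slice of unit indices and sends corrections only for those, wrapping around, so each unit is corrected once per cycle and no single update grows with the unit count. `RoundRobinCorrections` is a hypothetical helper; it assumes server and client agree on unit indices, and the actual sending (RPC or replicated property) is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out unit indices in fixed-size slices, wrapping around, so every unit
// gets a position correction once per cycle (hypothetical helper).
class RoundRobinCorrections {
public:
    RoundRobinCorrections(std::size_t InNumUnits, std::size_t InMaxPerUpdate)
        : NumUnits(InNumUnits), MaxPerUpdate(InMaxPerUpdate) {
        assert(MaxPerUpdate > 0);
    }

    // Returns the unit indices to include in the next correction message.
    std::vector<std::size_t> NextSlice() {
        std::vector<std::size_t> Slice;
        const std::size_t Count = std::min(MaxPerUpdate, NumUnits);
        for (std::size_t i = 0; i < Count; ++i) {
            Slice.push_back(Cursor);
            Cursor = (Cursor + 1) % NumUnits;
        }
        return Slice;
    }

private:
    std::size_t NumUnits;
    std::size_t MaxPerUpdate;
    std::size_t Cursor = 0;
};
```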

Direction data packages are incredibly lightweight. As of right now, there is very little bandwidth being used.

Utilizing Unreal's built-in actor replication process to reliably send instance creation directions has worked really well. I tried to build a system to "replicate" instance creation in the same fashion UE replicates actors, but it was lagging everything to high hell.

This has been working very well, which is why I need the movement logic to be absolutely framerate independent.
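
A common way to make movement independent of frame rate, sketched here as an assumption rather than as the existing code, is a fixed simulation step: accumulate each frame's DeltaTime and run the movement update in constant-size steps, keeping the remainder for the next frame. Clients at different frame rates then run the same number of identical steps for the same elapsed time (floating-point results can still differ between machines or builds, so periodic corrections remain useful). `FixedStepRunner` is a hypothetical name.

```cpp
#include <cassert>

// Runs a simulation callback in constant-size steps, however long each
// rendered frame took (hypothetical helper).
class FixedStepRunner {
public:
    explicit FixedStepRunner(double InStep) : Step(InStep) { assert(Step > 0.0); }

    // Call once per frame with that frame's DeltaTime. Runs as many whole
    // steps as fit, keeps the leftover time for the next frame, and returns
    // the number of steps executed.
    template <typename StepFn>
    int Advance(double FrameDeltaTime, StepFn&& SimulateStep) {
        Accumulator += FrameDeltaTime;
        int Steps = 0;
        while (Accumulator >= Step) {
            SimulateStep(Step);
            Accumulator -= Step;
            ++Steps;
        }
        return Steps;
    }

    // Fraction of a step carried over; can be used to interpolate visuals.
    double Alpha() const { return Accumulator / Step; }

private:
    double Step;
    double Accumulator = 0.0;
};
```

For rendering, Alpha() can be used to interpolate instance transforms between the last two simulated states. A production version would usually also cap the number of steps per frame so a long hitch cannot trigger an ever-growing catch-up loop.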

I researched deterministic lockstep obsessively to see if that's the direction I wanted to take, but I don't think it's necessary for my project. My project is incredibly predictable already, and the only thing I need to ensure is completely in sync is unit positions. If that happens, the rest of the logic can be client side, other than damage/unit death (all the non-cheating stuff).

Is the ISM component set to replicate? Perhaps it's periodically pushing the server's ISM data to the client, causing the stalls?

No, I have all created ISM components set to not replicate, and I've verified that when the instance management blueprint replicates, it does not override the client-created component or its data. I'm pretty confident that the reason these instances are sticking is my ticking movement logic; I just can't figure out where, or find a good way to debug it. Here's a video of the issue; maybe it will add more context. At the beginning you can see both the low-FPS client (left) and the high-FPS client (right) reach the goal at the same time, but towards the end of the video you can see stopped units. In this example, not a single unit in the low-FPS client stopped.

The screen recording makes it seem like there is movement jitter on the high-FPS client as well, but there isn't on my end.

Try running Stat UnitGraph on the client to see where the bottleneck is.
Is it

  • Frame time
  • Game logic
  • RHIT (render hardware interface thread)

Something is definitely hammering the framerate there; the frame time seems very high. Are both the client and the server using the same logic to update the ISM component?

Are you sure the client doesn't use an older version of the update logic that transforms each instance in the loop (the old way)?

'Draw' is what's pushing the frame time up. Both client and server are using the exact same logic to create components and instances and to move them.

Maybe I can help more if you can help me understand your line of thinking. Why do you believe this is specifically a performance issue rather than a logic flaw? Then I can do some exploration myself without wasting your time.

So is the left client a lower-end computer? Perhaps it just doesn't have the "horsepower" to update the meshes in time? A high Draw time would indicate that the render thread is struggling to keep up with the demand (GPU time is reported separately).

Does it have Lumen, VSM and distance fields turned off?

So I have the game running in Standalone with 2 clients. The left window is a listen server and client; the right is the other client. I'm not sure why it works this way, but whichever window I focus gets all of my computer's processing power and the other's FPS tanks. This works well for testing bad-performance and good-performance syncing, though!

I intentionally have the graphics settings jacked up to push performance down as much as possible: I have Lumen on with the best anti-aliasing. When I turn off all of the settings I don't need, my FPS flies up to 400 on the focused window and 150 on the other, which makes it harder for me to stress test.

I should note my computer is decently powerful; I have a 3070 in it.