Why is this tick movement framerate dependent?

LotionMyHotBod · July 14, 2024, 6:43pm

Context: This class takes in a TArray of structs that contains ISM component info and moves an ISM along a flow field. This tick function will eventually have to handle 100 ISM components & 1000s of instances. I thought that maybe the calculations within the tick are just too heavy, but even with 1 instance I can see the lower FPS client falling behind.

UnitMasterCPP.cpp (3.9 KB)
UnitMasterCPP.h (1.1 KB)

Any help would be appreciated.

LotionMyHotBod · July 14, 2024, 6:44pm

Here is my grid class and struct/enums as well for testing.
Grid.h (5.7 KB)
MyCustomEnums.h (1.0 KB)
MyCustomStructs.h (5.9 KB)
Grid.cpp (24.0 KB)
MyCustomEnums.cpp (112 Bytes)
MyCustomStructs.cpp (114 Bytes)

Summoning the framerate independent wizard @Chatouille =D

3dRaven · July 14, 2024, 7:04pm

@LotionMyHotBod

in UnitMasterCPP.cpp

InstanceData.ISMComponent->UpdateInstanceTransform(InstanceData.Index, InstanceTransform, true, true, true);
InstanceData.ISMComponent->MarkRenderStateDirty();

is redundant, second passed in bool (true) is bMarkRenderStateDirty which in a later if statement calls MarkRenderStateDirty();

So at least part of a wasted cpu cycle. Looking further into optimizations.

Perhaps BatchUpdateInstancesTransforms would be a better fit for these operations?

bool UInstancedStaticMeshComponent::BatchUpdateInstancesTransforms(int32 StartInstanceIndex, const TArray<FTransform>& NewInstancesTransforms, const TArray<FTransform>& NewInstancesPrevTransforms, bool bWorldSpace, bool bMarkRenderStateDirty, bool bTeleport)

Perhaps cache the transform positions into a temp TArray in the loop, skip the update inside the loop (costly), and then once the loop is done pass the collected transforms on to BatchUpdateInstancesTransforms.
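
A minimal sketch of that collect-then-submit pattern, written in plain C++ so it runs outside the engine. MockInstancedComponent, Transform and MoveAllInstances are hypothetical stand-ins, not engine types; in the real project the single call at the end would be BatchUpdateInstancesTransforms with a TArray<FTransform>. The mock only counts how often the expensive rebuild step runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Transform { double X, Y, Z; };

// Hypothetical stand-in for an ISM component. It only counts how often the
// expensive "mark dirty / rebuild" step would run.
struct MockInstancedComponent
{
    std::vector<Transform> Instances;
    int RebuildCount = 0;

    // Per-instance path: one rebuild for every call.
    void UpdateOne(std::size_t Index, const Transform& T)
    {
        assert(Index < Instances.size());
        Instances[Index] = T;
        ++RebuildCount;
    }

    // Batched path: writes a contiguous range, then rebuilds once.
    bool BatchUpdate(std::size_t StartIndex, const std::vector<Transform>& NewTransforms)
    {
        if (StartIndex + NewTransforms.size() > Instances.size())
        {
            return false;
        }
        for (std::size_t i = 0; i < NewTransforms.size(); ++i)
        {
            Instances[StartIndex + i] = NewTransforms[i];
        }
        ++RebuildCount;
        return true;
    }
};

// Tick-style update: compute every new transform first, submit them once.
void MoveAllInstances(MockInstancedComponent& Comp, double Dx)
{
    std::vector<Transform> Pending;
    Pending.reserve(Comp.Instances.size());
    for (const Transform& T : Comp.Instances)
    {
        Pending.push_back({T.X + Dx, T.Y, T.Z});
    }
    const bool bOk = Comp.BatchUpdate(0, Pending);
    assert(bOk);
    (void)bOk;
}
```

With 1000 instances, MoveAllInstances triggers one rebuild, while calling UpdateOne in a loop triggers 1000.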

RecourseDesign · July 14, 2024, 7:11pm

Hi,

You could subgroup it so you have a map with ISMCs as keys and transforms (and any other data) as values, and then enumerate through each ISMC's instances at once, only dirtying at the end.

Also, rather than getting the instance transform, if you’ve already got the ISMC there you can get a reference to the transform data (stored as matrices) and just grab the location value straight from there, rather than converting to a transform and then calling GetTranslation.

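TArray<FInstancedStaticMeshInstanceData>& instData=ismc->PerInstanceSMData;
const int32 numInst=instData.Num();
FVector iloc;
for(int32 i=0;i<numInst;i++) {
	// The translation lives in the fourth row of the instance matrix.
	const FMatrix& t=instData[i].Transform;
	iloc.X=t.M[3][0];
	iloc.Y=t.M[3][1];
	iloc.Z=t.M[3][2];
	// ... use iloc here ...
}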

LotionMyHotBod · July 14, 2024, 7:14pm

Thanks for taking the time guys @3dRaven @RecourseDesign. I’ll give these things a try! So it sounds like you’re both saying that there are no specific functions I’m invoking that are innately framerate dependent? It’s likely just really heavy code?

silnarm · July 14, 2024, 7:27pm

Hi, I have only looked at AUnitMasterCPP::Tick(), and the thing that caught my eye is that you are not handling the ‘left over’ movement in the tick in which a waypoint is reached. That may explain your problem: on the faster machine, with smaller ticks, the waypoint is reached with very little tick time to spare, so the following tick is free to start moving on to the next one. On the slower machine, one tick doesn’t quite get us there, so we need to complete the move in the next tick (and throw away the remaining tick time).

So the faster machine has more granular ticks, and therefore has less of this wasted remaining time.
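
A minimal sketch of carrying the left-over movement across waypoints, in plain C++ rather than against the attached UnitMasterCPP code. Vec2, AdvanceAlongPath and its parameters are hypothetical names for illustration; the idea is to treat Speed * DeltaTime as a distance budget and keep spending it until it runs out, instead of stopping at the waypoint.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double X; double Y; };

// Advances Pos along Path by Speed * DeltaTime. Distance left over after a
// waypoint is reached is spent on the next segment instead of being dropped.
// NextIndex is the waypoint currently being walked toward.
void AdvanceAlongPath(Vec2& Pos, std::size_t& NextIndex,
                      const std::vector<Vec2>& Path,
                      double Speed, double DeltaTime)
{
    assert(Speed >= 0.0 && DeltaTime >= 0.0);
    double Remaining = Speed * DeltaTime; // distance budget for this tick

    while (Remaining > 0.0 && NextIndex < Path.size())
    {
        const Vec2& Target = Path[NextIndex];
        const double Dx = Target.X - Pos.X;
        const double Dy = Target.Y - Pos.Y;
        const double Dist = std::sqrt(Dx * Dx + Dy * Dy);

        if (Dist <= Remaining)
        {
            // Waypoint reached: snap to it and keep the left-over budget.
            // This branch also handles zero-length segments safely.
            Pos = Target;
            Remaining -= Dist;
            ++NextIndex;
        }
        else
        {
            // Budget runs out before the waypoint: move partway and stop.
            Pos.X += Dx / Dist * Remaining;
            Pos.Y += Dy / Dist * Remaining;
            Remaining = 0.0;
        }
    }
}
```

With this shape, one 2.0 s tick and twenty 0.1 s ticks end at the same point on the path (up to floating-point error). NextIndex == Path.size() after the call means the unit has arrived at the end of its path.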

RecourseDesign · July 14, 2024, 7:30pm

The UpdateInstanceTransform function is slow when you call it per instance like that. Check out BatchUpdateInstancesTransforms, where you pass in an array of transforms: it only recreates physics, navigation etc. once.

LotionMyHotBod · July 15, 2024, 3:12pm

@3dRaven @RecourseDesign @silnarm

Hey guys! So I took all of your advice by adjusting for the ‘left over’ time & batch updating instances. This has fixed the frame rate dependency! So thank you so much.

However, this has introduced an issue that I cannot resolve. Now, periodically instances will freeze/stop moving. This is occurring more frequently on the faster FPS client. I’m pretty sure it is because of my ‘left over’ logic. When I have it in it fixes my framerate issue, but any adjustments I make to it introduce this freezing inconsistency. I’ve been messing with this for hours since we last spoke! Any suggestions?

I went through and commented to make it easier.
UnitMasterCPP.cpp (6.3 KB)
UnitMasterCPP.h (1.1 KB)

3dRaven · July 15, 2024, 3:20pm

Check if recording an insights session might help you pin down the spikes causing the freezes.

LotionMyHotBod · July 15, 2024, 3:23pm

I may be misunderstanding you - the issue isn’t with my FPS spiking/freezing. It’s more that units randomly stop moving. I can’t figure out a way to test it because it’s only occurring during stress testing with 1000s of instances. I don’t know how to tie a message to the random ones that stop.
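
One possible way to single out the stopped instances, sketched in plain C++ under assumptions: keep the last position per instance and count consecutive ticks without movement while the instance is still supposed to be travelling. StallTracker and its members are hypothetical names, not part of the attached files; in the engine, the returned indices could then be passed to UE_LOG.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: flags instances that should be moving but have not
// changed position for a number of consecutive ticks.
struct StallTracker
{
    std::vector<double> LastX;
    std::vector<double> LastY;
    std::vector<int> StillTicks;

    explicit StallTracker(std::size_t Count)
        : LastX(Count, 0.0), LastY(Count, 0.0), StillTicks(Count, 0) {}

    // Call once per tick with the current positions. ShouldMove[i] is true
    // while instance i has not reached its goal. Returns the indices that
    // have been still for at least Threshold consecutive ticks.
    std::vector<std::size_t> Update(const std::vector<double>& X,
                                    const std::vector<double>& Y,
                                    const std::vector<bool>& ShouldMove,
                                    int Threshold, double Epsilon = 1e-6)
    {
        assert(X.size() == LastX.size());
        assert(Y.size() == LastY.size());
        assert(ShouldMove.size() == LastX.size());

        std::vector<std::size_t> Stalled;
        for (std::size_t i = 0; i < X.size(); ++i)
        {
            const bool bMoved = std::fabs(X[i] - LastX[i]) > Epsilon
                             || std::fabs(Y[i] - LastY[i]) > Epsilon;
            // Reset the counter on movement or when no movement is expected.
            StillTicks[i] = (bMoved || !ShouldMove[i]) ? 0 : StillTicks[i] + 1;
            if (StillTicks[i] >= Threshold)
            {
                Stalled.push_back(i);
            }
            LastX[i] = X[i];
            LastY[i] = Y[i];
        }
        return Stalled;
    }
};
```

Logging the path index and remaining distance for only the flagged instances keeps the output readable even with 1000s of instances on screen.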

3dRaven · July 15, 2024, 3:30pm

Perhaps the pure calculations could be moved to an FRunnable and the main thread tick would only look up the transform outcomes in tick?
It would add in an extra level of complexity but perhaps it would reduce the stops.

How do your main CPU cores look during these stops? Are they stressed at 100%?
Offloading some calculations to an extra thread might prevent this if that is the case.

If it is a result of culling, then calling SetCullDistances with a higher end value might fix it (on the instance component).

LotionMyHotBod · July 15, 2024, 3:38pm

Sorry, let me clarify some things. I think I misled you. My “stress testing” is just having 1000s of instances moving at once, but the framerate never drops below 100 on my faster client. The stopping units occur more frequently on this client. The client running at 10fps actually sees fewer units stopping.

Are you suggesting with your responses that the logic looks good and I should explore memory/CPU issues?

3dRaven · July 15, 2024, 3:43pm

Hmm, when you mention client you mean multiplayer, correct? So the instance component actor is replicated in this case.
It might be a matter of network saturation if you are sending out the whole data of every instance. Your network is running out of bandwidth to send that data in such small amounts of time.

In RTS’s and unit-heavy games you don’t replicate the army’s movement one by one.
You just send out the specific commands to units / groups of units and the client and server update these in what is called lockstep.

Each player sends their commands => they are gathered => the lockstep ends => all commands are executed at the same time.

Each unit advances on each client’s machine deterministically.
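
A rough sketch of the gather-then-execute step described above, in plain C++ with no networking. Command, LockstepTurn and their members are hypothetical names for illustration; a real lockstep system also needs turn numbering, timeouts and a deterministic simulation, none of which are shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical command: move a group of units to a grid cell.
struct Command { int UnitGroup; int TargetCell; };

// Collects one batch of commands per player for the current turn and only
// releases them once every player has submitted.
class LockstepTurn
{
public:
    explicit LockstepTurn(std::size_t InPlayerCount) : PlayerCount(InPlayerCount)
    {
        assert(InPlayerCount > 0);
    }

    // An empty vector is a valid submission ("no commands this turn").
    void Submit(int PlayerId, const std::vector<Command>& Commands)
    {
        Pending[PlayerId] = Commands;
    }

    bool IsComplete() const { return Pending.size() == PlayerCount; }

    // Returns every command ordered by player id so all machines execute
    // them in the same order, then clears the turn. Returns an empty list
    // if some player has not submitted yet.
    std::vector<Command> Flush()
    {
        std::vector<Command> Ordered;
        if (!IsComplete())
        {
            return Ordered;
        }
        for (const auto& Entry : Pending)
        {
            Ordered.insert(Ordered.end(), Entry.second.begin(), Entry.second.end());
        }
        Pending.clear();
        return Ordered;
    }

private:
    std::size_t PlayerCount;
    std::map<int, std::vector<Command>> Pending; // keyed and sorted by player id
};
```

Sorting by player id is just one way to fix the execution order; any ordering works as long as every machine uses the same one.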

LotionMyHotBod · July 15, 2024, 3:51pm

@3dRaven Yes it is multiplayer. No, I’m actually doing all of the work client side. The way it works is this:

Client clicks the hotkey to build a unit
Server validates request
Server creates actor that replicates with 1 struct property that replicates
Server serializes & packages necessary “Directions” into struct
After replication, client handles creating ISM components and instances based on directions.
Put actor into unused pool.
Client also handles all pathfinding & movement
- I do have future plans to send periodic position updates to ensure clients stay completely in sync. (This is the next big problem to sort through. It doesn’t have to be perfect. I will likely space out updates, break large updates up, and send them over time.)

Direction data packages are incredibly lightweight. As of right now, there is very little bandwidth being used.

Utilizing Unreal’s built-in actor replication process to reliably send instance creation directions has worked really well. I tried to build a system to “replicate” instance creation in the same fashion UE replicates actors, but it was lagging everything to high hell.

This has been working very well. This is why I need the movement logic to be absolutely frame rate independent

I researched deterministic lockstep obsessively to see if that’s the direction I wanted to take, but I don’t think it’s necessary for my project. My project is incredibly predictable already, and the only thing I need to ensure is completely in sync is unit positions. If that happens, the rest of the logic can be client side other than damage/unit death (all the noncheating stuff).

3dRaven · July 15, 2024, 4:38pm

Is the ISM component set to replicate? Perhaps it’s periodically pushing the server’s ISM structure to the client, causing the stalls?

LotionMyHotBod · July 15, 2024, 5:33pm

No, I have all created ISM components set to not replicate, and I’ve verified that when the instance management blueprint replicates it does not override the client-created component or its data. I’m pretty confident that the reason these instances are sticking is my ticking movement logic. I just can’t figure out where, or a good way to debug it. Here’s a video of the issue; maybe it will add more context. You can see at the beginning that both the low FPS client (left) and high FPS client (right) reach the goal at the same time, but towards the end of the video you can see stopped units. In this example not a single unit in the low FPS client stopped.

The screen recording makes it seem like there is movement jitter on high fps client as well, but there isn’t on my end.

3dRaven · July 15, 2024, 5:41pm

Try running Stat UnitGraph on the client to see where the bottleneck is.
Is it

Frame time
Game logic
RHIT (render hardware interface thread)

Something is definitely hammering the frame rate there. The frame time seems very high. Are both client and server using the same logic to update the ISM component?

Are you sure the client doesn’t use an older version of the update logic, transforming each instance in the loop (the old way)?

LotionMyHotBod · July 15, 2024, 6:01pm

The ‘Draw’ is what’s pushing the frame time up. Both client and server are using the exact same logic to create components, instances and movement.

Maybe I can help more if you can help me understand your line of thinking. Why do you believe this is specifically a performance issue rather than a logic flaw? Then I can do some exploration myself without wasting your time.

3dRaven · July 15, 2024, 6:04pm

So is the left client a lower end computer? Perhaps it just doesn’t have the “horsepower” to update the meshes in time? Draw would indicate that the GPU is struggling with keeping up with the demand.

Does it have lumen, vsm and distance fields turned off?

LotionMyHotBod · July 15, 2024, 6:09pm

So I have the game running in Standalone with 2 clients. The left client is a listen server and client. The right is the other client. I’m not sure why it works this way, but whichever window I focus gets all of the processing power of my computer and the other’s FPS tanks. This works well for testing bad performance and good performance syncing though!

I intentionally have graphic settings jacked up to push the performance down as much as possible. I have lumen on with best anti aliasing. When I turn off all of the settings I don’t need my FPS flies up to 400FPS on focused window and 150 on other, which makes it harder for me to stress test.

I should note my computer is decently powerful - I have a 3070 in it.