Garbage collection slow on the server with world partitioning

Hi,

We are finding that the GC is slow on the server, stalling for around 6 frames each minute.

We use the world partitioning system and have a large open world with 50 players. The server runs at 60fps (except for when the GC runs) and the GC runs roughly every minute and takes around 90ms to 100ms to complete. We are currently using the default GC settings.

I think we are going to try asset clustering first, to see if that will improve performance. Although at the moment it will be more of a case of trial and error to see what helps.

Any advice on how to make the GC run faster?

Thanks,

Phil.

Steps to Reproduce

Our server is multithreaded so I guess the following isn’t going to work?

https://dev.epicgames.com/documentation/en-us/unreal-engine/incremental-garbage-collection-in-unreal-engine?application_version=5.5

Hey there, [mention removed]’s Great Hitch Hunt presentation, which covers general causes of hitches and how to address them, has a section on Garbage Collection hitches. Here’s a timestamped link to the presentation.

I recommend running the obj list -countsort command on the server at any moment in time to log how many UObjects per class you have. The object classes with the highest counts are the ones you should consider optimizing first. We can brainstorm along if you’re willing to share your obj list -countsort output.
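
In case it’s easier than typing into the server console, here’s a minimal sketch of triggering that same snapshot from code on the headless server (the helper function name is just an example, not an engine API):

	#include "Engine/Engine.h"

	// Hypothetical helper: routes the "obj list -countsort" console command, which logs
	// the number of live UObjects per class, sorted by count, to the server log.
	void LogObjectCountsPerClass(UWorld* World)
	{
		if (GEngine)
		{
			GEngine->Exec(World, TEXT("obj list -countsort"));
		}
	}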

We are only using Incremental GC currently on Fortnite Battle Royale dedicated servers, which run single-threaded, so I would not recommend using that currently (in either 5.5 or 5.6).

Next up we are going to try getting gc.AllowParallelGC to work.

Seems like the FPlatformMisc::NumberOfCores function on Linux is returning 1 on our server, which has 2 cores, so the GC isn’t running in parallel.

I’m not sure if that is a feature or a bug, but our log shows the following:

[2025.09.16-11.12.35:810][ 0]LogInit: - Number of physical cores available for the process: 1

[2025.09.16-11.12.35:810][ 0]LogInit: - Number of logical cores available for the process: 2

So there are 2 logical cores on the server, but GC doesn’t see that. Looks like we’ll have to modify the engine code in ShouldForceSingleThreadedGC().
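
For reference, the kind of change we’re considering is along these lines (just a sketch; the real engine function differs between versions, so don’t treat this as the actual diff):

	// GarbageCollection.cpp (sketch only). The stock check falls back to single-threaded
	// GC when it sees fewer than 2 cores, which on our Linux server means physical cores.
	// Counting logical cores instead would let gc.AllowParallelGC take effect on a
	// machine that reports 1 physical / 2 logical cores.
	static bool ShouldForceSingleThreadedGC()
	{
		// was: FPlatformMisc::NumberOfCores() < 2
		return !FApp::ShouldUseThreadingForPerformance()
			|| FPlatformMisc::NumberOfCoresIncludingHyperthreads() < 2;
	}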

Thanks for sharing the object lists. It seems the highest UObject counts are all actor components of certain types (72k+20k SplineMeshComp, 45k decals, 45k HISMC, 17k SMC). It will be worthwhile to:

  • Use fewer components by pooling and reusing them, depending on how many of these have short life spans.
  • Disable components on the server if they’re not necessary server-side:
    • Are the 45k decal comps needed for the server?
    • Are the 92k spline mesh comps needed on the server?
    • Are the 45k instanced static meshes interactable / collision affecting or purely cosmetic foliage?
    • ^ I think there is a lot of gain to be made by despawning things on the server.
    • Overriding UObject::NeedsLoadForServer() for your custom classes (or as an engine modification) could be low-hanging fruit here (see the sketch after this list)
    • Replace many of these StaticMeshComponents (up to 16000) with Instanced SMCs: either manually, or automated at cook time, for example using WorldPartitionRuntimeCellTransformerISM
  • Merge many of the HierarchicalInstancedStaticMeshComponents if they use the same mesh.
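
Regarding the NeedsLoadForServer() route mentioned above, a minimal sketch for a custom class (UMyCosmeticDecalComponent is hypothetical; in your project it would be whatever purely cosmetic component class you own). Returning false keeps those objects out of dedicated-server cooks entirely, so they never exist on the server and never have to be garbage collected there:

	#pragma once

	#include "CoreMinimal.h"
	#include "Components/DecalComponent.h"
	#include "MyCosmeticDecalComponent.generated.h"

	// Hypothetical project class: a decal component that is purely cosmetic.
	UCLASS()
	class UMyCosmeticDecalComponent : public UDecalComponent
	{
		GENERATED_BODY()

	public:
		// Never cook/load this component for dedicated servers.
		virtual bool NeedsLoadForServer() const override { return false; }
	};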

If you want to predict potential gains first, try making a server build where you disable/destroy the components of which there are 20k+ and see how much time GC takes without those.
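
A quick way to run that prediction test (the helper below is an assumption, not code from your project): destroy every component of the suspect class on the dedicated server, then compare GC times with and without them.

	#include "Components/DecalComponent.h"
	#include "Engine/World.h"
	#include "UObject/UObjectIterator.h"

	// Hypothetical test helper: on a dedicated server, destroy all decal components in
	// the given world so a profile can show how much GC time they account for.
	static void DestroyAllDecalComponentsOnServer(UWorld* ServerWorld)
	{
		if (!ServerWorld || ServerWorld->GetNetMode() != NM_DedicatedServer)
		{
			return;
		}
		for (TObjectIterator<UDecalComponent> It; It; ++It)
		{
			if (It->GetWorld() == ServerWorld)
			{
				It->DestroyComponent();
			}
		}
	}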

I’m curious to hear what steps you’ll take and what the perf impact will be.

Sounds good!

Hi again, have you seen this section of the Unreal Fest Orlando livestream about the Witcher 4 demo and streaming improvements? Here is the timestamped link.

The FastGeo plugin, although it’s experimental, was very helpful in reducing UStaticMeshComponent and UInstancedStaticMeshComponent counts, by replacing many static meshes at cook-time with primitives that don’t involve components at run-time. Given your high ISMC and SMC counts (45000 and 16000), that could have a significant impact on your GC execution times.

FastGeo is still experimental and not compatible with UE 5.5 or 5.6, because rendering, Chaos and world building systems have been modified to provide hooks for that plugin, but at the least I wanted to put the plugin and that livestream presentation on your radar if you haven’t seen it yet. Reducing object counts was a big focus for the Witcher 4 demo, including to improve GC performance.

An update on where we are with speeding up the GC on the server.

We have enabled asset clustering, actor clustering and forced the GC to be parallel on the server. Each of these changes was tested and shown to improve the GC time.

We have stripped out some objects which were easy to remove such as: decals, textures, materials, widgets, inputs, audio, and instanced static meshes without collisions.

We tried the WorldPartitionRuntimeCellTransformerISM, but all attempts failed: we found floating meshes, missing meshes and meshes with missing materials. When we profiled the game, even though the object count looked smaller, the GC time was the same.

Currently we have got the GC time down to 30ms.

We are now trying FastGeo streaming, which so far is looking to give good numbers, reducing the object count from around 250k to 125k.

Probably my last update.

After some strange measurements, it turns out that it’s really important to do a clean cook after making any changes to the NeedsLoadForServer functions. So some of the readings we posted before were incorrect.

FastGeo Streaming for us gave similar results to WorldPartitionRuntimeCellTransformerISM, but with fewer floating objects. Maybe we can fix the floating objects, but unfortunately FastGeo Streaming increased our GC time.

In summary, FastGeo Streaming gave us:

  • Reduced object count by roughly 24.5K (197725 to 173042)
  • Reduced the memory by roughly 12M (487.413M to 475.412M)
  • Floating objects
  • Increased the server GC time by 0.3ms (18.74ms to 19.07ms)

It’s possible that FastGeo Streaming would perform better if the GC asset and/or actor clustering was disabled, but we haven’t tried that due to the floating objects issues.

Attached is our current object list.

Hey Phil, I’ve just read up on your recent posts after returning from Unreal Fest. I see you’ve cut a large number of objects from the dedicated server so that’s great. I’ll respond to all your points one by one, apologies for the wait.

NeedsLoadForServer behavior

You mentioned: “So, when looking into the UObject::NeedsLoadForServer, it looked like the engine code was already written with the mindset of removing primitives without collisions on the server. For some reason though the feature is mostly disabled, I don’t know if that’s a bug.”

I wasn’t aware of that, but it being disabled is in line with other default behavior, where we don’t want the engine to make assumptions for you. That’s just my guess though; if you want to know the answer with certainty, let me know and I can ask around internally about what the intention was.

FastGeo results

When you say that with FastGeo you reduced object count by roughly 24.5K, that’s compared to WorldPartitionRuntimeCellTransformerISMs, correct? Or is that compared to not transforming the level contents?

As for the resulting GC times (18.74ms), how did you measure that exactly? I.e. which Unreal Insights timer, log message, etc. If you provide that, I can ask internally if anyone has ideas for how FastGeo could make GC times worse. But we’ll need some more granular info, like how much of those 19ms is spent in PerformReachabilityAnalysis, GatherUnreachableObjects, and so on. A screenshot of an Insights capture would be helpful here.

Incorrect results

Ending up with floating objects is wrong in the case of both the WP runtime cell transformer and FastGeoStreaming. If you can repro that in a clean UE 5.7 Preview project, our dev team would appreciate that and would investigate it. Or, if you notice any factors that can trigger the floating objects (e.g. are they in embedded level instances, child actor components, etc.), that would be helpful too.

Is that 18-19ms server GC time acceptable, or will you want to decrease it more? It might be worthwhile playing with different periods (e.g. 30s) to see if more frequent GC results in faster passes. Or do them less frequently, as long as the server is below a certain memory threshold.
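
If you do experiment with the interval, a minimal sketch of the cvar route (assuming you drive it with the standard gc.TimeBetweenPurgingPendingKillObjects setting, which is in seconds and defaults to roughly a minute):

	; DefaultEngine.ini (sketch only; tune the value to your memory budget)
	[SystemSettings]
	; Trigger the periodic GC every 30 seconds instead of the ~60 second default.
	gc.TimeBetweenPurgingPendingKillObjects=30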

BodySetup numbers

You asked before about whether high UBodySetup counts are to be expected. It depends on a number of factors:

  • the number of unique static mesh and skeletal mesh assets you have.
  • every skeletal mesh component with per-poly collision constructs a UBodySetup
  • every spline mesh component constructs a UBodySetup

This list isn’t exhaustive but I just had a look at what constructs UBodySetups. Since you have many spline mesh components, that number of body setups is expected. If you want to dig into it, you can execute this snippet at any time to log out the source of body setups:

	// Needs: #include "UObject/UObjectHash.h" and #include "PhysicsEngine/BodySetup.h"
	ForEachObjectOfClass(UBodySetup::StaticClass(), [](UObject* BodySetup)
		{
			// Log each body setup's full object path and the package it lives in.
			UE_LOG(LogTemp, Warning, TEXT("FOUND BodySetup: '%s' in package '%s'"), *BodySetup->GetPathName(), *BodySetup->GetPackage()->GetName());
		});

I suspect it’s the runtime created body setups from SplineMeshComponents though. If those spline mesh components need any collision / overlap / raycastability then they’ll need a body setup.

Thanks for the new data. I’ll gather some thoughts from team members and will reply later this week.

Having recently tested the GC clustering on a test scene, I found that it only moves the time cost from reference-tracking to clearing the GC flag for clusters; only a few percentage points of GC time were saved in my case. Going through the object list like Zhi Kang recommended and removing objects you don’t need is where you’ll find most savings.

The Hitch Hunt talk is also available in text format, here’s the link to the GC part in writing.

We have found the asset clustering has taken the GC time down from around 92ms to 78ms.

I’m just checking whether we can share the full list, but I can tell you what is top of the list and that was the BodySetup.

Is BodySetup usually the top of the object list?

Here is our full object list.

This looks to have worked, dropping the GC spike from 78ms to 54ms.

Unfortunately we have also done an engine upgrade to UE 5.6.1, so these results might be skewed, but we can definitely see the GC running on 2 threads now, whereas before we only saw the GC on 1 thread.

Thanks for the feedback, I’ll look into stripping out some of these objects on the server. I’ll start with the decal comps and go from there.

I’ll keep you posted on what we find.

Thanks for the tip about the decals and the UObject::NeedsLoadForServer function. Our world is mostly static though, so there isn’t a lot for us to gain from pooling.

Turning the decals off was straightforward using ClassesExcludedOnDedicatedServer in DefaultEngine.ini. Unfortunately, most of the spline meshes have collision; it turns out they are used as the floor. We did manage to remove every primitive without collision from the server, though.
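
For anyone following along, a minimal sketch of what that kind of DefaultEngine.ini entry can look like (the section header and class-name format here are assumptions, so verify them against your engine version):

	; DefaultEngine.ini (sketch; section and value format are assumptions)
	[/Script/Engine.Engine]
	+ClassesExcludedOnDedicatedServer=DecalComponent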

So, when looking into the UObject::NeedsLoadForServer, it looked like the engine code was already written with the mindset of removing primitives without collisions on the server. For some reason though the feature is mostly disabled, I don’t know if that’s a bug. We found a variable called UPrimitiveComponent::AlwaysLoadOnServer which was initialised to true in the UPrimitiveComponent’s constructor. So, we have changed that tiny bit of engine code, a one-line change, and now all primitives without collisions are automatically stripped from the server.
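
The change itself looks roughly like this (paraphrased as a sketch, not the exact engine source):

	// PrimitiveComponent.cpp constructor (paraphrased sketch, not the real engine diff).
	// With AlwaysLoadOnServer defaulting to false, the existing NeedsLoadForServer()
	// logic is free to strip primitives that have no collision from the dedicated server.
	UPrimitiveComponent::UPrimitiveComponent(const FObjectInitializer& ObjectInitializer)
		: Super(ObjectInitializer)
	{
		// ...
		AlwaysLoadOnServer = false;   // previously initialised to true
		// ...
	}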

We have got the GC time down from 54ms to 47ms; I’ve attached the new object list.

The UWorldPartitionRuntimeCellTransformerISM looks interesting; are there any docs on how to use it?

Thanks for the feedback.

The FastGeo object count reduction was measured against having no WorldPartitionRuntimeCellTransformerISMs (i.e. against untransformed level contents).

For measuring the GC time, the method used was to take a trace capture for 5 minutes, giving us 5 GC samples. Then, within the Insights tool, we would select all and search for the STAT_GCMarkTime timer. Within each trace capture, the majority of the GC time is spent within PerformReachabilityAnalysis. I’ve attached 3 images to show what we see.

If we get time, we’ll investigate FastGeo further; unfortunately, since there were no apparent gains with FastGeo, there isn’t much incentive to do that right now.

An 18-19ms GC time isn’t ideal; the server hitch has been reduced, but we would like to see that number come down further. We would like to use incremental reachability analysis, so if that could be made thread-safe, that would be the next option to try.