Knowledge Base: Garbage Collector Internals

Unreal Engine’s Garbage Collector is a standard Mark & Sweep collector. This post explains its stages and what it does under the hood.
Summary
These are the major stages of the Garbage Collector:

Mark unreachable.
Mar…

https://dev.epicgames.com/community/learning/knowledge-base/ePKR/unreal-engine-garbage-collector-internals
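For readers new to the terminology, here is a minimal, generic mark & sweep sketch in plain C++. It is not Unreal's implementation. `Object`, `Mark`, and `Sweep` are hypothetical names, and the flat, index-based array only loosely mirrors the idea of a global object array. The mark phase assumes everything is unreachable and then flags everything transitively reachable from the roots. The sweep phase frees whatever is left unflagged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic mark & sweep illustration (hypothetical names, not Unreal's code).
struct Object {
    std::vector<std::size_t> refs;  // indices of the objects this one references
    bool alive = true;              // false once the object has been swept
    bool reachable = false;         // the "mark" bit
};

// Mark phase: assume everything is unreachable, then flag every object
// transitively reachable from the roots.
void Mark(std::vector<Object>& objects, const std::vector<std::size_t>& roots) {
    for (Object& obj : objects) obj.reachable = false;
    std::vector<std::size_t> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        const std::size_t index = stack.back();
        stack.pop_back();
        assert(index < objects.size() && "reference to an object that does not exist");
        Object& obj = objects[index];
        if (!obj.alive || obj.reachable) continue;  // already freed, or already visited
        obj.reachable = true;
        for (std::size_t ref : obj.refs) stack.push_back(ref);
    }
}

// Sweep phase: every live object that was not marked is garbage.
// Returns how many objects were freed.
std::size_t Sweep(std::vector<Object>& objects) {
    std::size_t freed = 0;
    for (Object& obj : objects) {
        if (obj.alive && !obj.reachable) {
            obj.alive = false;
            obj.refs.clear();
            ++freed;
        }
    }
    return freed;
}
```

Because reachability is traced from the roots rather than counted per object, a group of objects that only reference each other (a cycle) is still collected once nothing reachable points at it.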


Are there any plans to implement incremental support for the marking phase? Would this be possible without a major rewrite? Note that Unity apparently uses an open-source GC (Boehm–Demers–Weiser) that supports incremental mark and sweep phases. Was this ever considered for UE?

I’m not aware of any such plans. (EDIT: We’re working on it! 👍)
If the marking phases are having an impact on your game’s performance, it might be a symptom of your game having too many UObjects alive. Usually, bigger games don’t need more than a couple hundred thousand UObjects alive at any time.

If you’re spawning lots of UObjects with short lifetimes, you can look into reusing them, using FStructs instead of UObjects where you can, or putting them into a GC cluster to reduce garbage collection overhead.
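As a rough illustration of the "reusing them" option, here is a minimal free-list pool in plain C++. `Projectile` and `ProjectilePool` are hypothetical names and this is not an Unreal API. With actual UObjects, the pool would also have to keep its idle objects referenced (for example through a UPROPERTY array) so the garbage collector doesn't collect them between uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a short-lived object (e.g. a projectile).
struct Projectile {
    float X = 0.f;
    float Y = 0.f;
    bool Active = false;
};

// Minimal free-list pool: objects are allocated once and then recycled,
// so the number of live objects stays bounded instead of churning.
class ProjectilePool {
public:
    Projectile* Acquire() {
        if (Free.empty()) {
            // Nothing to recycle yet: create one more object and hand it out.
            Storage.push_back(std::make_unique<Projectile>());
            Free.push_back(Storage.back().get());
        }
        Projectile* p = Free.back();
        Free.pop_back();
        *p = Projectile{};  // reset any state left over from a previous use
        p->Active = true;
        return p;
    }

    void Release(Projectile* p) {
        assert(p != nullptr && p->Active && "releasing an object that is not in use");
        p->Active = false;
        Free.push_back(p);  // keep it alive for reuse instead of destroying it
    }

    std::size_t TotalAllocated() const { return Storage.size(); }

private:
    std::vector<std::unique_ptr<Projectile>> Storage;  // owns every object ever created
    std::vector<Projectile*> Free;                     // objects available for reuse
};
```

After a short warm-up, `Acquire` and `Release` stop creating new objects altogether, which is the point: the collector has a stable, small population to look at rather than a stream of short-lived garbage.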

For long-lived UObjects, there’s the option of loading less content or relying more on HLODs.


We’re now working on Incremental Reachability Analysis for the Garbage Collector. It’s early and experimental as of Unreal Engine 5.4, so I don’t recommend turning it on for your shipping projects yet. But if you want to experiment with it in a sandbox, check out the documentation on how to turn it on.
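To illustrate what "incremental" means for the marking phase, the following generic sketch splits marking into budgeted steps that can be spread across frames. It is not Unreal's Incremental Reachability Analysis; `IncrementalMarker` and `Step` are hypothetical names. It also assumes the object graph does not change between steps. A real incremental collector has to track reference changes made between steps (for example with write barriers), which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic incremental marking sketch (hypothetical, not Unreal's code).
// ASSUMPTION: the graph is not mutated between Step() calls.
struct Node {
    std::vector<std::size_t> refs;  // indices of referenced nodes
};

class IncrementalMarker {
public:
    IncrementalMarker(const std::vector<Node>& graph, const std::vector<std::size_t>& roots)
        : Graph(graph), Marked(graph.size(), false), Pending(roots.begin(), roots.end()) {}

    // Mark at most `budget` objects, then return so the frame can continue.
    // Returns true once marking has finished.
    bool Step(std::size_t budget) {
        while (budget > 0 && !Pending.empty()) {
            const std::size_t index = Pending.back();
            Pending.pop_back();
            if (Marked[index]) continue;  // duplicate work-list entry, costs no budget
            Marked[index] = true;
            --budget;
            for (std::size_t ref : Graph[index].refs) {
                assert(ref < Graph.size());
                if (!Marked[ref]) Pending.push_back(ref);
            }
        }
        return Pending.empty();
    }

    bool IsMarked(std::size_t index) const { return Marked[index]; }

private:
    const std::vector<Node>& Graph;
    std::vector<bool> Marked;           // objects already visited
    std::vector<std::size_t> Pending;   // discovered but not yet visited
};
```

Calling `Step` once per frame with a small budget caps how much marking work any single frame does, at the cost of one GC cycle spanning several frames.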


How do you do that? I’ve been looking for info on how to set up a GC cluster but haven’t found any.

First, make sure the UObject class allows being in a cluster by overriding CanBeInCluster:

```cpp
virtual bool CanBeInCluster() const override { return true; }
```

Then, call this single function on the parent UObject that contains all the child UObjects you don’t want the garbage collector to scan. It automatically scans all subobjects recursively and adds each one for which CanBeInCluster returns true:

```cpp
ParentObject->CreateCluster();
```

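If the automatic cluster creation is too slow for you, you can do it faster by creating the cluster manually and adding the subobjects explicitly:

```cpp
// Run this on the parent object (the cluster root) that owns all the sub-objects you want clustered.
// Assumes it runs on the game thread, outside of garbage collection, after loading has finished.
FUObjectItem* RootItem = GUObjectArray.ObjectToObjectItem(this);
const int32 RootIndex = GUObjectArray.ObjectToIndex(this);
const int32 ClusterIndex = GUObjectClusters.AllocateCluster(RootIndex);
FUObjectCluster& Cluster = GUObjectClusters[ClusterIndex];

// The cluster index and the ClusterRoot flag live on the FUObjectItem, not on the UObject itself.
RootItem->SetClusterIndex(ClusterIndex);
RootItem->SetFlags(EInternalObjectFlags::ClusterRoot);

// Optional: if you know in advance how many sub-objects you're adding, reserve the space
// so the array doesn't keep resizing while you add many objects.
// TotalSubobjectCount is a placeholder for your own count.
Cluster.Objects.Reserve(TotalSubobjectCount);

// Cluster.Objects stores object indices, and the root is tracked separately (Cluster.RootIndex),
// so only the sub-objects go in here. Repeat this for each sub-object (SubObject is a placeholder).
const int32 SubObjectIndex = GUObjectArray.ObjectToIndex(SubObject);
GUObjectArray.IndexToObject(SubObjectIndex)->SetOwnerIndex(RootIndex);
Cluster.Objects.Add(SubObjectIndex);

// CreateCluster() also sorts this array. I don't know if it's required, but it must do it for a reason.
Cluster.Objects.Sort();

// Caveat: CreateCluster() also records references that leave the cluster (ReferencedClusters,
// MutableObjects). This manual version skips that, so it assumes the sub-objects don't
// reference any objects outside the cluster.
```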
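To see why clustering can cut reachability analysis time, here is a simplified model in plain C++. It is not Unreal's code: `Obj`, `owner`, `members`, `externalRefs`, and `MarkReachable` are hypothetical. The assumption it encodes is that once any part of a cluster is reached, all members are flagged as reachable in bulk and only the references leaving the cluster (precomputed in `externalRefs`) are followed, so the members' internal references are never traversed. Flagging the members is still one step per object, which is why a cluster is cheaper but not free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified cluster model (hypothetical names, not Unreal's implementation).
struct Obj {
    std::vector<std::size_t> refs;          // outgoing references (followed only for unclustered objects)
    int owner = -1;                         // index of the cluster root if clustered, else -1
    std::vector<std::size_t> members;       // for a cluster root: the other objects in its cluster
    std::vector<std::size_t> externalRefs;  // for a cluster root: precomputed refs leaving the cluster
    bool reachable = false;
};

// Marks everything reachable from `roots`. Returns the number of references
// followed, as a rough proxy for reachability analysis cost.
std::size_t MarkReachable(std::vector<Obj>& objs, const std::vector<std::size_t>& roots) {
    for (Obj& o : objs) o.reachable = false;
    std::size_t refsFollowed = 0;
    std::vector<std::size_t> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        std::size_t i = stack.back();
        stack.pop_back();
        assert(i < objs.size());
        // Reaching any member keeps its whole cluster alive, so jump to the root.
        if (objs[i].owner >= 0) i = static_cast<std::size_t>(objs[i].owner);
        if (objs[i].reachable) continue;
        objs[i].reachable = true;
        if (!objs[i].members.empty()) {
            // Cluster root: flag members in bulk (O(members)), follow only outgoing refs.
            for (std::size_t m : objs[i].members) objs[m].reachable = true;
            for (std::size_t r : objs[i].externalRefs) { ++refsFollowed; stack.push_back(r); }
        } else {
            // Ordinary object: traverse every reference.
            for (std::size_t r : objs[i].refs) { ++refsFollowed; stack.push_back(r); }
        }
    }
    return refsFollowed;
}
```

In this model, the cost of reaching a cluster grows with its member count rather than with the number of references between its members. That is the general trade-off clustering makes, not a reproduction of Unreal's exact bookkeeping.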

I must admit that I just parroted this advice as I heard it years ago. While checking how to make GC clusters for you, I wasn’t sure what the actual performance gain was, so I made a test. The results aren’t as mind-blowing as I was hoping: it’s maybe a 50% saving in GC time for the clustered objects in a Shipping build. The main reason it isn’t faster than that is how long MarkClusteredObjectsAsReachable takes.

In Development and Test builds, my cluster was actually not faster at all because of GC verification. You can turn the verification off in Development and Test builds with the -NoVerifyGC -DPCvars="gc.VerifyAssumptionsOnFullPurge=0" startup parameters. You can also use -LogCmds="logGarbage verbose" to print the GC timings to the log so you can compare them.

This is what the results look like for me, with around 600k objects clustered and not much else happening in the scene:

Collecting garbage - No GC Cluster
0.019800 ms for MarkObjectsAsUnreachable Phase (34 Objects To Serialize)
17.532300 ms for Reachability Analysis
GC Reachability Analysis total time: 18.11 ms (18.11 ms on reference traversal)
18.11 ms for GC - 99367 refs/ms while processing 1799453 references from 598753 objects  with 0 clusters
Freed 65536b from 16 GC contexts
1.186699 ms for Gather Unreachable Objects (10 objects collected / 598762 scanned with 126 thread(s))
0.125203 ms for unhashing unreachable objects (10 objects unhashed)
GC purged 10 objects (598762 -> 598752) in 11.437ms (1 iteration(s))
Compacting FUObjectHashTables data took  21.28ms


Collecting garbage - GC Cluster
10.099500 ms for MarkObjectsAsUnreachable Phase (34 Objects To Serialize)
0.851799 ms for Reachability Analysis
GC Reachability Analysis total time: 11.47 ms (11.47 ms on reference traversal)
11.47 ms for GC - 508 refs/ms while processing 5840 references from 882 objects  with 1 clusters
0.002302 ms for Dissolve Unreachable Clusters (0/1 clusters dissolved containing 0 cluster objects)
Freed 65536b from 16 GC contexts
1.156200 ms for Gather Unreachable Objects (0 objects collected / 598762 scanned with 126 thread(s))
0.000097 ms for unhashing unreachable objects (0 objects unhashed)
GC purged 0 objects (598752 -> 598752) in 0.002ms (1 iteration(s))
Compacting FUObjectHashTables data took   7.52ms

Reachability analysis decreased from 17.5 ms to 0.85 ms.
But MarkObjectsAsUnreachable increased from 0.02 ms to 10.1 ms. So yeah, your mileage may vary and it might not be worth it. Maybe it’s better to just keep the UObject count low 🤷
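Putting the two logs side by side, the per-phase changes and the net effect on total reachability time work out as follows (values rounded from the log lines):

```latex
\begin{aligned}
\text{Reachability Analysis:}\quad & 17.53\ \text{ms} \rightarrow 0.85\ \text{ms} \qquad (-16.68\ \text{ms}) \\
\text{MarkObjectsAsUnreachable:}\quad & 0.02\ \text{ms} \rightarrow 10.10\ \text{ms} \qquad (+10.08\ \text{ms}) \\
\text{Total reachability time:}\quad & 18.11\ \text{ms} \rightarrow 11.47\ \text{ms} \qquad (-6.64\ \text{ms},\ 6.64 / 18.11 \approx 37\%)
\end{aligned}
```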


Amazing, comprehensive, and useful! Thanks so much, Ari!! <3

“Do less to do more.” I guess it’s always a good first optimization.
