CreateExternalAccelerationStructure - Timeslice possible?

Hello,

We’ve been battling some CPU spikes when streaming levels in and out in a very large map with World Partition. One of these spikes is CreateExternalAccelerationStructure. I’ve already found a few UDN posts about it:

[Content removed]

[Content removed]

[Content removed]

But none of them really helped us solve our spike problem. We’re already using World Partition, but because the world is large and we have many collision objects, the cost remains high. After multiple levels have streamed in, we get hit with 10-70 ms spikes of CreateExternalAccelerationStructure, depending on which part of the level we’re travelling to. The cost got slightly better (seemingly) when we enabled the PreIncrementalRegister/UnregisterComponents/asyncchaosinit cvars, presumably because the registration and unregistration were spread out more, but the cost remains high.

After adding additional trace data, the bulk of the cost seems to come from FPBDRigidsEvolutionBase::FlushExternalAccelerationQueue, where pending spatial data is processed.

From a quick glance I can’t tell whether this function could support time slicing. If it could, and we could flush over multiple frames, that would probably greatly reduce the cost. However, without in-depth knowledge of the queue I don’t feel comfortable making this change.

Is time slicing fundamentally impossible during the flush? Or would it be possible to add support for it, and if so, what approach would you recommend? And if time slicing isn’t an option, is there anything else you can recommend that might improve this?

Any assistance or recommendations you can give on this problem would greatly help us, and I appreciate your time looking into this.

Kind regards,

Céleste Neukirchen

[Attachment Removed]

Steps to Reproduce
Not sure; probably streaming in levels with many collision objects.

[Attachment Removed]

Hello,

I did some further investigation to determine what exactly inside the flush is causing the high cost, and I noticed two things. I also determined that, the way the function works, time slicing the pending actions wouldn’t be possible, but perhaps time slicing the removal of old pending data might be?

One: there is a lot of pending data marked as an “Add” operation when that element is already in the acceleration tree at the exact location that the pending data computes. This leads me to believe that a high amount of duplicate work is being done and there is more pending data than there should be.

Two: the bulk of the cost comes from removing the pending data, and looking at the function where an entry is removed from the ExternalQueue, this also makes sense. The function removes the FPendingSpatialData with a swap-remove whenever the pending data is no longer needed; however, with a level as large as ours, with as many colliding objects, doing this 10k to 100k times in a single function call is very slow.

I’ve added additional logging to get exact numbers; the resulting logs are attached at the bottom.

Data format:

NumPending [NumPendingEntries] Num[Added / Updated / Deleted / Removed], NumBad[Adds / Updates / Deletes], MS[AddTime / UpdateTime / DeleteTime / RemoveTime]

Delete counts Delete operations; Remove counts removing data from the pending queue.

NumBad counts entries where the commented-out audits would have fired (1 bad add = adding an element that is already in the tree).

Inspecting this data, I find that the majority of the time is spent in Add and Remove, and that the majority of the Add time is accounted for by the duplicate adds. Except for one line (line 13), every expensive Add has a high proportion of duplicate adds, which again suggests there is more pending data than there should be.

Secondly, the bulk of all the time comes from removing pending data, and as I wrote above, the way it’s implemented is very slow when 10-100k entries are removed at once.

Is Epic Games aware of these two performance issues? And if so, is there a timeline for when these could be addressed?

FlushExternalAccelerationQueue, NumPending [24] Num[23 / 0 / 1 / 0], NumBad[23 / 0 / 1], MS[0.0202 / 0.0000/ 0.0008 / 0.0000]
FlushExternalAccelerationQueue, NumPending [439257] Num[354191 / 545 / 84521 / 0], NumBad[303698 / 545 / 3596], MS[265.5232 / 4.2787/ 20.6948 / 0.0000]
FlushExternalAccelerationQueue, NumPending [440325] Num[51225 / 860 / 84521 / 303719], NumBad[49606 / 315 / 84521], MS[26.5078 / 2.0931/ 8.7666 / 4074.1984]
FlushExternalAccelerationQueue, NumPending [136775] Num[42 / 137 / 0 / 136596], NumBad[2 / 2 / 0], MS[0.0550 / 0.2991/ 0.0000 / 439.9897]
FlushExternalAccelerationQueue, NumPending [8407] Num[21 / 40 / 0 / 8346], NumBad[1 / 0 / 0], MS[0.0631 / 0.0484/ 0.0000 / 5.3420]
FlushExternalAccelerationQueue, NumPending [11094] Num[3 / 94 / 1 / 10996], NumBad[2 / 0 / 1], MS[0.0188 / 0.1498/ 0.0006 / 5.0516]
FlushExternalAccelerationQueue, NumPending [4771] Num[225 / 41 / 6 / 4499], NumBad[225 / 0 / 6], MS[0.3314 / 0.0713/ 0.0032 / 1.3826]
FlushExternalAccelerationQueue, NumPending [4987] Num[624 / 29 / 0 / 4334], NumBad[624 / 0 / 0], MS[1.1164 / 0.0518/ 0.0000 / 2.2469]
FlushExternalAccelerationQueue, NumPending [16990] Num[310 / 29 / 0 / 16651], NumBad[310 / 0 / 0], MS[0.2721 / 0.0574/ 0.0000 / 12.8272]
FlushExternalAccelerationQueue, NumPending [10326] Num[0 / 29 / 310 / 9987], NumBad[0 / 0 / 310], MS[0.0000 / 0.0682/ 0.0185 / 4.4128]
FlushExternalAccelerationQueue, NumPending [13265] Num[0 / 34 / 354 / 12877], NumBad[0 / 0 / 117], MS[0.0000 / 0.0535/ 0.0953 / 6.9863]
FlushExternalAccelerationQueue, NumPending [7586] Num[2555 / 56 / 0 / 4975], NumBad[2152 / 11 / 0], MS[2.5343 / 0.1477/ 0.0000 / 1.9040]
FlushExternalAccelerationQueue, NumPending [39801] Num[12575 / 36 / 0 / 27190], NumBad[3920 / 6 / 0], MS[32.4842 / 0.0907/ 0.0000 / 61.2164]
FlushExternalAccelerationQueue, NumPending [13329] Num[595 / 29 / 0 / 12705], NumBad[91 / 0 / 0], MS[0.7244 / 0.0657/ 0.0000 / 9.7775]
FlushExternalAccelerationQueue, NumPending [19203] Num[1552 / 29 / 0 / 17622], NumBad[1532 / 0 / 0], MS[1.5429 / 0.0723/ 0.0000 / 23.9930]
FlushExternalAccelerationQueue, NumPending [40268] Num[2146 / 29 / 0 / 38093], NumBad[2132 / 0 / 0], MS[2.3334 / 0.0740/ 0.0000 / 76.6345]
FlushExternalAccelerationQueue, NumPending [6919] Num[315 / 29 / 0 / 6575], NumBad[315 / 0 / 0], MS[0.3192 / 0.0720/ 0.0000 / 3.5380]

[Attachment Removed]

We hit something similar and improved it by deferring the resizing of the array.

In PendingSpatialData.h, change the Remove function in FPendingSpatialDataQueue so it no longer shrinks, by changing the RemoveAtSwap call to PendingData.RemoveAtSwap(SlotIdx, EAllowShrinking::No);

Then, in FPBDRigidsEvolutionBase::FlushExternalAccelerationQueue, shrink the buffer by calling ExternalQueue.PendingData.SetNum(ExternalQueue.Num(), EAllowShrinking::Yes); after all the pending data has been processed. See the end of the function below.

void FPBDRigidsEvolutionBase::FlushExternalAccelerationQueue(FAccelerationStructure& Acceleration, FPendingSpatialDataQueue& ExternalQueue)
{
	//update structure with any pending operations. Note that we must keep those operations around in case next structure still hasn't consumed them (async mode)
	const int32 SyncTimestamp = Acceleration.GetSyncTimestamp();
	for (int32 Idx = ExternalQueue.PendingData.Num() - 1; Idx >= 0; --Idx)
	{
		const FPendingSpatialData& SpatialData = ExternalQueue.PendingData[Idx];
		if (SpatialData.SyncTimestamp >= SyncTimestamp)
		{
			//operation still pending so update structure
			//note: do we care about roll over? if game ticks at 60fps we'd get 385+ days
			if (SpatialData.Operation == EPendingSpatialDataOperation::Delete)
			{
				const bool bExisted = Acceleration.RemoveElementFrom(SpatialData.AccelerationHandle, SpatialData.SpatialIdx);
				//ensure(bExisted);
			}
			else
			{
				FGeometryParticle* UpdateParticle = SpatialData.AccelerationHandle.GetExternalGeometryParticle_ExternalThread();
				FAABB3 WorldBounds;
				const bool bHasBounds = UpdateParticle->GetGeometry() && UpdateParticle->GetGeometry()->HasBoundingBox();
				if (bHasBounds)
				{
					TRigidTransform<FReal, 3> WorldTM(UpdateParticle->X(), UpdateParticle->R());
					WorldBounds = UpdateParticle->GetGeometry()->BoundingBox().TransformedAABB(WorldTM);
				}
				const bool bExisted = Acceleration.UpdateElementIn(UpdateParticle, WorldBounds, bHasBounds, SpatialData.SpatialIdx);
				if (SpatialData.Operation == EPendingSpatialDataOperation::Add)
				{
					//ensure(bExisted == false); // TODO use this for ADD/UPDATE Audit
				}
				else
				{
					//ensure(bExisted == true); // TODO use this for ADD/UPDATE Audit
				}
			}
		}
		else
		{
			//operation was already considered by sim, so remove it
			//going in reverse order so PendingData will stay valid
			ExternalQueue.Remove(SpatialData.UniqueIdx());
		}
	}

	// Shrink the Pending Data array in case elements have been removed
	ExternalQueue.PendingData.SetNum(ExternalQueue.Num(), EAllowShrinking::Yes);
}

[Attachment Removed]

Hi Celeste,

The duplicated operations are needed in FlushExternalAccelerationQueue because we have a buffering system, and what may be unneeded for one SpatialAcceleration may be needed for another. Where there is duplication, though, it is a pretty quick check to remove it.

The deferred deletion that Jalpesh mentions is a good idea, though, and we’ll look to get that implemented.

Have you looked into Lightweight actors? TL;DR: they allow you to spawn the actors (and their physics collision) only when there is a gameplay need, so geometry that is never queried doesn’t get loaded.

Light Weight Instances UE5 | Community tutorial - here is a decent article outlining the process.

There is also the option of FastGeo, which is available as a plugin (although it is still very much experimental) but is great for static geometry in level loading.

All the best

Geoff Stacey

Developer Relations

EPIC Games

[Attachment Removed]

Hey,

I can confirm that delaying reallocation until after everything has been processed meaningfully improves the removal part. We still see the overhead of the duplicate Add work, but I’ll wait for Epic Games to answer there and say what plans, if any, they have to fix that.

FlushExternalAccelerationQueue, NumPending [439833] Num[354765 / 601 / 84467 / 0], NumBad[303676 / 601 / 3596], MS[269.2852 / 3.1613/ 20.8025 / 0.0000]
FlushExternalAccelerationQueue, NumPending [440720] Num[51601 / 955 / 84467 / 303697], NumBad[49602 / 354 / 84467], MS[25.9463 / 2.1361/ 2.0473 / 7.0054]
FlushExternalAccelerationQueue, NumPending [137264] Num[0 / 13 / 0 / 137251], NumBad[0 / 0 / 0], MS[0.0000 / 0.0291/ 0.0000 / 3.5794]
FlushExternalAccelerationQueue, NumPending [8326] Num[624 / 20 / 0 / 7682], NumBad[370 / 0 / 0], MS[1.2630 / 0.0264/ 0.0000 / 0.3273]
FlushExternalAccelerationQueue, NumPending [10638] Num[1682 / 32 / 0 / 8924], NumBad[1682 / 3 / 0], MS[1.4373 / 0.0866/ 0.0000 / 0.6543]
FlushExternalAccelerationQueue, NumPending [34594] Num[11394 / 29 / 0 / 23171], NumBad[3198 / 0 / 0], MS[27.5230 / 0.0755/ 0.0000 / 0.9355]
FlushExternalAccelerationQueue, NumPending [15664] Num[2938 / 29 / 0 / 12697], NumBad[2600 / 0 / 0], MS[3.3247 / 0.0447/ 0.0000 / 0.3024]
FlushExternalAccelerationQueue, NumPending [15753] Num[141 / 29 / 0 / 15583], NumBad[49 / 0 / 0], MS[0.3046 / 0.0498/ 0.0000 / 0.7043]
FlushExternalAccelerationQueue, NumPending [37263] Num[6856 / 29 / 0 / 30378], NumBad[3896 / 0 / 0], MS[10.9904 / 0.1188/ 0.0000 / 0.8535]

[Attachment Removed]