World Partition loading leading to frame drops

When updating the World Partition, we see serious performance hits:

[Image Removed]

We tried tuning these console variables:

```cpp
// The two level-streaming batching cvars we tuned (excerpt).
static FAutoConsoleVariableRef CVarLevelStreamingComponentsRegistrationGranularity(
	TEXT("s.LevelStreamingComponentsRegistrationGranularity"),
	GLevelStreamingComponentsRegistrationGranularity,
	TEXT("Batching granularity used to register actor components during level streaming."),
	ECVF_Default
);

static FAutoConsoleVariableRef CVarLevelStreamingAddPrimitiveGranularity(
	TEXT("s.LevelStreamingAddPrimitiveGranularity"),
	GLevelStreamingAddPrimitiveGranularity,
	TEXT("Batching granularity used to add primitives to scene in parallel when registering actor components during level streaming."),
	ECVF_Default
);
```

Unfortunately, at first glance, it made no difference.

Can we really throttle this over multiple frames?

How can we better set up our data to minimize these frame drops? If the instance count is the culprit, what is the ballpark for good behavior?

Thanks,

Basile

Steps to Reproduce
Hi,

This is a support follow-up for:

[Content removed]

Trying to break the questions into individual tickets…

Thanks,

Hello!

Unreal Engine 5.6 added the UInstancedStaticMeshComponent::CachedBounds property, which caches the bounds the first time they are calculated. That property is serialized, so it becomes part of the cooked data if it is calculated before the cook. Based on the names of the ISMCs, they appear to have been generated by your editor process. I recommend calling UpdateBounds() on all the PrimitiveComponents that your process creates (UStaticMeshComponent, UInstancedStaticMeshComponent, UHierarchicalInstancedStaticMeshComponent…).
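To make the benefit of caching concrete, here is a toy C++ model; it is not engine code, and `InstancedComponentModel`, `PrecomputeBounds` and `BoundsForRegistration` are hypothetical names. It assumes that a missing cache costs an O(N) walk over all instances, and that a value computed at generation time is reused as-is at registration. The `FullBoundsPasses` counter shows where that walk happens.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for engine math types (NOT the Unreal API).
struct Vec3 { double X = 0.0, Y = 0.0, Z = 0.0; };
struct Box { Vec3 Min, Max; };

// Toy model of an instanced component whose bounds are computed once and
// then reused. CachedBounds stands in for the serialized cache.
struct InstancedComponentModel {
    std::vector<Vec3> InstanceLocations;                   // translation-only instances
    Box MeshBounds{{-1.0, -1.0, -1.0}, {1.0, 1.0, 1.0}};   // local bounds of one mesh
    std::optional<Box> CachedBounds;                       // empty until computed
    int FullBoundsPasses = 0;                              // counts O(N) walks

    // Expensive path: visits every instance once.
    Box ComputeBounds() {
        ++FullBoundsPasses;
        Box Result{};
        for (std::size_t I = 0; I < InstanceLocations.size(); ++I) {
            const Vec3& L = InstanceLocations[I];
            const Vec3 Lo{L.X + MeshBounds.Min.X, L.Y + MeshBounds.Min.Y, L.Z + MeshBounds.Min.Z};
            const Vec3 Hi{L.X + MeshBounds.Max.X, L.Y + MeshBounds.Max.Y, L.Z + MeshBounds.Max.Z};
            if (I == 0) { Result = Box{Lo, Hi}; continue; }
            Result.Min = Vec3{std::min(Result.Min.X, Lo.X), std::min(Result.Min.Y, Lo.Y), std::min(Result.Min.Z, Lo.Z)};
            Result.Max = Vec3{std::max(Result.Max.X, Hi.X), std::max(Result.Max.Y, Hi.Y), std::max(Result.Max.Z, Hi.Z)};
        }
        return Result;
    }

    // Generation time: fill the cache before the asset is saved
    // (the role UpdateBounds() plays in the advice above).
    void PrecomputeBounds() { CachedBounds = ComputeBounds(); }

    // Registration time: reuse the cache, only walking instances on a miss.
    Box BoundsForRegistration() {
        if (!CachedBounds) { CachedBounds = ComputeBounds(); }
        return *CachedBounds;
    }
};
```

With a precomputed cache, registration does no per-instance work; without it, the full walk lands at registration time, which during streaming means on the frame that adds the level.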

If you are using World Partition, I recommend having a look at FastGeo. This plugin transforms instances of immutable meshes into a lighter runtime representation. FastGeo and some new async injection features are discussed in the World Building Guide: https://dev.epicgames.com/community/learning/knowledge-base/r6wl/unreal-engine-world-building-guide#wp-importantchangesin56

Regards,

Martin

Hi Martin,

Thanks for this. We will apply the change; I expect it will help. We are already aware of the guide you mentioned and are trying to use the FastGeo plugin. We have had a couple of exchanges about it, since some features get moved to cook time when the cell transformer is applied…

I read the guide again and will give this one a try:

LevelStreaming.AsyncRegisterLevelContext.Enabled

When opening the ticket, I was hoping for guidelines on the time budget and granularity settings. After instrumenting and reading the code, I found that the granularity values are used as batch limits before the time threshold is tested. So a large time budget still allows more registrations per frame than the granularity.

Another point that seems like an issue to me is that the root component is *not* counted toward the granularity. In our case, this means that if registering the root component takes a long time, another component will still be registered even though the time budget is already consumed…
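The batching behavior described here can be modeled as follows. This is a hypothetical sketch, not the engine's registration loop: time is simulated with fixed per-component costs, `SlicedRegistration` and `Tick` are invented names, and the exact point at which the engine checks its timeout is an assumption. `Granularity` caps how many components run between two clock checks, and the root component is processed without being counted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of time-sliced component registration (not engine code).
// Time is simulated: each component has a fixed cost in milliseconds.
struct SlicedRegistration {
    std::vector<double> ComponentCostMs;  // non-root components, in order
    double RootCostMs = 0.0;              // cost of the root component
    std::size_t NextIndex = 0;            // first component not yet registered
    bool bRootRegistered = false;

    // Runs one frame's worth of work and returns how many non-root
    // components were registered during this call.
    std::size_t Tick(std::size_t Granularity, double BudgetMs) {
        assert(Granularity > 0);
        double ElapsedMs = 0.0;
        std::size_t Registered = 0;
        if (!bRootRegistered) {
            ElapsedMs += RootCostMs;  // root is NOT counted toward Granularity
            bRootRegistered = true;
        }
        while (NextIndex < ComponentCostMs.size()) {
            // A whole batch runs before the clock is consulted.
            for (std::size_t I = 0; I < Granularity && NextIndex < ComponentCostMs.size(); ++I) {
                ElapsedMs += ComponentCostMs[NextIndex++];
                ++Registered;
            }
            // One simulated clock read per batch.
            if (ElapsedMs >= BudgetMs) { break; }
        }
        return Registered;
    }

    bool Done() const { return bRootRegistered && NextIndex == ComponentCostMs.size(); }
};
```

In this model, an expensive root exhausts the budget yet a full batch still runs afterwards, and a generous budget lets one `Tick` register several batches, i.e. more components than `Granularity`.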

Thanks

Hi,

I talked to a colleague who pointed me toward UInstancedStaticMeshComponent::PreSave: the bounds are calculated and cached there only if the component is registered. That might be the problem here. Make sure the components are registered after generating the levels and before saving them. ISMC registration costs almost nothing when the bounds are already cached, so this will take care of the problem.

As for the operation granularity and the timeout: checking the timeout can be expensive compared to registering most components. This is why we enforce batches, to reduce the overhead of reading the clock.

Martin

Hi Martin,

It seems that we were indeed able to get this one resolved.

What would be the right way to resolve this one:

[Image Removed]

It is especially frustrating that, despite being called Async, the method gets called from the game thread (while the async loader is idle!).

Thanks,

Basile

Hello Basile,

BuildTreeAsync schedules asynchronous work, but it also does some preparation work before that. There are two possible reasons why it is taking that long. There might be too many instances in the HISMC, with GetInstanceTransforms taking all the time. Alternatively, the game thread may be getting preempted by the OS. You can add the contextswitch trace channel and run the process as Administrator to see the state of the threads.

You should also try to understand why the tree requires rebuilding: it is already rebuilt when saving, in the Serialize method, so it should be up to date when loaded. Maybe something is invalidating it.

Martin

Hi,

When I checked, the rebuild seems to be triggered by:

[Image Removed]

Specifically, the `|| PrimitiveInstanceDataManager.HasAnyChanges()` part of that condition is true.

The instance data manager is still in its initial state from when the component was created…

It only gets updated later in the process, when the component is added to the scene:

```text
> TigerIG.exe!FPrimitiveInstanceDataManager::FlushChanges(FInstanceUpdateComponentDesc && ComponentData={…}, bool bNewPrimitiveProxy) Line 704 C++ Symbols loaded.
TigerIG.exe!UInstancedStaticMeshComponent::CreateSceneProxy() Line 2418 C++ Symbols loaded.
TigerIG.exe!FActorPrimitiveComponentInterface::CreateSceneProxy() Line 5520 C++ Symbols loaded.
TigerIG.exe!FScene::BatchAddPrimitivesInternal<UPrimitiveComponent>(TArrayView<UPrimitiveComponent *,int> InPrimitives) Line 1376 C++ Symbols loaded.
TigerIG.exe!FScene::AddPrimitive(UPrimitiveComponent * Primitive) Line 1279 C++ Symbols loaded.
TigerIG.exe!FRegisterComponentContext::Process::__l10::<lambda>(int Index) Line 272 C++ Symbols loaded.
[Inline Frame] TigerIG.exe!UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::FFunctionRefStoragePolicy,void __cdecl(int)>::operator()(int <Params_0>=0) Line 471 C++ Symbols loaded.
[Inline Frame] TigerIG.exe!ParallelForImpl::CallBody(const TFunctionRef<void __cdecl(int)> &) Line 81 C++ Symbols loaded.
TigerIG.exe!`ParallelForImpl::ParallelForInternal<TFunctionRef<void __cdecl(int)>,`ParallelFor'::`2'::void <lambda>(const TArray<FString,TSizedDefaultAllocator<32>> &, UWorld *, FOutputDevice &),std::nullptr_t>'::`2'::FParallelExecutor::operator()(const bool bIsMaster=true) Line 358 C++ Symbols loaded.
TigerIG.exe!ParallelForImpl::ParallelForInternal<TFunctionRef<void __cdecl(int)>,`ParallelFor'::`2'::void <lambda>(const TArray<FString,TSizedDefaultAllocator<32>> &, UWorld *, FOutputDevice &),std::nullptr_t>(const wchar_t * DebugName=0x00007ff7b284fd90, int Num, int MinBatchSize, TFunctionRef<void __cdecl(int)> Body={…}, ParallelFor::__l2::void <lambda>(const TArray<FString,TSizedDefaultAllocator<32>> &, UWorld *, FOutputDevice &) CurrentThreadWorkToDoBeforeHelping=void <lambda>(const TArray<FString,TSizedDefaultAllocator<32>> & Args, UWorld * InWorld, FOutputDevice & Ar){…}, EParallelForFlags Flags=None, const TArrayView<std::nullptr_t,int> & Contexts={…}) Line 440 C++ Symbols loaded.
[Inline Frame] TigerIG.exe!ParallelFor(int) Line 483 C++ Symbols loaded.
TigerIG.exe!FRegisterComponentContext::Process() Line 275 C++ Symbols loaded.
TigerIG.exe!FRegisterComponentContext::OnIncrementalRegisterComponentsDone() Line 224 C++ Symbols loaded.
TigerIG.exe!ULevel::IncrementalRegisterComponents(FRegisterComponentContext & Context={…}) Line 2020 C++ Symbols loaded.
TigerIG.exe!ULevel::IncrementalUpdateComponents(int NumComponentsToUpdate=8, bool bRerunConstructionScripts, FRegisterComponentContext * InContext=0x00000004e977a160) Line 1865 C++ Symbols loaded.
TigerIG.exe!UWorld::AddToWorld(ULevel * Level=0x00000186bdaadc90, const UE::Math::TTransform<double> & LevelTransform={…}, bool bConsiderTimeLimit, const TOptional<UE::FTimeout const> & ExternalTimeout={…}, FNetLevelVisibilityTransactionId TransactionId={…}, ULevelStreaming * InOwningLevelStreaming=0x0000018719133b20) Line 3688 C++ Symbols loaded.
TigerIG.exe!ULevelStreaming::UpdateStreamingState(bool & bOutUpdateAgain=false, bool & bOutRedetermineTarget=false, const TOptional<UE::FTimeout const> & InExternalTimeout={…}) Line 1056 C++ Symbols loaded.
[Inline Frame] TigerIG.exe!FStreamingLevelPrivateAccessor::UpdateStreamingState(ULevelStreaming *) Line 804 C++ Symbols loaded.
TigerIG.exe!UWorld::UpdateLevelStreaming(const TOptional<UE::FTimeout const> & ExternalTimeout={…}) Line 4903 C++ Symbols loaded.
```

Would you be able to tell what we are missing?

Basile

It turns out that rebuilding the tree is unavoidable. How many instances are there in that HISMC? I ran a test with 1,300 instances and BuildAsync took 0.25 ms.

I was thinking that you might not need HISMCs since you are building a flight sim. The benefit of the HISMC over the ISMC is that its clusters can be culled and occluded, while the ISMC renders all instances all the time. That is actually only true for non-Nanite meshes. Nanite meshes are culled and occluded on the GPU, so you should not use them with an HISMC, as it becomes useless CPU overhead. If the meshes are not using Nanite, I would recommend putting fewer instances in each component. You could also use ISMCs for forests or other dense areas that are only seen from the cockpit.

Hi Martin,

I started investigating the content itself and identified some components with more than 200,000 instances, which would explain the long time spent in that method. We are reworking the content at the moment to use smaller tiles (hence fewer instances per component).

As far as we can see, we are currently GPU-bound, and HISMs give better results than ISMs… Also, in all our attempts with Nanite, performance degraded once it was enabled, so we are working the old way, with impostors.

We are also considering breaking components down into a quadtree structure to cap the instance count, but the World Building Guide recommends merging / reducing the actor count, so we are unsure what is best…
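A quadtree split is one generic way to cap the instance count per component while keeping each component spatially compact. The sketch below is a hypothetical illustration (`BuildQuadtree`, `CollectLeaves` and `MaxPerLeaf` are invented names, not engine API): it subdivides a 2D region until each leaf holds at most `MaxPerLeaf` instances, and each non-empty leaf would then become one instanced component. The trade-off the guide points at still applies: a lower cap produces more components.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Point2 { double X, Y; };

// Hypothetical quadtree node; leaves own the instances that would form one component.
struct QuadNode {
    double MinX = 0, MinY = 0, MaxX = 0, MaxY = 0;
    std::vector<Point2> Instances;                       // filled on leaves only
    std::array<std::unique_ptr<QuadNode>, 4> Children;   // empty on leaves
    bool IsLeaf() const { return !Children[0]; }
};

// Splits until a leaf holds at most MaxPerLeaf instances. MaxDepth stops the
// recursion when many instances share (nearly) the same position.
std::unique_ptr<QuadNode> BuildQuadtree(std::vector<Point2> Points,
                                        double MinX, double MinY, double MaxX, double MaxY,
                                        std::size_t MaxPerLeaf, int MaxDepth) {
    assert(MaxPerLeaf > 0);
    auto Node = std::make_unique<QuadNode>();
    Node->MinX = MinX; Node->MinY = MinY; Node->MaxX = MaxX; Node->MaxY = MaxY;
    if (Points.size() <= MaxPerLeaf || MaxDepth <= 0) {
        Node->Instances = std::move(Points);
        return Node;
    }
    const double MidX = 0.5 * (MinX + MaxX);
    const double MidY = 0.5 * (MinY + MaxY);
    std::array<std::vector<Point2>, 4> Quadrants;  // index = (upper half ? 2 : 0) + (right half ? 1 : 0)
    for (const Point2& P : Points) {
        Quadrants[(P.Y >= MidY ? 2 : 0) + (P.X >= MidX ? 1 : 0)].push_back(P);
    }
    Node->Children[0] = BuildQuadtree(std::move(Quadrants[0]), MinX, MinY, MidX, MidY, MaxPerLeaf, MaxDepth - 1);
    Node->Children[1] = BuildQuadtree(std::move(Quadrants[1]), MidX, MinY, MaxX, MidY, MaxPerLeaf, MaxDepth - 1);
    Node->Children[2] = BuildQuadtree(std::move(Quadrants[2]), MinX, MidY, MidX, MaxY, MaxPerLeaf, MaxDepth - 1);
    Node->Children[3] = BuildQuadtree(std::move(Quadrants[3]), MidX, MidY, MaxX, MaxY, MaxPerLeaf, MaxDepth - 1);
    return Node;
}

// Gathers the non-empty leaves, i.e. the would-be components.
void CollectLeaves(const QuadNode& Node, std::vector<const QuadNode*>& Out) {
    if (Node.IsLeaf()) {
        if (!Node.Instances.empty()) { Out.push_back(&Node); }
        return;
    }
    for (const auto& Child : Node.Children) { CollectLeaves(*Child, Out); }
}
```

Unlike a fixed grid, the quadtree only subdivides where instances are dense, so sparse areas do not produce many tiny components.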

Thanks,

Basile

I think the high cost of BuildTreeAsync is actually coming from GetInstanceTransforms, which gathers the transforms. I would be curious to see whether a ParallelFor could help here, but that would require working from the engine source. It might still not be fast enough, as it would need many cores to bring the 160 ms of work down to an acceptable value.
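To illustrate the kind of split a ParallelFor would give, here is a standalone sketch that uses `std::thread` instead of engine primitives. It assumes the per-instance work is independent: each instance is converted from local to world space, a hypothetical stand-in for whatever GetInstanceTransforms actually does. Contiguous index ranges go to different workers, each writing a disjoint slice of the output. `GatherParallel`, `ToWorld` and the instance structs are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-instance data; real instance transforms are full matrices.
struct InstanceLocal { double X, Y, Z, Scale; };
struct InstanceWorld { double X, Y, Z, Scale; };

// Per-instance work: local -> world for a component placed at (OX, OY, OZ).
inline InstanceWorld ToWorld(const InstanceLocal& L, double OX, double OY, double OZ) {
    return {L.X + OX, L.Y + OY, L.Z + OZ, L.Scale};
}

// Splits [0, N) into contiguous chunks, one per worker. Every worker writes
// a disjoint slice of Out, so no locking is required.
std::vector<InstanceWorld> GatherParallel(const std::vector<InstanceLocal>& In,
                                          double OX, double OY, double OZ,
                                          unsigned NumWorkers) {
    const std::size_t N = In.size();
    std::vector<InstanceWorld> Out(N);
    NumWorkers = std::max(1u, NumWorkers);
    const std::size_t Chunk = (N + NumWorkers - 1) / NumWorkers;
    std::vector<std::thread> Workers;
    for (unsigned W = 0; W < NumWorkers; ++W) {
        const std::size_t Begin = W * Chunk;
        const std::size_t End = std::min(N, Begin + Chunk);
        if (Begin >= End) { break; }
        Workers.emplace_back([&In, &Out, Begin, End, OX, OY, OZ] {
            for (std::size_t I = Begin; I < End; ++I) { Out[I] = ToWorld(In[I], OX, OY, OZ); }
        });
    }
    for (std::thread& T : Workers) { T.join(); }
    return Out;
}
```

At best, wall time drops roughly by the number of workers, which is why a 160 ms gather would still need many cores to fit in a frame.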

The main difference between HISMs and ISMs is that HISMs are clustered. Since you are generating the instances from code, you could probably generate ISMs on a grid smaller than a cell. This would essentially mimic what the HISM does internally, at the "cost" of a few extra components.
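A minimal sketch of that grid approach, assuming instance positions are known at generation time: instances are bucketed into square grid cells whose size is chosen smaller than a World Partition cell, and each bucket would become one ISM. `BucketByGrid` and `CellKey` are hypothetical names, and `CellSize` is a free parameter, not a recommended value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct InstancePos { double X, Y; };

// Integer coordinates of a 2D grid cell.
using CellKey = std::pair<std::int64_t, std::int64_t>;

// Buckets instances into square cells of side CellSize. In this sketch each
// bucket would become one instanced component (hypothetical mapping).
std::map<CellKey, std::vector<InstancePos>> BucketByGrid(const std::vector<InstancePos>& Instances,
                                                         double CellSize) {
    assert(CellSize > 0.0);
    std::map<CellKey, std::vector<InstancePos>> Buckets;
    for (const InstancePos& P : Instances) {
        // floor() keeps negative coordinates in the correct cell.
        const CellKey Key{static_cast<std::int64_t>(std::floor(P.X / CellSize)),
                          static_cast<std::int64_t>(std::floor(P.Y / CellSize))};
        Buckets[Key].push_back(P);
    }
    return Buckets;
}
```

Using `std::floor` rather than truncation matters: truncation would merge the cells on either side of zero into one bucket.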

FastGeo could also be interesting here, as it could transform those ISMs and move much of the processing off the game thread.