Concerning Game thread being stalled by asynchronous loading thread

Hi,

Some context:

We use a world tiling mechanism in which levels are stored as pak files separate from the main application. Levels are loaded asynchronously into the world according to the player's world position.

On some frames, while new content is loading, the game thread (GT) stalls for a very long time.

In case it matters, our tiles may contain HISM components with numerous instances (such as trees in a forest).

Example frame:

[Image Removed]

Extra zoom:

[Image Removed]

We can see there are only a few nanoseconds before the GT starts again:

[Image Removed]

How can we avoid this?

Adding more details and findings…

Since this issue is currently blocking our delivery, I went ahead and dug into the code. I added more traces to confirm the diagnosis:

  • A convenient TRACE_CPUPROFILER_EVENT_SCOPE_STR (__FUNCTION__); so that Insights shows a more detailed report of the behavior.

  • Verbose acquire/release tracing around the contended critical section.

    class FUObjectHashTables
    {
        /** Critical section that guards against concurrent adds from multiple threads */
        FTransactionallySafeCriticalSection CriticalSection;

        /** Added for tracing: recursion depth of the lock, only modified while the lock is held */
        int AcquireSectionDepth = 0;

    public:
        /** Hash sets */
        TBucketMap<int32> Hash;
        TMultiMap<int32, uint32> HashOuter;

        FORCEINLINE void Lock()
        {
            TRACE_CPUPROFILER_EVENT_SCOPE_STR(__FUNCTION__);
            CriticalSection.Lock();
            if (0 == AcquireSectionDepth)
            {
                // Outermost acquisition: emit a dedicated trace event
                TRACE_CPUPROFILER_EVENT_SCOPE_STR(__FUNCTION__ ": ACQUIRE");
                int CurrentLength = Hash.Num();
            }
            ++AcquireSectionDepth;
        }

        FORCEINLINE void Unlock()
        {
            TRACE_CPUPROFILER_EVENT_SCOPE_STR(__FUNCTION__);
            --AcquireSectionDepth;
            if (0 == AcquireSectionDepth)
            {
                // Outermost release: emit a dedicated trace event
                TRACE_CPUPROFILER_EVENT_SCOPE_STR(__FUNCTION__ ": RELEASE");
                int CurrentLength = Hash.Num();
            }
            CriticalSection.Unlock();
        }

        // ...
    };

The behavior is quite clear:

At the frame level, the GT blocks while acquiring the critical section:

[Image Removed]

As far as I can see, the section is acquired at the beginning of the CreateExport method:

[Image Removed]

All subsequent locks are reentrant until the lock is released at the end:

[Image Removed]

It seems like a defect to me that an asynchronous thread can hold a critical section used by the main thread (note that we do not have a single Color Correct Region in our application!) for a duration on the order of several milliseconds (and certainly not for a few hundred milliseconds…).

Please advise.

Also, it would be nice to understand what in our content leads to this massive preload time. We may be able to refactor things on that front if that would help.

Thanks in advance,

Basile

Hey Basile,

For this specific case you are likely hitting an issue that stems from the architecture of our old loading path.

Could you try a build with IOStore containers enabled (e.g. all your content merged, with your custom pak file logic disabled) and check your loading performance there?

I’ve been talking to the engineers since we’ve seen this case in two different support requests this week and I’ve been told only the old loading path exhibits this issue (e.g. loading loose assets from disk when starting the game from the binaries/ folder during development, or using pak files with IOStore disabled).

So for now it would be important to verify that you don’t see similar problems with zen loader (the IOStore loading path).

We are looking into options to improve this for the old loader, but our clear recommendation is to use IOStore going forward, so if a fix would require extensive changes it’s possible we will not implement it.

If there are specific reasons that prevent you from using IOStore in your modular data approach, it’s likely better to discuss how we can solve those for you than to fix our old loading path.

As for optimization, a lot of the recursive locking happens for deep blueprint hierarchies, or similar cases like PackedLevelActors.

As a workaround you can gather a list of assets (or ideally the one that triggers the most dependencies) that are loaded during this long blocking load and identify the root assets that pull in most of the dependencies. Once you know these you can preload just the dependencies individually through the StreamableManager and only then load the heavy assets. This should avoid holding the long lock since most of the dependencies of the original package will be already loaded.
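The ordering behind this workaround can be sketched outside the engine. The following is a minimal sketch in standard C++, not engine code: it assumes the dependency list has already been gathered (for example from the reference viewer), and the asset paths and the names `FDependencyMap`, `AppendDependenciesFirst` and `BuildPreloadOrder` are hypothetical. It only computes an order in which every dependency comes before the asset that needs it; in the engine, each entry would then be requested through the StreamableManager in that order.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: asset path -> direct dependencies, gathered beforehand.
using FDependencyMap = std::map<std::string, std::vector<std::string>>;

// Post-order depth-first walk: dependencies are appended before the asset itself.
void AppendDependenciesFirst(const std::string& Asset,
                             const FDependencyMap& Dependencies,
                             std::set<std::string>& Visited,
                             std::vector<std::string>& OutOrder)
{
    // Skip assets that are already scheduled (this also guards against cycles).
    if (!Visited.insert(Asset).second)
    {
        return;
    }

    const auto It = Dependencies.find(Asset);
    if (It != Dependencies.end())
    {
        for (const std::string& Dependency : It->second)
        {
            AppendDependenciesFirst(Dependency, Dependencies, Visited, OutOrder);
        }
    }
    OutOrder.push_back(Asset);
}

// Returns a load order that ends with the heavy root asset.
std::vector<std::string> BuildPreloadOrder(const std::string& RootAsset,
                                           const FDependencyMap& Dependencies)
{
    std::set<std::string> Visited;
    std::vector<std::string> Order;
    AppendDependenciesFirst(RootAsset, Dependencies, Visited, Order);
    assert(!Order.empty() && Order.back() == RootAsset);
    return Order;
}
```

Loading the entries of the returned list one by one means the root asset is requested last, when most of what it references should already be in memory.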

In general it can be a good idea to reduce your asset dependencies as much as possible (e.g. by using blueprint interfaces instead of referencing or casting to specific blueprint classes).

The reference viewer in the Editor should be a good way to get an idea of what your assets pull in once you’ve identified the main culprits.

Kind Regards,

Sebastian

Hi Sebastian,

I guess the other support request that falls into this specific code path also comes from us. We split the investigations: while someone else was looking into garbage collection spikes, I was assigned to the loading ones…

We are in the process of trying to implement IOStore for testing. This is in progress and we hope to have some results next week.

However, on our side, the hardest requirement is that the data must be packaged in a phase separate from the application packaging itself. For background, we typically have one pak file per geocell (one latitude by one longitude) and need to cover very large areas at high resolution.

As a consequence, in order to parallelize data conversion (from assets to pak), and to keep version management and the hardware requirements for packaging reasonable, containers must be discoverable at runtime.

While we investigate using IOStore for the data, can you detail the locking granularity of the contended critical section? I added some more instrumentation to the code, and it appears the mutex is acquired at the beginning of FLinker::GetArchetypeFromRequiredInfo and held until its end.

Currently, we are loading levels containing blueprints that contain HISM components. I would like to understand whether splitting HISM components or other large objects into smaller assets could help us…
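To make the granularity question concrete, here is a minimal single-threaded sketch in standard C++. It does not reproduce the engine's loader: `FDepthTrackedLock`, `CreateOneObject`, `LoadBatchCoarse` and `LoadBatchFine` are hypothetical names, and `std::recursive_mutex` stands in for the engine's critical section. It reuses the depth-counting idea from the instrumentation above to show how many objects are processed during one outermost hold when the lock is taken once around a whole batch compared with once per object.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the instrumented lock: a recursive mutex with a
// depth counter. The public counters are only meant to be read once all work
// is done, as in this single-threaded sketch.
class FDepthTrackedLock
{
public:
    void Lock()
    {
        Mutex.lock();
        if (Depth == 0)
        {
            ++OutermostHolds;        // a new outermost hold begins
            ItemsInCurrentHold = 0;
        }
        ++Depth;
    }

    void Unlock()
    {
        assert(Depth > 0);
        --Depth;
        if (Depth == 0)
        {
            // The outermost hold ends: remember the largest amount of work seen.
            MaxItemsPerHold = std::max(MaxItemsPerHold, ItemsInCurrentHold);
        }
        Mutex.unlock();
    }

    // Must be called while the lock is held.
    void NoteItemProcessed()
    {
        assert(Depth > 0);
        ++ItemsInCurrentHold;
    }

    int OutermostHolds = 0;   // number of non-reentrant acquisitions
    int MaxItemsPerHold = 0;  // most items processed during a single hold

private:
    std::recursive_mutex Mutex;
    int Depth = 0;
    int ItemsInCurrentHold = 0;
};

// Stand-in for creating one object; the lock is re-entered if already held.
void CreateOneObject(FDepthTrackedLock& Lock)
{
    Lock.Lock();
    Lock.NoteItemProcessed();
    Lock.Unlock();
}

// Coarse granularity: one outermost hold covers the whole batch.
void LoadBatchCoarse(FDepthTrackedLock& Lock, int NumObjects)
{
    Lock.Lock();
    for (int Index = 0; Index < NumObjects; ++Index)
    {
        CreateOneObject(Lock);
    }
    Lock.Unlock();
}

// Fine granularity: the lock is released between objects.
void LoadBatchFine(FDepthTrackedLock& Lock, int NumObjects)
{
    for (int Index = 0; Index < NumObjects; ++Index)
    {
        CreateOneObject(Lock);
    }
}
```

With the coarse variant, another thread waiting on the same lock can only get in once the whole batch is done; with the fine variant it gets an opportunity after every object. Whether splitting content changes which of the two patterns the engine follows is exactly the open question here.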

Thanks,

Hi Sebastian,

Our developer has not (yet?) been able to load the content of .utoc/.ucas files that were not packaged/shipped at the same time as the application itself. He also tried the UDN developer assistant, which seems to say that the zen loader cannot be used in that context. See (in French):

Can you confirm this is a valid usage of IOStore / zen loader? If you have any references on this usage, they would be welcome.

Also, my question regarding the lock granularity remains open. Can you confirm what content leads to the mutex acquisition and, if possible, suggest (or discourage) the use of specific structures?

Thanks,

Basile

Hi Sebastian,

It has been more than 10 days since my initial reply.

Can you confirm whether the zen loader is accessible for content discovered at application runtime (i.e. not known / created when the initial application gets cooked/shipped)? So far, we could not “mount” data files using that approach.

Also, we need to understand the lock granularity and the items that lead to the contention. Depending on the area of our database, the issue is more or less visible, and we hope that optimizing the content structure / layout could help us work around it…

Thanks,

Basile

Hi Basile,

Sebastian was sick for several days, we’re sorry for the delay on this.

For the lock contention, I’d try to look into how many assets end up inside a single node and try to reduce that amount.

Sebastian shared this link with me that might be useful: YouTube.

Also, if you’re using Packed Level Actors, some licensees also ran into this contention and had to tune the size of their PLA to reduce contention.

As for the content discovery, I’ll ask around for the iostore part of it… but zenloader also supports loading files from disk that are unpackaged. You can combine iostore plus loose cooked files if you activate s.LooseFileLoadingEnabled in your settings. It is behind a CVar because the check on disk for the package can reduce performance a little bit.

Hope this helps

Danny

Hi Danny,

I passed all the information to the team loading the content and looked into the video myself. I am not sure our case falls under long asset chains and dependencies. Our data is quite simple. Also, all levels have the same dependency chain; only the content itself (occurrences, positions, …) changes.

Also, the concern for now is frame loss at runtime, putting aside (for now!) garbage collection and memory footprint.

I am still waiting for results / traces from the team, but it seems that using level actors directly instead of creating unique BPs does not exhibit the contention. That means that, regardless of the other problems the amount of data may cause, the data layout matters a lot, and understanding the reason for and scope of the lock would greatly help us build the content right.

Moreover, we are still struggling with the IOStore / zen loader approach. Are there any references or samples available for this? All I found dates from UE 4.27 and does not apply to the latest scheme we would like to put in place.

Last but not least, did anything change on that front in Unreal Engine 5.7? We have not tried it yet, but if it proves to bring a major gain, we may consider forcing a short-term upgrade…

Thanks,

Basile

Hi Basile,

It is great news if getting rid of the unique BPs no longer shows the same contention.

Can you describe what exactly you’re trying to put in place that zenloader doesn’t seem to support for you?

I’ve contacted the experts on the iostore side of things to look at this UDN, but maybe that part is already handled by your other team? I don’t want to end up creating 2 duplicate UDN threads for the same things.

I don’t think we did any major changes between 5.6 and 5.7 that would help you.

Thanks

Danny

Hi Basile,

For your iostore testing, have you switched to iostore for the main application? That is the first requirement. The application needs a global.utoc & .ucas for the runtime (IPlatformFilePak.cpp) to enable the zenloader/iostore loading path.

If you have a separate process today that produces a geocell .pak file, and you have tried switching that process to use iostore, then the same process is expected to produce a .pak/.utoc/.ucas triplet of files. All of these files are required by the cooked/packaged runtime. If you copy all three files to Content\Paks next to the global.utoc, then the game should automatically mount them, and you should be able to load a package from the .utoc with zenloader by issuing the 'LoadPackageAsync longpackagename' console command.
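Since all three files are needed, a quick sanity check before launching can be to list the contents of the Paks folder and flag any container that is missing a piece. The following is a minimal sketch in standard C++, not an engine API: the function name `FindIncompleteContainers` and the file names in the usage note are hypothetical, and the `global` container is skipped on the assumption, taken from the description above, that it only consists of a .utoc and a .ucas.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the base names of containers that lack at least one of the
// .pak/.utoc/.ucas files, given the file names found in the Paks folder.
std::vector<std::string> FindIncompleteContainers(const std::vector<std::string>& FileNames)
{
    static const std::set<std::string> Required = {".pak", ".utoc", ".ucas"};

    // Base name -> set of required extensions actually present.
    std::map<std::string, std::set<std::string>> ExtensionsByBase;
    for (const std::string& Name : FileNames)
    {
        const std::string::size_type Dot = Name.rfind('.');
        if (Dot == std::string::npos)
        {
            continue; // no extension, not a container file
        }

        const std::string Base = Name.substr(0, Dot);
        const std::string Extension = Name.substr(Dot);
        if (Base == "global")
        {
            continue; // assumption: global only has .utoc/.ucas, not a triplet
        }
        if (Required.count(Extension) != 0)
        {
            ExtensionsByBase[Base].insert(Extension);
        }
    }

    std::vector<std::string> Incomplete;
    for (const auto& Entry : ExtensionsByBase)
    {
        assert(Entry.second.size() <= Required.size());
        if (Entry.second.size() != Required.size())
        {
            Incomplete.push_back(Entry.first);
        }
    }
    return Incomplete;
}
```

For example, given `geocell_A.pak`, `geocell_A.utoc`, `geocell_A.ucas`, `geocell_B.pak` and `geocell_B.utoc`, the function reports `geocell_B`, whose .ucas is missing.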

If this does not work for you, please describe your existing process in more detail, and please attach pipeline and runtime logs from both your existing pak process and your iostore attempts.

Cheers,

PJ

Hi,

When I spoke with the engineer who investigated this, he mentioned that the files were not mounted if they were not built at the same time as the application. I asked him to provide me with log files to share. However, he was out for a couple of days last week. We should get them this week.

Thanks,

Basile