When and where to allow oversubscription

We ran into a rare hang on our dedicated server that involved the game thread waiting on FlushAsyncLoading, the loading thread waiting on a decompression task, the decompression task waiting for the background worker thread, and the background worker waiting on the game thread. This is mostly a problem on the server since there are few threads available.

I have a fix for this (set s.IoDispatcherForceSynchronousScatter=1), but observed that it happened in part because the stalled task never gave up the worker thread. We’ve considered using LowLevelTasks::FOversubscriptionScope to allow this to happen, both to avoid similar hangs and possibly to improve performance in general, but we have a couple of questions.

One of the hangs involved an FMutex on the worker thread. It looks like the mutex implementation explicitly disallows oversubscription:

// Do not enter oversubscription during a wait on a mutex since the wait is generally too short
// for it to matter and it can worsen performance a lot for heavily contended locks.
LowLevelTasks::Private::FOversubscriptionAllowedScope _(false);

We probably use mutexes a lot more than you do, for various reasons. We don’t want these blocking other tasks while the worker is stalled. Do you think it would be reasonable to allow oversubscription here?

Another hang involved some custom wait code in a plugin. I tried adding the oversubscription scope there, but it failed to link on editor (non-monolithic) builds:

1>18>AtomicWait.cpp.obj : error LNK2001: unresolved external symbol "private: static bool LowLevelTasks::Private::FOversubscriptionTls::bIsOversubscriptionAllowed" (?bIsOversubscriptionAllowed@FOversubscriptionTls@Private@LowLevelTasks@@0_NA)

This might be related to this variable being thread_local, so it can’t be exported. I didn’t see any other uses of FOversubscriptionScope in the engine outside Core (or platform-specific code that usually builds monolithic). Is FOversubscriptionScope meant to be available to plugins?

[Attachment Removed]

Hi Kevin,

What you describe is a kind of circular dependency: the GT waits on a worker and the worker waits on the GT. That would deadlock even if you had enough threads.

The IoDispatcher issue was apparently fixed in time for 5.6, in 35174606, so I wonder why it is still a problem for you. With that fix, the decompression task is handled inline if all workers are busy. When you’re starving you get the same benefit as s.IoDispatcherForceSynchronousScatter=1, but without the performance drop in the normal case, since tasks are still used.
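To illustrate the idea of "run inline when the workers are busy", here is a minimal sketch in plain standard C++. It is not the IoDispatcher code from 35174606: FTinyScheduler, IsSaturated and LaunchOrRunInline are hypothetical names, and the busy counter is a stand-in for whatever the engine actually checks.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

namespace InlineSketch
{
	// Hypothetical stand-in for a scheduler; not an engine type.
	struct FTinyScheduler
	{
		int NumWorkers = 2;
		std::atomic<int> NumBusy{0};
		std::vector<std::function<void()>> Queued;

		// Assumption: "starving" simply means every worker is occupied.
		bool IsSaturated() const
		{
			return NumBusy.load() >= NumWorkers;
		}

		// Runs the work on the calling thread when no worker is free,
		// otherwise queues it as a task. Returns true if it ran inline.
		bool LaunchOrRunInline(std::function<void()> Work)
		{
			if (IsSaturated())
			{
				Work();
				return true;
			}
			Queued.push_back(std::move(Work));
			return false;
		}
	};

	// Small self-check: saturated schedulers run the work immediately.
	inline void Demo()
	{
		FTinyScheduler Scheduler;
		Scheduler.NumBusy = 2;
		bool bRan = false;
		const bool bInline = Scheduler.LaunchOrRunInline([&bRan] { bRan = true; });
		assert(bInline && bRan);
	}
}
```

The caller never blocks on a worker in the saturated case, which is why the dependency cycle cannot form there, while the normal case still goes through the task queue.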

If you know you have long mutexes, manually allowing oversubscription is OK. It looks like LowLevelTasks::FOversubscriptionScope should work in both modular and monolithic builds. Is that what you tried to use?

Just as a side note: ideally, nothing would wait on the task graph. You would split a task at the point where it needs to wait, and schedule a follow-up task once the result it would have waited on is ready. This eliminates the need for oversubscription and gives you the best performance with less preemption.
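As a rough illustration of splitting a task instead of waiting, here is a sketch in plain standard C++. FTinyQueue and LoadThenProcess are hypothetical names, the queue is single-threaded for simplicity, and none of this is the engine's task API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

namespace ContinuationSketch
{
	// Hypothetical single-threaded task queue: Drain() runs tasks in FIFO order.
	class FTinyQueue
	{
	public:
		void Launch(std::function<void()> Task)
		{
			Tasks.push_back(std::move(Task));
		}

		void Drain()
		{
			while (!Tasks.empty())
			{
				// Pop before running so a task can safely launch more tasks.
				std::function<void()> Task = std::move(Tasks.front());
				Tasks.pop_front();
				Task();
			}
		}

	private:
		std::deque<std::function<void()>> Tasks;
	};

	// The work is split in two. The first task never blocks on the second;
	// it launches the follow-up once its own result exists, then returns.
	int LoadThenProcess(FTinyQueue& Queue)
	{
		int Output = 0;
		Queue.Launch([&Queue, &Output]
		{
			const int Decompressed = 21; // stand-in for a decompression result
			Queue.Launch([&Output, Decompressed]
			{
				Output = Decompressed * 2; // stand-in for the dependent work
			});
			// Returning here frees the worker instead of stalling it.
		});
		Queue.Drain();
		return Output;
	}
}
```

Because no task ever sits on a worker while waiting for another task, there is nothing for oversubscription to compensate for.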

Oversubscription remains a valid choice for hard to fix legacy problems.

Hope this helps

Danny

[Attachment Removed]

I don’t understand why you have problems with the thread-local. You can just remove it from the include file and put it in the cpp file. Modify GetIsOversubscriptionAllowedRef to give you the ref from the cpp file. That should fix any issue.
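A minimal sketch of that change, written as a single file in plain standard C++ so it can be compiled on its own. The comments mark what would live in the header and what would live in the cpp file. FAllowedScope is a hypothetical stand-in for the engine scopes, the export macro is omitted, and the default value of the flag is an assumption.

```cpp
#include <cassert>

namespace TlsSketch
{
	// --- Would live in the .cpp file ---
	// The thread_local has internal linkage, so nothing has to export it.
	namespace
	{
		thread_local bool bIsOversubscriptionAllowed = true; // default is an assumption
	}

	// --- Declared in the header (exported), defined in the .cpp file ---
	// Other modules only ever link against this function, never the variable.
	bool& GetIsOversubscriptionAllowedRef()
	{
		return bIsOversubscriptionAllowed;
	}

	// --- Could stay in the header ---
	// Hypothetical RAII scope: sets the per-thread flag and restores it on exit.
	class FAllowedScope
	{
	public:
		explicit FAllowedScope(bool bAllowed)
			: Ref(GetIsOversubscriptionAllowedRef())
			, bPrevious(Ref)
		{
			Ref = bAllowed;
		}

		~FAllowedScope()
		{
			Ref = bPrevious;
		}

		FAllowedScope(const FAllowedScope&) = delete;
		FAllowedScope& operator=(const FAllowedScope&) = delete;

	private:
		bool& Ref;
		bool bPrevious;
	};

	// Small self-check: the flag changes inside the scope and is restored after it.
	inline void DemoScope()
	{
		const bool bBefore = GetIsOversubscriptionAllowedRef();
		{
			FAllowedScope Scope(!bBefore);
			assert(GetIsOversubscriptionAllowedRef() == !bBefore);
		}
		assert(GetIsOversubscriptionAllowedRef() == bBefore);
	}
}
```

The trade-off is that every read of the flag becomes a function call across the module boundary instead of an inlined thread-local access, which matters if the accessor sits on a hot path.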

[Attachment Removed]

Thanks Kevin, I’ll get this sorted out on our side for 5.8.

All the best

Danny

[Attachment Removed]

Hi Kevin,

I was trying to reproduce the error and figure out why you would get it, but I can’t. You should definitely not be using CORE_API on that thread_local, though. Nothing touches the thread_local except LowLevelTasks::Private::FOversubscriptionTls::IsOversubscriptionAllowed(), which is meant to be used by the scheduler only and is performance critical. That is why I left it as an inline function.

I was able to use both of these scopes in plugins with Clang 20 and with MSVC (Using Visual Studio 2022 14.44.35224 toolchain (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207)):

LowLevelTasks::FOversubscriptionScope _;

LowLevelTasks::Private::FOversubscriptionAllowedScope MyScope(false);

Let me know if you find out why the thread_local is giving you a headache on your side.

Thanks Danny

[Attachment Removed]

Ok I’ll close this then.

Thanks

Danny

[Attachment Removed]

Thanks. To clarify, the only reason the game thread is waiting on the task graph is because it called LoadSynchronous, which required decompression tasks to run. s.IoDispatcherForceSynchronousScatter=1 appears to break that dependency. We are also aiming to remove LoadSynchronous from the game thread wherever we can.

I think we’re not getting the benefit of 35174606 because it checks FileIoStoreImpl::IsSchedulerOversubscribed, and it doesn’t show as oversubscribed because our waits (e.g. FMutex) don’t use FOversubscriptionScope. I think I will try adding that to FMutex and seeing how it affects performance, though.

I did try to add LowLevelTasks::FOversubscriptionScope in a plugin as well, and the editor build failed with the above link error. It worked on monolithic game builds.

[Attachment Removed]

I experimented with LowLevelTasks::FOversubscriptionScope in a plugin a bit more, and I was able to get it to link by modifying Private::FOversubscriptionTls::bIsOversubscriptionAllowed. I had to add CORE_API and remove thread_local (as you can’t export a thread local member). Obviously the lack of thread_local is going to change the behavior so I didn’t try actually running that way, but that seems to be the problem with this.

For now I think I’m going to use #if IS_MONOLITHIC so I can at least use it on game builds, but ideally I’d like to keep the behavior consistent.

[Attachment Removed]

Thanks, moving this out of the class and directly into the namespace in the CPP file got it working. For reference, this is the error I got trying to export the member while it had thread_local:

<root>\ue5\Engine\Source\Runtime\Core\Public\Async\Fundamental\Scheduler.h(302,38): error C2492: 'LowLevelTasks::Private::FOversubscriptionTls::bIsOversubscriptionAllowed': data with thread storage duration may not have dll interface
1>			CORE_API static thread_local bool bIsOversubscriptionAllowed;

[Attachment Removed]

We’re using the VS 2026 compiler (Using Visual Studio 2022 14.50.35725 toolchain (C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\VC\Tools\MSVC\14.50.35717)), so maybe that’s a difference.

I did get it working by moving bIsOversubscriptionAllowed and the IsOversubscriptionAllowed() definition into the CPP file, and haven’t seen any noticeable performance impact, so that seems to be enough to get this working for us.

[Attachment Removed]