How does the engine do work? (Under the hood; the worker threads, task balancing)

HateDread · February 23, 2015, 10:40pm

I originally posted this in the Engine Source forum, but have now realised that this place may have been a better section.

"Hey guys,

I’m wondering about some of the action under the hood that would take quite a while to read through and understand (happy to do it, but just thought I’d ask here first / as well).

For example:

What does each thread that the engine spawns actually do? Worker threads, pool threads. (If you ignore the OpenCL Worker, the OpenCL threads, and the Object Sorter threads, this is still a lot. What do they all do? http://i.gyazo.com/3f9c372a8d7ef3de47e1376f76e220fa.png)
I’ve seen a lot of talk about task schedulers and task-based work being scalable to thread counts, since a dynamic system and/or library (e.g. OpenMP) can allocate the ‘tasks’ to threads as needed, no matter how many there are. Is this how the UE4 TaskGraph system works, or is that separate? Is there a task-based system in place? What can we do with it or modify in it?
Further to the above: how is it suggested that significant workloads be organized? If it were a small engine done in-house I could imagine deciding to split everything into tasks where possible and writing a scheduler/load balancer, but it’s a bit harder when you’re staring a huge engine in the face. Alternatively one could create a thread for each significant subsystem, but then we’re already talking way more threads than the CPU has cores, and that approach can’t scale with future core counts.

Any insight would be awesome.

Thanks,
."

Slayemin · February 23, 2015, 11:11pm

A lot of those threads are operating system calls (such as NTDLL). Those can be ignored.

As far as threads go, the operating system is usually what manages the scheduling of threads onto cores. Usually you don’t want to mess around with that. You could modify thread priority, but keep in mind that the OS scheduler typically uses a tiered round-robin approach, so lower-priority threads can be left waiting until higher-priority threads have finished their work. This can be bad if your thread sucks up all of the processing time and causes lower-priority threads to starve. So, my two cents: don’t worry about it.

If I was doing multithreading in my own engine (which I built), I would want to use multiple threads for large, parallelizable processing tasks or for work that needs to be done asynchronously (such as networking). You could also create a separate thread for different discrete parts of your game, such that you have one thread dedicated to processing physics, one for artificial intelligence, one for game state handling, file IO, etc. Basically, if a task shouldn’t block execution, it can probably be multithreaded. Multithreading comes with an overhead cost though (marshalling, thread synchronization, context switching, etc), so having a thousand threads isn’t necessarily better than having a few.
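As a rough illustration of the "if a task shouldn't block execution, it can probably be multithreaded" idea, here is a minimal sketch in standard C++ (not UE4 code). The names `load_and_sum` and `tick_while_loading` are hypothetical, and the sleep is only a stand-in for slow work such as file IO or networking.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for slow work that should not stall the main loop.
long long load_and_sum(int count) {
    assert(count >= 0);
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // simulated latency
    std::vector<long long> data(static_cast<std::size_t>(count));
    std::iota(data.begin(), data.end(), 1LL); // fill with 1, 2, ..., count
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Starts the slow work on another thread, keeps "ticking" in the meantime,
// and only collects the result once it is ready.
long long tick_while_loading(int count) {
    std::future<long long> pending =
        std::async(std::launch::async, load_and_sum, count);

    int frames_ticked = 0;
    // wait_for with a zero timeout polls the future without blocking.
    while (pending.wait_for(std::chrono::milliseconds(0)) !=
           std::future_status::ready) {
        ++frames_ticked; // stand-in for one frame of game logic
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    (void)frames_ticked; // only illustrative; a real loop would do work here
    return pending.get();
}
```

Calling `tick_while_loading(100)` returns the sum of 1 through 100 while the calling thread stays free to do other work during the wait.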

A “Thread pool” is a collection of worker threads waiting to be dispatched to do whatever work the thread pool manager doles out. This can be something like, “Hey, go handle this particle system and update all the particle positions!” (though, that would be better done with a custom vertex and pixel shader, but this is just an example anyways). Once the thread has been dispatched, it’ll do its work until it’s finished and then report back to the thread pool manager as being ready for more work. Usually, you don’t need to mess around with this stuff either unless you’re writing low level engine code.
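The thread pool described above can be sketched in standard C++ roughly as follows. This is a generic illustration under my own assumptions, not how UE4 implements its pools; `ThreadPool`, `submit`, and `count_completed_tasks` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t worker_count) {
        assert(worker_count > 0);
        for (std::size_t i = 0; i < worker_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    // Hands a unit of work to whichever worker becomes free first.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Sleep until there is work or the pool is shutting down.
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return; // shutting down and nothing left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Usage sketch: submit many small tasks and count how many ran.
int count_completed_tasks(int task_count) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(4);
        for (int i = 0; i < task_count; ++i) {
            pool.submit([&done] { done.fetch_add(1); });
        }
    } // the destructor waits for all queued tasks here
    return done.load();
}
```

The "report back as ready for more work" step is implicit here: a worker that finishes a task simply loops around and waits on the queue again.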

As a UE4 developer, do you really need to care? Probably not. At least, I haven’t found a reason to yet.

HateDread · February 24, 2015, 2:05am

Oh, I wouldn’t ever try to modify the priorities like that.

Slayemin;227794:

If I was doing multithreading in my own engine (which I built), I would want to use multiple threads for large, parallelizable processing tasks or for work that needs to be done asynchronously (such as networking). You could also create a separate thread for different discrete parts of your game, such that you have one thread dedicated to processing physics, one for artificial intelligence, one for game state handling, file IO, etc. Basically, if a task shouldn’t block execution, it can probably be multithreaded. Multithreading comes with an overhead cost though (marshalling, thread synchronization, context switching, etc), so having a thousand threads isn’t necessarily better than having a few.

A “Thread pool” is a collection of worker threads waiting to be dispatched to do whatever work the thread pool manager doles out. This can be something like, “Hey, go handle this particle system and update all the particle positions!” (though, that would be better done with a custom vertex and pixel shader, but this is just an example anyways). Once the thread has been dispatched, it’ll do its work until it’s finished and then report back to the thread pool manager as being ready for more work. Usually, you don’t need to mess around with this stuff either unless you’re writing low level engine code.

As a UE4 developer, do you really need to care? Probably not. At least, I haven’t found a reason to yet.

I’m somewhat versed in the terminology and ideas, generally speaking (thread pools, the OS being in charge of threads, overhead). Are you saying that UE4 basically uses its own thread pooling solution? I wonder which parts are sent to the task system and which run on the game thread - I presume all Tick functions are executed sequentially on the main thread? To that end, what kind of tasks does the engine already do using the object pooling threads and the task threads? It’d be nice to know how much work it’s trying to do already, without extra stuff on top.

I understand your point re. a thread per subsystem, but in my travels I had heard of the ‘modern’ alternative: a job-based architecture that allows for scalability beyond hard-set thread counts, e.g. using something like Intel TBB (which UE4 uses for its allocations, funnily enough). For our project we’re looking at the possibility of writing a layer on top of UE4 and having our own kind of ‘objects’ - instead of AActors - that can be updated in this or similar ways for our own specific needs (many thousands of ‘actors’ with a lot of expensive functions in each), but I needed a better grasp of how things are organized before I begin the madness.
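To make the job-based idea a little more concrete, here is a minimal sketch in standard C++ of updating many lightweight objects in batches, with worker threads pulling the next batch from a shared counter so the load balances itself across however many cores exist. It assumes nothing about UE4 or TBB internals; `SimObject`, `update_all`, `count_updated`, and the batch size of 256 are all hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical lightweight object, standing in for a non-AActor entity.
struct SimObject {
    float position = 0.0f;
    float velocity = 1.0f;
};

// Splits the objects into fixed-size batches ("jobs"). Each thread repeatedly
// claims the next unclaimed batch, so faster threads naturally take more work.
void update_all(std::vector<SimObject>& objects, float dt,
                std::size_t batch_size = 256) {
    assert(batch_size > 0);
    const std::size_t batch_count =
        (objects.size() + batch_size - 1) / batch_size;
    std::atomic<std::size_t> next_batch{0};

    auto worker = [&] {
        for (;;) {
            const std::size_t batch = next_batch.fetch_add(1);
            if (batch >= batch_count) return; // no jobs left
            const std::size_t begin = batch * batch_size;
            const std::size_t end =
                std::min(begin + batch_size, objects.size());
            for (std::size_t i = begin; i < end; ++i) {
                objects[i].position += objects[i].velocity * dt;
            }
        }
    };

    unsigned thread_count = std::thread::hardware_concurrency();
    if (thread_count == 0) thread_count = 1; // the value may be unknown

    std::vector<std::thread> threads;
    for (unsigned t = 1; t < thread_count; ++t) threads.emplace_back(worker);
    worker(); // the calling thread takes jobs too
    for (std::thread& th : threads) th.join();
}

// Usage sketch: update N default objects and count how many moved by dt.
std::size_t count_updated(std::size_t object_count, float dt) {
    std::vector<SimObject> objects(object_count);
    update_all(objects, dt);
    return static_cast<std::size_t>(
        std::count_if(objects.begin(), objects.end(),
                      [dt](const SimObject& o) { return o.position == dt; }));
}
```

Spawning threads on every call is only for brevity; in practice the batches would more likely be handed to a long-lived pool or an existing task system. Each batch touches a disjoint range of objects, so no locking is needed inside the update loop.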

Either way, I appreciate the response!