New Core Feature: Async Framework (Master Branch and 4.8)

Hi all,

We started working on a new framework for asynchronous programming, which is located in the Core module. The goal is to simplify the process of writing code that executes asynchronously or in parallel. The Engine already had several mechanisms for this in place for a long time (threads, thread pool, task graph, etc.), but they required a fair amount of boilerplate code.

The first iteration of this new framework attempts to reduce the amount of boilerplate to an absolute minimum. The two most important additions are the Async<T>() function template and the TFuture<T> type. If you have used Java, C# or the newer versions of the C++ standard library, you may already be familiar with these concepts.

What is Async<T>?

The new Async<T> template function allows you to execute functions on another thread without having to write a lot of boilerplate code. There are currently three different execution methods available: TaskGraph, Thread and ThreadPool. All three methods will execute your function asynchronously, allowing the function to run in parallel with the calling thread. Which method you choose depends on the nature of your asynchronous tasks. A call to Async<T> returns immediately, before the function it launched has finished. This means that the result of the function is not available right away, and some mechanism is needed to retrieve the result at a later time. This mechanism is called a Future.

What is a Future?

A future is a variable whose value will be set in the future. If you write a function that returns, say, a TFuture<int> instead of just an int, then your function tells its users that the integer return value is not available right away, but it promises to set the value at some time in the future. While your function is computing the result, the caller can meanwhile work on other things, but as soon as the caller attempts to access the actual value of TFuture<int>, its thread will block until that value has actually been set by you. Futures therefore provide a low-level mechanism to return results from functions that are executed asynchronously.
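For readers who want to try the idea outside the Engine, here is a minimal sketch using the C++ standard library's std::future and std::async as stand-ins for TFuture and Async. It is an analogue, not the Engine API; the function names and the 50 ms delay are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical slow computation (name and delay are illustrative only).
// Returning std::future<int> instead of int signals that the result is pending.
std::future<int> ComputeAnswerAsync()
{
    return std::async(std::launch::async, []
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        return 42;
    });
}

int UseAnswer()
{
    std::future<int> Answer = ComputeAnswerAsync(); // returns right away
    assert(Answer.valid());                         // the future refers to a pending result

    int Local = 1;                                  // the caller does unrelated work meanwhile

    return Local + Answer.get();                    // blocks until the value has been set
}
```

Calling UseAnswer() only pays the waiting cost at the get() call, which is the point where the caller actually needs the value.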

When should I use Futures?

Futures are most useful when your code requires results from one or more other functions that execute asynchronously, or in parallel. For example, consider the following scenario where the function Foo() computes and returns a result using three other functions that are computationally expensive:


int Foo()
{
    int A = CalculateA();
    int B = CalculateB();
    int C = CalculateC();

    // other code here
    ....

    return A + B + C;
}

Note that the computation of the final result requires all of A, B and C. Traditionally, the order of execution would be sequential:


Thread 1:   |____CalculateA____|______CalculateB_____|___________CalculateC___________|__other code__|___A + B + C___|

With the Async() template function we are able to launch each of CalculateA, CalculateB and CalculateC asynchronously, which allows us to parallelize most of the work:


int Foo()
{
    TFuture<int> A = Async<int>(EAsyncExecution::Thread, CalculateA);
    TFuture<int> B = Async<int>(EAsyncExecution::Thread, CalculateB);
    TFuture<int> C = Async<int>(EAsyncExecution::Thread, CalculateC);

    // other code here
    ....

    return A.Get() + B.Get() + C.Get();
}

The order of execution now looks something like this:


Thread 1:  |___Async()___|__other code__|///sleep///|___A + B + C___|
Thread 2:  |____CalculateA____|
Thread 3:    |_______CalculateB______|
Thread 4:       |_____________CalculateC____________|

Note that here the three Async calls do not return integer results, but futures that will eventually hold the results. When Foo() has completed all its other work and goes on to compute the result, it will block until all of A, B and C are actually available (indicated by "sleep" in the diagram above).
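The same fan-out can be reproduced with the standard library if you want to experiment without an Engine build. This sketch assumes std::async with std::launch::async as a rough counterpart of EAsyncExecution::Thread; the three Calculate functions are trivial stand-ins, not real workloads.

```cpp
#include <future>

// Stand-ins for the expensive functions from the example above.
int CalculateA() { return 1; }
int CalculateB() { return 2; }
int CalculateC() { return 3; }

int Foo()
{
    // std::launch::async asks for each call to run on its own thread.
    std::future<int> A = std::async(std::launch::async, CalculateA);
    std::future<int> B = std::async(std::launch::async, CalculateB);
    std::future<int> C = std::async(std::launch::async, CalculateC);

    // Other code would run here, in parallel with the three calculations.

    // Each get() blocks only if that particular result is not ready yet.
    return A.get() + B.get() + C.get();
}
```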

When should I NOT use Futures?

When your calling function does not actually care about the results of the asynchronous operations and does not need to block until the operations complete, you should not use futures. You can still execute such units of work asynchronously, and if some other code in your system needs to know when they complete, it is generally better to use callbacks or delegates instead. The key here is that the results of the async operations may be needed somewhere, but not in your calling code.

When should I use TaskGraph vs. Thread vs. ThreadPool for async execution?

Unreal Engine provides several means of parallelizing execution of tasks.

The TaskGraph is shared by many other systems in the Engine and is intended for small tasks that are very short-running, never block, and must complete as soon as possible. Launching graph tasks is very cheap compared to starting up threads, but you must ensure that your code never blocks the TaskGraph. In particular, you should not set up Async<T>() functions on the TaskGraph that in turn create other Async<T>() calls or may wait on some external event. This is very important, because if all worker threads are waiting then nothing else gets done in the Engine. If your code may block or create other asynchronous calls then use Thread or ThreadPool instead.

Threads are quite expensive to create and best suited for long running tasks or tasks that may block. Operating systems generally impose limits on the number of threads that can be created, and they also slow down considerably once too many threads are alive at the same time. If you have many tasks (hundreds) or only want to maximize CPU utilization and do not care about all your tasks actually running in parallel at the same time, use ThreadPool instead.

The ThreadPool is another set of worker threads that is independent from the TaskGraph system. It allows you to queue up an arbitrary number of tasks, which will then be completed one after another based on the availability of worker threads. If your tasks do not fit into either TaskGraph or Thread, then execute them here.
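To make the "queue many tasks, run them on a few workers" idea concrete, here is a minimal pool sketch in standard C++. It is not the Engine's thread pool; FSimplePool and SumWithPool are hypothetical names, and the design (one mutex, one condition variable, drain on destruction) is just one simple way to do it.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool: tasks wait in a queue until a worker is free.
class FSimplePool // hypothetical name, not an Engine class
{
public:
    explicit FSimplePool(std::size_t NumWorkers)
    {
        for (std::size_t i = 0; i < NumWorkers; ++i)
        {
            Workers.emplace_back([this] { Run(); });
        }
    }

    // Lets the workers finish the remaining tasks, then joins them.
    ~FSimplePool()
    {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            bStopping = true;
        }
        Wake.notify_all();
        for (std::thread& Worker : Workers)
        {
            Worker.join();
        }
    }

    void Enqueue(std::function<void()> Task)
    {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Tasks.push(std::move(Task));
        }
        Wake.notify_one();
    }

private:
    void Run()
    {
        for (;;)
        {
            std::function<void()> Task;
            {
                std::unique_lock<std::mutex> Lock(Mutex);
                Wake.wait(Lock, [this] { return bStopping || !Tasks.empty(); });
                if (Tasks.empty())
                {
                    return; // stopping and nothing left to do
                }
                Task = std::move(Tasks.front());
                Tasks.pop();
            }
            Task(); // run outside the lock so other workers can dequeue
        }
    }

    std::mutex Mutex;
    std::condition_variable Wake;
    std::queue<std::function<void()>> Tasks;
    std::vector<std::thread> Workers;
    bool bStopping = false;
};

// Queues far more tasks than there are workers and returns the combined result.
int SumWithPool(int NumTasks, std::size_t NumWorkers)
{
    std::atomic<int> Sum{0};
    {
        FSimplePool Pool(NumWorkers);
        for (int i = 1; i <= NumTasks; ++i)
        {
            Pool.Enqueue([&Sum, i] { Sum += i; });
        }
    } // the destructor returns only after every queued task has run
    return Sum.load();
}
```

With four workers and a hundred tasks, at most four tasks run at any moment while the rest wait in the queue, which is the trade-off described above.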

Note: A fourth mechanism for parallel execution, OS processes, is available in the Engine, but it is not exposed in Async<T>(). Use Thread or ThreadPool instead.

Does this mean my algorithms are parallel now?

No, futures and Async<T>() are low-level primitives that help reduce the boilerplate code required for asynchronous programming. They do not offer anything for automatically parallelizing your algorithms (although they may be used for a parallel programming library that we might implement in the future, but this is still pie in the sky).

Does this mean my algorithms are thread-safe now?

No, you are still responsible for ensuring that any code being executed asynchronously is completely thread-safe. Futures only guarantee thread-safety for the return values of your functions.


Note: Examples of Async and TFuture can be found in /Runtime/Core/Tests/Async/AsyncTest.cpp. The implementation of Async itself also uses futures.


Some more implementation details for those who care:

Our implementation separates the read and write side of asynchronous results into two concepts: Futures and Promises. A Future is the object being returned to the caller. It can be used to retrieve a function's return value when it is needed at some time in the future. If the return value is not yet available, the calling thread will block (there is also an option to wait with a timeout). A Promise is used by the called function to write the result into the Future.
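The read/write split can be illustrated with std::promise and std::future from the standard library, which follow the same pattern. This is an analogue under that assumption, not the code in Future.h; ProduceOnWorker, the value 7 and the five-second timeout are made up for the example.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

int ProduceOnWorker()
{
    std::promise<int> Promise;                      // write side
    std::future<int> Future = Promise.get_future(); // read side, handed to the caller

    // The promise is moved into the worker, which fulfils it when done.
    std::thread Worker([](std::promise<int> P) { P.set_value(7); }, std::move(Promise));

    // Waiting with a timeout instead of blocking indefinitely.
    const bool bReady =
        Future.wait_for(std::chrono::seconds(5)) == std::future_status::ready;
    const int Result = bReady ? Future.get() : -1;  // -1 is an arbitrary "timed out" marker

    Worker.join();
    return Result;
}
```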

Futures and Promises cannot be copied - they can only be moved. If you move them from one instance to another, the old instance becomes invalid and can no longer be used to set or retrieve result values; this is an optimization. If you wish to share a Future between multiple threads, you can call TFuture.Share() to create a Shared Future. Shared futures are copyable, but do not support the more efficient move semantics.
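The standard library has the same move-only versus shared distinction, so it can serve as a quick illustration. The sketch below uses std::future::share() and std::shared_future; whether TFuture mirrors every detail is an assumption, and ShareResult is a hypothetical name.

```cpp
#include <cassert>
#include <future>
#include <utility>

int ShareResult()
{
    std::future<int> Original = std::async(std::launch::async, [] { return 21; });

    // std::future<int> Copy = Original;        // would not compile: futures are move-only
    std::future<int> Moved = std::move(Original);
    assert(!Original.valid());                  // the moved-from future is no longer usable

    std::shared_future<int> Shared = Moved.share();
    assert(!Moved.valid());                     // share() also gives up the original future
    std::shared_future<int> Copy = Shared;      // shared futures can be copied freely

    // A second thread reads the same result through its own copy.
    std::future<int> Doubled = std::async(std::launch::async, [Copy] { return Copy.get() * 2; });

    return Shared.get() + Doubled.get();        // 21 + 42
}
```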

The full implementation is in /Runtime/Core/Public/Async/Future.h. Please see the code documentation for further details.

For a general introduction to the Future/Promise pattern check out this Wikipedia article.

As a side note, if you're using this feature be very aware of how C++ lambdas work with variable capture. It's very easy to capture a stack value by reference that you didn't intend to, which can lead to race conditions!

You can capture specific variables rather than all variables (my preference) and choose whether to capture them by-value or by-reference.

e.g., a simple case:


for (int i = 0; i < SomeStuff.Num(); ++i)
{
	auto Future = Async<int>(EAsyncExecution::ThreadPool, [&]
	{
		// "i" has been captured by reference! By the time this thing runs on another thread who knows what it will be
		// We could instead use [=] but that would copy the SomeStuff array which we probably don't want.
		// We could use [i,&SomeStuff] to capture i by value and SomeStuff by reference. That works as long as we guarantee that the array and its contents won't change while these async operations are in-flight.
		return DoSlowOperation(SomeStuff[i]);
	});
	});
}
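Here is a safe variant of that loop as a standalone sketch, using std::async in place of Async so it can be compiled anywhere. DoSlowOperation is a trivial stand-in and ProcessAll is a hypothetical wrapper; the capture list is the part that matters.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for DoSlowOperation.
int DoSlowOperation(int Value) { return Value * Value; }

std::vector<int> ProcessAll(const std::vector<int>& SomeStuff)
{
    std::vector<std::future<int>> Futures;
    for (std::size_t i = 0; i < SomeStuff.size(); ++i)
    {
        // "i" is captured by value, so each task keeps its own index.
        // SomeStuff is captured by reference, which is only safe because
        // it is not modified and outlives every task started here.
        Futures.push_back(std::async(std::launch::async, [i, &SomeStuff]
        {
            return DoSlowOperation(SomeStuff[i]);
        }));
    }

    std::vector<int> Results;
    for (std::future<int>& Future : Futures)
    {
        Results.push_back(Future.get()); // all tasks finish before we return
    }
    return Results;
}
```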

Thank you Gerke for taking the time to write up this extremely helpful information!

And thanks Nick P. for the additional info!

I can't wait to try out the new Future/Async primitives!

:heart:

Rama

Very nice! I'm excited to try these, need to set aside some time to seriously dive in. Several algorithms for things I want to do would be nicer in parallel, but I don't have much experience with async in C++ so I've been putting them off.

Thanks for the warning nick_p!

Nice Addition, will definitely try it asap!

Very nice! I love the async/await support in C# :slight_smile:

Very nice addition, giving more freedom on how we can use async primitives will let us do far more stuff!

Thanks!

Nice addition! Thanks for the info too.

I wonder, is there a mechanism for async work with non-blocking wait? I.e. an object ticked every frame that checks if its assigned thread/task is finished, and when that's finished, it executes some bound function and uses any data created/calculated by the thread/task to do work on the main thread. E.g. some sort of IO operation during gameplay, where you want to use the results when they're ready, but you don't want to block the game thread with a future (since you have no idea how long this loading will take; the future could halt the main thread for many frames).

Unless I misunderstand, you should be able to accomplish what you want with the regular FAsyncTask.

Check out the code for sound decompression, it can be used in a similar manner https://github.com/EpicGames/UnrealEngine/blob/311e18ff369078e192a83f27834b45bdb288168a/Engine/Source/Runtime/Engine/Public/AudioDecompress.h

Thanks for the lead! FAsyncTask looks nice (I'm used to using FRunnables, which have a bit more overhead and setup, so this is cool!)

This doesn't handle the 2nd part, however - the execution of work on the main thread once the async portion has been completed. I can of course wrap this task with a tickable object and execute a function pointer once MyTask->IsDone() returns true, but I was hoping for something built-in. Even a function that is called on the main thread once the task has completed (and which I could override in a subclass), such as a virtual void OnTaskCompleted() function. Any ideas? :slight_smile:

Not sure, I don't know enough about the C++ classes to be able to say whether there's something like that. There is a method to check if work is finished on the FAsyncTask, but then you do have to either wrap it in a ticked object or check it at certain intervals.
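The "wrap it in a ticked object" approach might look roughly like this in standard C++, polling a std::future with a zero timeout instead of an FAsyncTask. FLoadWatcher and RunFrames are hypothetical names, and the sleep in the loop merely stands in for a frame.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical ticked object that polls a pending result without blocking.
class FLoadWatcher
{
public:
    explicit FLoadWatcher(std::future<int> InPending) : Pending(std::move(InPending)) {}

    // Call once per frame. Returns true once the result has been picked up.
    bool Tick()
    {
        if (!bDone &&
            Pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
        {
            Result = Pending.get(); // does not block, the value is already there
            bDone = true;
        }
        return bDone;
    }

    int GetResult() const { return Result; }

private:
    std::future<int> Pending;
    int Result = 0;
    bool bDone = false;
};

int RunFrames()
{
    FLoadWatcher Watcher(std::async(std::launch::async, []
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(20)); // pretend IO
        return 99;
    }));

    while (!Watcher.Tick())
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // stand-in for a frame
    }
    return Watcher.GetResult();
}
```

The zero-timeout wait only works as a readiness check when the task was started with std::launch::async; a deferred task would never report ready.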

I'd actually like to know if there's a way to set up an event for FAsyncTask myself as that would be a lot cleaner for some things I'm doing as well.

We don't have continuations yet, if that's what you mean, but it's on the to-do list.

In the meantime, you could pass a delegate as a parameter to the async function, which the async function will execute when it is complete. Make sure that your delegate handler is thread-safe.
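A rough sketch of that idea in standard C++, with std::function standing in for a delegate or TFunction parameter. LoadAsync and CollectResults are hypothetical names; the mutex in the handler is what "thread-safe" means here, because the callback runs on the worker thread.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Starts the work on a new thread and invokes OnComplete from that thread.
std::thread LoadAsync(std::function<void(int)> OnComplete)
{
    return std::thread([OnComplete]
    {
        const int Loaded = 5; // stand-in for real work
        OnComplete(Loaded);   // runs on the worker thread, not on the caller's thread
    });
}

int CollectResults()
{
    std::mutex Mutex;
    std::vector<int> Results;

    // The handler may be called from several threads at once, so it locks.
    auto Handler = [&Mutex, &Results](int Value)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        Results.push_back(Value);
    };

    std::thread A = LoadAsync(Handler);
    std::thread B = LoadAsync(Handler);
    A.join();
    B.join();

    return static_cast<int>(Results.size());
}
```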

Is there somewhere the engine does this we could look at as an example? I get the concept, just haven't really done much with delegates yet.

Thanks for the help :slight_smile:

Search for "FSimpleDelegate::Create", for example, to see how we are passing delegates as a parameter to function calls. Another option would be to make the parameter a TFunction, which can accept function pointers and lambdas. It doesn't allow for payloads, but looks cleaner. It really depends on your use case.

Awesome, thanks again, that should be plenty for what I need!

Awesome, thanks! I'm passing a TFunction in now (should've thought of that). What do you mean re. 'It doesn't allow for payloads'? Is it that it can only take static functions, and not those called as member functions on an object?

Additionally, how can I make sure that my FAsyncTask is destroyed? I want to fire it off with the delegate/TFunction bound, let it do its work, then have that delegate/TFunction execute. After that, the task would ideally be cleaned up. I presumed that the task pool/queue would destroy tasks when they've been completed? This doesn't seem to be the case, and obviously calling delete this from inside DoWork is not possible.

Actually, now that I think of it... how does one make sure that the delegate/TFunction is executed on the main thread? You say to make sure that the delegate handler is thread-safe, but I'm afraid I can't make the connection between that and this.

Thanks again.

Scrap that, I get it.

From my worker's DoWork() function:



FSimpleDelegateGraphTask::CreateAndDispatchWhenReady
(
	FSimpleDelegateGraphTask::FDelegate::CreateStatic(&TestFunction, m_loadedData),
	TStatId(),
	nullptr,
	ENamedThreads::GameThread
);

Where TestFunction returns void and takes in a TArray<MyDataStruct> argument (which is m_loadedData in this case).

What I can't figure out is how to pass in or bind a delegate from outside of here, i.e. binding/passing a delegate of my own choosing to this worker from outside of it, so that the worker doesn't need a pointer to the object and the function in question (which could be several steps removed by this point).

I.e.

MyCoreClass::StartLoading ---- (Delegate bound to MyCoreClass::OnDataLoaded) ----> MyDataLoader::LoadData -> MyWorkerInstance.BindDelegateSomehow(//use the passed-in delegate).
Then when MyWorkerInstance (of MyWorker type) finishes its DoWork, it invokes the delegate and passes the TArray<MyDataStruct> back to MyCoreClass::OnDataLoaded. You can see why requiring that I have a pointer to a MyCoreClass instance and know of its functions at the point of the DoWork function is a bit of a problem - I don't want everything in this chain to have to know about everything else!

EDIT:

After digging, and even trying to create my own DelegateGraphTask class, it seems that the delegate used MUST be of the type given by DECLARE_DELEGATE(FMyDelegateName), yet it can take bindings from functions with one or more arguments? I thought you needed DECLARE_DELEGATE_OneParam(...) etc for that!

Is there at least a way to get my own delegate type (i.e. a OneParam) in where this type is currently? A way to copy the delegate binding across to this accepted type, or to swap out the delegate? I'd really not want to pass a pointer to the original object all the way down the callstack so that the delegate can be bound at the point of task creation. And having my public delegate type be DECLARE_DELEGATE gives users of my code no real indication of the format needed for whatever functions they choose to bind.

EDIT#2: I'm an idiot.

The solution to the above was just to execute the passed-in/bound delegate inside the function that I'm binding to the above task. I.e. the OnDataLoaded calls my delegate and passes the worker's output through. All is well/solved :slight_smile:
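For future readers, the overall shape of that solution can be sketched in standard C++: the worker posts a small closure to a queue that the main thread drains, and that closure is what invokes the caller's callback with the loaded data. This is only an analogue of dispatching to the game thread, not Engine code; FMainThreadQueue, FOnDataLoaded, LoadData and Demo are hypothetical names.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue of closures that only the main thread executes.
class FMainThreadQueue
{
public:
    void Post(std::function<void()> Fn) // may be called from any thread
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        Pending.push_back(std::move(Fn));
    }

    void Drain() // call from the main thread, e.g. once per frame
    {
        std::vector<std::function<void()>> Local;
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Local.swap(Pending);
        }
        for (std::function<void()>& Fn : Local)
        {
            Fn();
        }
    }

private:
    std::mutex Mutex;
    std::vector<std::function<void()>> Pending;
};

using FOnDataLoaded = std::function<void(const std::vector<int>&)>;

// The worker only knows the callback type, not who is listening.
std::thread LoadData(FMainThreadQueue& Queue, FOnDataLoaded OnLoaded)
{
    return std::thread([&Queue, OnLoaded]
    {
        std::vector<int> Data{1, 2, 3}; // stand-in for the loaded data
        // Hand the result back; the callback itself runs later on the main thread.
        Queue.Post([OnLoaded, Data] { OnLoaded(Data); });
    });
}

int Demo()
{
    FMainThreadQueue Queue;
    int Sum = 0;

    std::thread Worker = LoadData(Queue, [&Sum](const std::vector<int>& Data)
    {
        for (int Value : Data)
        {
            Sum += Value;
        }
    });
    Worker.join(); // joined here for brevity; a game would keep ticking instead

    Queue.Drain(); // the callback executes here, on the calling thread
    return Sum;
}
```

Because the callback only ever runs inside Drain(), the code that consumes the data needs no locking of its own.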

@ - would you mind posting a complete example for future reference? Thanks bud :slight_smile: Glad you got it all working!