AsyncTask slower in packaged game when using TArray

Hi,

I’m using a lot of AsyncTasks (FNonAbandonableTask) to procedurally generate the map data when launching the level. In the editor (whether PIE or standalone), the generation takes approximately 8 seconds. Once the project has been packaged, without any other change (I tried both the Development and Shipping configurations), the generation takes 46 seconds. That is almost 6 times slower and, even worse, 20 seconds slower than the generation without multithreading. If anything, I would expect the packaged build to be slightly faster.
I also noticed this difference when I was using FRunnable. I found this AnswerHub question (unanswered) that seems to describe a similar issue with FRunnable: https://answers.unrealengine.com/questions/982339/low-performance-after-packaging-1.html

After lots of debugging, I figured out the culprit was TArray: when you use the Add (or Emplace) function, the async tasks get very slow. Is this expected? Are we only supposed to use TArray in the game thread?
I managed to reproduce the issue in a blank project (Third Person Example). Here are the steps, using 4.25:

  1. Create a new Third Person C++ project called “MyProject”,
  2. Create a new C++ class child of Actor called “MyActor”,
  3. Open the solution in Visual Studio,
  4. Replace the contents of MyActor.h with the code provided below,
  5. Replace the contents of MyActor.cpp with the code provided below,
  6. Compile,
  7. In Unreal Engine, go inside ThirdPersonCPP>Blueprints>ThirdPersonCharacter,
  8. In the graph, bind the Enter key to a SpawnActorFromClass node (spawning a MyActor actor),
  9. Play In Editor and press Enter to launch the async tasks: the average time per task, in ms, is printed on screen when all tasks are done,
  10. Package the game and play it: the tasks are now very slow (on my computer it takes almost 3 seconds whereas it is instant in PIE).

The async tasks simply create a local TArray in a loop and add one value to it. If you comment out the line that adds an entry to the TArray (line 22 of MyActor.cpp), PIE and the packaged build give the same result, as expected.

MyActor.h

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "Async/AsyncWork.h"
#include "MyActor.generated.h"

class FMyTask : public FNonAbandonableTask
{
	friend class FAsyncTask<FMyTask>;

protected:

	void DoWork();

	FORCEINLINE TStatId GetStatId() const { RETURN_QUICK_DECLARE_CYCLE_STAT(FMyTask, STATGROUP_ThreadPoolAsyncTasks); }

public:

	FMyTask();

	float Time = 0.0f;
};

UCLASS()
class MYPROJECT_API AMyActor : public AActor
{
	GENERATED_BODY()

public:

	UPROPERTY(BlueprintReadWrite)
		float FinalTime = 0.0f;

	UPROPERTY(BlueprintReadWrite)
		bool bIsDone = false;

private:

	TArray <FAsyncTask<FMyTask>*> Tasks;

	TArray <float> WorkTime;

public:
	// Sets default values for this actor's properties
	AMyActor();

protected:
	// Called when the game starts or when spawned
	virtual void BeginPlay() override;

public:
	// Called every frame
	virtual void Tick(float DeltaTime) override;

};

MyActor.cpp

// Fill out your copyright notice in the Description page of Project Settings.


#include "MyActor.h"
#include "Async/Async.h"
#include <chrono>

using namespace std::chrono;

FMyTask::FMyTask()
{

}

void FMyTask::DoWork()
{
	auto start = high_resolution_clock::now();

	for (int32 i = 0; i < 5000; ++i)
	{
		TArray <float> Biomes;
		Biomes.Add(1.0f);
	}

	auto stop = high_resolution_clock::now();
	auto duration = duration_cast<milliseconds>(stop - start);
	Time = static_cast<float>(duration.count());
}

// Sets default values
AMyActor::AMyActor()
{
	// Set this actor to call Tick() every frame.  You can turn this off to improve performance if you don't need it.
	PrimaryActorTick.bCanEverTick = true;

}

// Called when the game starts or when spawned
void AMyActor::BeginPlay()
{
	Super::BeginPlay();

	for (int32 i = 0; i < 1000; ++i)
	{
		FAsyncTask<FMyTask>* Task = new FAsyncTask<FMyTask>();
		Tasks.Emplace(Task);
		Task->StartBackgroundTask();
	}

	GEngine->AddOnScreenDebugMessage(-1, 10.0f, FColor::Red, TEXT("Tasks launched..."));

}

// Called every frame
void AMyActor::Tick(float DeltaTime)
{
	Super::Tick(DeltaTime);

	if (!bIsDone)
	{
		for (int32 i = Tasks.Num() - 1; i >= 0; --i)
		{
			FAsyncTask<FMyTask>* Task = Tasks[i];
			if (Task && Task->IsDone())
			{
				WorkTime.Emplace(Task->GetTask().Time);
				delete Task;
				Tasks.RemoveAtSwap(i);
			}
		}

		if (Tasks.Num() == 0)
		{
			bIsDone = true;
			float Sum = 0.0f;
			for (float T : WorkTime) Sum += T;
			FinalTime = Sum / WorkTime.Num();
			GEngine->AddOnScreenDebugMessage(-1, 60.0f, FColor::Orange, *FString::SanitizeFloat(FinalTime));
		}
	}

}
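
For comparison outside the engine, the same workload can be sketched in standard C++. This is only a rough stand-in under stated assumptions: std::vector replaces TArray, a small pool of std::thread workers replaces the engine's thread pool, and the names RunTask and AverageTaskMs are hypothetical. It does not use Unreal's allocator, so its numbers are not directly comparable to the timings above. It just shows the two variants worth timing against each other: a fresh array per iteration (as in the repro) versus one array that is cleared and reused.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for FMyTask::DoWork: returns elapsed milliseconds.
// reuseBuffer == false: a fresh vector per iteration, as in the repro.
// reuseBuffer == true: one vector is cleared and reused across iterations.
double RunTask(int iterations, bool reuseBuffer)
{
    assert(iterations >= 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();

    std::vector<float> reused;
    volatile float sink = 0.0f; // discourages the compiler from dropping the loop
    for (int i = 0; i < iterations; ++i)
    {
        if (reuseBuffer)
        {
            reused.clear();            // in practice keeps the existing capacity
            reused.push_back(1.0f);
            sink = reused.back();
        }
        else
        {
            std::vector<float> biomes; // allocates on the first push_back
            biomes.push_back(1.0f);
            sink = biomes.back();
        }
    }
    (void)sink;

    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count();
}

// Runs taskCount tasks on a fixed set of worker threads and returns the
// mean time per task in milliseconds (0.0 if taskCount is not positive).
double AverageTaskMs(int taskCount, int iterations, bool reuseBuffer)
{
    if (taskCount <= 0)
    {
        return 0.0;
    }

    std::vector<double> times(taskCount, 0.0);
    std::atomic<int> next{0};

    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0)
    {
        workers = 4; // assumption: fallback when the core count is unknown
    }

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
    {
        pool.emplace_back([&]()
        {
            // Each worker claims the next unclaimed task index until none remain.
            for (int i = next.fetch_add(1); i < taskCount; i = next.fetch_add(1))
            {
                times[i] = RunTask(iterations, reuseBuffer);
            }
        });
    }
    for (std::thread& t : pool)
    {
        t.join();
    }

    double sum = 0.0;
    for (double t : times)
    {
        sum += t;
    }
    return sum / taskCount;
}
```

Calling AverageTaskMs(1000, 5000, false) and AverageTaskMs(1000, 5000, true) mirrors the task and iteration counts of the repro. If the reused variant is much faster under the same thread count, that points at per-iteration allocation rather than at the task system itself.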

Is anybody able to reproduce the issue?

I’m currently going through the same issue. I’m procedurally generating my world with the Runtime Mesh Component (from a plugin), and using async / FNonAbandonableTask to speed up the process.

Depending on how large a world I generate, the loading time can be 3 to 15 times slower in the packaged build.

I’ve attached an image from Unreal Insights showing the time taken when running my project in the editor, compared to the packaged build.