ParallelFor Optimization

Hello all! I hope you all are doing well. I have some optimization questions I was hoping I could get answered, particularly about ParallelFor.

Right now, I am seeing some strange behavior where ParallelFor seems to be decreasing my performance instead of improving it. I have attached a video where you can see the comparison in stat Game.

I am working on boids flocking behavior, and I would like to use ParallelFor for the step where each boid samples the other members of the flock, to improve performance through multithreading. This is my first attempt at multithreading in Unreal, though I have done a little of it elsewhere. Here is the relevant code.

Ordinary For Loop:

for (AActor* boid : Flock)
{
	if (boid == GetOwner())
	{
		continue;
	}

	UBoidsMovementComponent* boidMovementComponent = boid->GetComponentByClass<UBoidsMovementComponent>();
	if (IsValid(boidMovementComponent))
	{
		// calculate distance
		FVector boidPos = boidMovementComponent->UpdatedComponent->GetComponentLocation();
		FVector currentPos = UpdatedComponent->GetComponentLocation();
		float distanceFromBoid = FMath::Max(FVector::Dist(boidPos, currentPos), 0.001f);

		// is the boid within the protected range? 
		if (distanceFromBoid < ProtectedRange)
		{
			// Separation
			FVector sepVector = ((currentPos - boidPos).GetSafeNormal()) / distanceFromBoid;
			cumulativeSeparation += sepVector;

			numProtectedBoids++;
		}

		// is the boid within the visual range? 
		if (distanceFromBoid < VisualRange)
		{
			// Alignment
			velocitiesSum += boidMovementComponent->Velocity / distanceFromBoid;
			
			// Cohesion
			positionsSum += boidPos / distanceFromBoid;

			numVisualBoids++;
		}

	}
}

ParallelFor:

ParallelFor(Flock.Num(), [&](int32 i)
	{
		AActor* boid = Flock[i];

		if (boid != GetOwner())
		{
			UBoidsMovementComponent* boidMovementComponent = boid->GetComponentByClass<UBoidsMovementComponent>();
			if (IsValid(boidMovementComponent))
			{
				// calculate distance
				FVector boidPos = boidMovementComponent->UpdatedComponent->GetComponentLocation();
				FVector currentPos = UpdatedComponent->GetComponentLocation();
				float distanceFromBoid = FMath::Max(FVector::Dist(boidPos, currentPos), 0.001f);

				// is the boid within the protected range? 
				if (distanceFromBoid < ProtectedRange)
				{
					// Separation
					FVector sepVector = ((currentPos - boidPos).GetSafeNormal()) / distanceFromBoid;
					cumulativeSeparation += sepVector;

					numProtectedBoids++;
				}

				// is the boid within the visual range? 
				if (distanceFromBoid < VisualRange)
				{
					// Alignment
					velocitiesSum += boidMovementComponent->Velocity / distanceFromBoid;

					// Cohesion
					positionsSum += boidPos / distanceFromBoid;

					numVisualBoids++;
				}

			}
		}
	},
	false);

Is there an important point about ParallelFor I am missing? One thing I was thinking was that maybe this is dispatching too many items at once? Is there a better option for my use case that I should look at? Thank you so much for taking the time to read this!

There’s typically overhead from locking, spawning threads and so on, so using x threads won’t net you the theoretical x times performance improvement. Most of the time there is a sweet spot: multiple threads only become faster once the amount of data you need to process is large enough to outweigh that overhead. Your example, with roughly 200 actors, probably isn’t large enough.
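To make the overhead point concrete, here is a minimal sketch in plain standard C++ (not Unreal code). It assumes a hypothetical `Accum` struct standing in for the per-boid sums and a hypothetical `minItemsForThreads` cutoff that you would have to tune by measuring. It also has each worker write to its own slot and merges the slots afterwards. As far as I can tell from the posted lambda, every ParallelFor iteration adds into the same captured variables (`cumulativeSeparation`, `velocitiesSum` and so on) without synchronization, so something like this per-worker pattern would be needed there as well.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-boid sums in the question.
struct Accum {
    double weightedSum = 0.0;
    int count = 0;
};

// Serial reference: visit every distance once on the calling thread.
Accum AccumulateSerial(const std::vector<double>& distances, double range) {
    Accum total;
    for (double d : distances) {
        if (d < range) {
            total.weightedSum += 1.0 / d;
            ++total.count;
        }
    }
    return total;
}

// Chunked version: each worker writes only to its own slot, and the slots
// are merged after join(), so no locks or atomics are needed.
Accum AccumulateChunked(const std::vector<double>& distances, double range,
                        std::size_t numWorkers, std::size_t minItemsForThreads) {
    // Below the (assumed) cutoff, thread start-up cost tends to dominate.
    if (numWorkers < 2 || distances.size() < minItemsForThreads) {
        return AccumulateSerial(distances, range);
    }

    std::vector<Accum> partials(numWorkers);
    std::vector<std::thread> workers;
    const std::size_t chunk = (distances.size() + numWorkers - 1) / numWorkers;

    for (std::size_t w = 0; w < numWorkers; ++w) {
        const std::size_t begin = std::min(w * chunk, distances.size());
        const std::size_t end = std::min(begin + chunk, distances.size());
        workers.emplace_back([&distances, &partials, range, w, begin, end] {
            Accum local;  // accumulate locally, publish once at the end
            for (std::size_t i = begin; i < end; ++i) {
                if (distances[i] < range) {
                    local.weightedSum += 1.0 / distances[i];
                    ++local.count;
                }
            }
            partials[w] = local;
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }

    // Merge the per-worker results on the calling thread.
    Accum total;
    for (const Accum& p : partials) {
        total.weightedSum += p.weightedSum;
        total.count += p.count;
    }
    return total;
}
```

Calling `AccumulateChunked(distances, range, 4, 1000)` on a few hundred items simply takes the serial path. Whether the threaded path wins on larger inputs depends on the machine and the per-item cost, so the cutoff is something to measure rather than guess.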

If you want more accurate data, I’d suggest using the profiler and possibly adding your own STATs rather than just looking at the overall Tick Time.
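As a rough, engine-independent illustration of timing just the loop you care about, here is a small scoped timer in standard C++. `ScopedTimer` and `SumInverse` are hypothetical names made up for this sketch. Inside Unreal you would more likely declare a stat with `DECLARE_CYCLE_STAT` and wrap the loop in `SCOPE_CYCLE_COUNTER`, which follows the same scope-based idea.

```cpp
#include <chrono>

// Hypothetical helper: times one scope and adds the elapsed microseconds
// to a caller-owned counter when the scope ends.
class ScopedTimer {
public:
    explicit ScopedTimer(double& totalMicroseconds)
        : Total(totalMicroseconds), Start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - Start;
        Total += std::chrono::duration<double, std::micro>(elapsed).count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& Total;
    std::chrono::steady_clock::time_point Start;
};

// Placeholder workload standing in for the flock-sampling loop.
double SumInverse(int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; ++i) {
        sum += 1.0 / i;
    }
    return sum;
}

// Runs the workload several times and returns the mean time per run,
// which is less noisy than a single measurement.
double AverageMicroseconds(int iterations, int n) {
    double total = 0.0;
    volatile double sink = 0.0;  // keeps the compiler from discarding the work
    for (int i = 0; i < iterations; ++i) {
        ScopedTimer timer(total);
        sink = sink + SumInverse(n);
    }
    return iterations > 0 ? total / iterations : 0.0;
}
```

Timing only the loop, for both the plain and the parallel version and over many frames, gives a much cleaner comparison than the overall tick time, which includes everything else the actor does.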

I see, that makes sense. I might try some more testing with larger flocks and the profiler. Thank you for the answer!

I don’t see the point of using concurrency/parallelism. There are no heavy calculations here.