
Funny story: my C++ class is less than half as fast as its BP equivalent...

SOLVED: I missed a property, CollisionEnabled. It was set differently in my BP and native classes… I appreciate all the help and advice.

TL;DR: A super simple class, first built in BP and then rewritten in C++ to be exactly the same. When run, the C++ class takes twice as much CPU time as the BP class. General performance and profiling advice is not the topic (though it's welcome…)

Hi, I was trying to mimic a fish horde in UE4, and as the number goes up I pretty quickly hit the CPU game thread bottleneck. At 1,500 fish I barely get 40+ fps, rendering once (well, it was intended for my Vive…). According to “stat unit”, the game thread takes roughly 20 ms, which is also the frame time.

Suddenly I remembered that some Epic employee had posted that C++ could be 10x faster than BP, and I'd been meaning to practise my coding skills, so I manually converted everything (really just a spawn volume and a fish…) to C++, crossed my fingers and hit Play… The spawning process, which was short and green (fps) with BP, went up to 100+ ms and turned red… And once the fish are spawned and swimming, the game thread goes up to 40 to 50 ms, still matching the frame time and still the bottleneck.

Then I ran stat startfile/stopfile on both versions and compared them. It turns out that in both versions “ProjectileMovementComponent” under WorldTick takes the most time: 7.5 ms in BP versus 18 ms in C++, while WorldTick itself takes 11 ms in BP versus 28 ms in C++…

I guess I must be doing something wrong, but it's so simple and identical to the BP that I simply can't find it… Here's how my fish class works:
Basically it's a homing projectile set to follow an empty actor, which is moved by a Level Sequencer.
A static mesh component is attached to a scene component, which is the root. There is also a ProjectileMovementComponent with “IsHomingProjectile” set to true and a homing acceleration magnitude of 200.f.
In the ctor, I set the projectile's MaxSpeed to a random float between 600.f and 1000.f.
In BeginPlay I do if (TargetActor) { ProjectileMovement->HomingTargetComponent = TargetActor->GetRootComponent(); }
But I don't suppose these matter mid-game… In Tick, I simply check the movement component's Velocity.Size() and call SetVelocityInLocalSpace(FVector(200.f, 0.f, 0.f)) if that size is <= 10.f. I even set the static mesh in C++ for maximum efficiency…
The BP is set up exactly the same way, and it's so simple that I can't figure out why…

Seriously, any help is appreciated…
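For anyone wondering what this homing setup boils down to each frame, here is a rough, engine-free model in plain C++. It assumes (as far as I understand UE4's ProjectileMovementComponent) that homing adds an acceleration of fixed magnitude pointing at the target and that the speed is then clamped to MaxSpeed; the "kick" when the fish nearly stops mirrors the Tick check described above. Vec3, Fish and StepFish are hypothetical names, not UE types, and the real component does much more (sweeps, bounces, sub-stepping).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type (stand-in for UE's FVector).
struct Vec3 {
    float x, y, z;
};

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Returns v scaled to unit length, or zero if v is (nearly) zero.
Vec3 SafeNormal(Vec3 v) {
    float len = Length(v);
    return len > 1e-4f ? v * (1.0f / len) : Vec3{0.f, 0.f, 0.f};
}

// Hypothetical per-fish state.
struct Fish {
    Vec3 position;
    Vec3 velocity;
    float maxSpeed;  // e.g. random in [600, 1000] as in the post above
};

// One simplified step: homing acceleration toward the target, speed clamp,
// a "kick" if the fish has (almost) stopped, then integrate the position.
void StepFish(Fish& fish, Vec3 target, float homingMagnitude, float dt) {
    Vec3 accel = SafeNormal(target - fish.position) * homingMagnitude;
    fish.velocity = fish.velocity + accel * dt;

    float speed = Length(fish.velocity);
    if (speed > fish.maxSpeed) {
        fish.velocity = fish.velocity * (fish.maxSpeed / speed);
    }
    if (Length(fish.velocity) <= 10.f) {
        // Mirrors SetVelocityInLocalSpace(FVector(200, 0, 0)), simplified
        // to world +X because this model has no rotation.
        fish.velocity = Vec3{200.f, 0.f, 0.f};
    }
    fish.position = fish.position + fish.velocity * dt;
}
```

Calling StepFish once per frame for every fish is, very loosely, the per-fish work the profiler attributes to ProjectileMovementComponent.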

Have you looked into the Zen Garden demo? They showed off some tricks there (e.g. grouping) :)
The project should be available for free on Epic’s Marketplace page.

Yeah, I actually used projectile movement because of that… I'm not troubled by the performance itself; with enough fish any machine can be crushed… I'm curious why my C++ class performs so much slower than my BP class.

Would you mind posting some code so we can judge why?

Thanks for helping! I seriously doubt anything else matters, so I'm posting my Tick and BeginPlay here:


void ATestFish::Tick( float DeltaTime )
{
	Super::Tick( DeltaTime );
	if (ProjectileMovement->Velocity.Size() <= 0.f)
	{
		ProjectileMovement->SetVelocityInLocalSpace(FVector(200.f, 0.f, 0.f));
	}
}


void ATestFish::BeginPlay()
{
	Super::BeginPlay();
	if (!bHasTarget && TargetActor)  // failsafe: only if the target wasn't set earlier
	{
		ProjectileMovement->HomingTargetComponent = TargetActor->GetRootComponent();
		bHasTarget = true;
	}
}

I guess it's also worth mentioning that I have a USceneComponent as the root, a UStaticMeshComponent attached to it, and a UProjectileMovementComponent. As I've said above, it's just so bloody simple that I can't figure out what's going wrong… The BP is set up exactly the same…

EDIT: in case you're wondering about bHasTarget:


if (TargetActor)
{
	ProjectileMovement->HomingTargetComponent = TargetActor->GetRootComponent();
	bHasTarget = true;
}

This is in my ctor. I use SpawnActorDeferred() to spawn fish so I can pass on the “TargetActor” reference. The BeginPlay code is more of a failsafe.
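A side note on spawn order, hedged because it comes from my reading of how deferred spawning works rather than from a test: SpawnActorDeferred constructs the actor before returning it, so a TargetActor assigned by the caller afterwards is not yet set while the constructor runs. If that's right, the BeginPlay "failsafe" is what actually assigns the homing target. The plain C++ below only models that ordering; FakeFish, SpawnFishDeferred and FinishSpawningFish are hypothetical stand-ins, not UE API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an actor whose target is set after construction.
struct FakeFish {
    const int* targetActor = nullptr;  // stand-in for TargetActor
    bool bHasTarget = false;
    bool bCtorSawTarget = false;

    FakeFish() {
        // Like the constructor check in the post: the caller has not
        // assigned targetActor yet, so this branch never runs.
        if (targetActor) {
            bHasTarget = true;
            bCtorSawTarget = true;
        }
    }

    void BeginPlay() {
        // The "failsafe": by now the caller has assigned targetActor.
        if (!bHasTarget && targetActor) {
            bHasTarget = true;
        }
    }
};

// Step 1: construct the object, but do not begin play yet.
std::unique_ptr<FakeFish> SpawnFishDeferred() {
    return std::make_unique<FakeFish>();
}

// Step 2: after the caller has set properties, finish and begin play.
void FinishSpawningFish(FakeFish& fish) {
    fish.BeginPlay();
}
```

Usage mirrors the deferred pattern: spawn, assign the target, then finish spawning; only BeginPlay ever sees the target.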

Hm, I see nothing that would take a huge amount of CPU time in your examples.

Some hints; maybe one of them is the root of all evil:

1.) Do the fish have collision enabled?
2.) Does every fish search for a target every Tick?
3.) Do you do any searches for objects, or raycasts?
4.) When using raycasts anywhere, try to limit the ray length; you usually don't have to cast a ray from one end of the scene to the other.
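To illustrate hints 1 to 3 in general terms: if every fish tested every other fish, 1,500 fish would mean 1,500 × 1,499 / 2 = 1,124,250 pair checks per frame. Physics engines avoid that with their own broadphase structures; the sketch below only shows the basic idea with a uniform grid in 2D (cells as large as the interaction radius, so only the 3×3 neighbouring cells need checking). All names are hypothetical, and this is not how PhysX or UE actually implement it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Point2 { float x, y; };

// Brute force: test every pair once. O(n^2) distance checks.
std::size_t CountClosePairsBrute(const std::vector<Point2>& pts, float radius) {
    std::size_t count = 0;
    const float r2 = radius * radius;
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            float ex = pts[i].x - pts[j].x, ey = pts[i].y - pts[j].y;
            if (ex * ex + ey * ey <= r2) ++count;
        }
    return count;
}

// Uniform grid: bucket points by cell, then only test neighbouring cells.
std::size_t CountClosePairsGrid(const std::vector<Point2>& pts, float radius) {
    using Cell = std::pair<long long, long long>;
    auto cellOf = [radius](const Point2& p) {
        return Cell{static_cast<long long>(std::floor(p.x / radius)),
                    static_cast<long long>(std::floor(p.y / radius))};
    };
    std::map<Cell, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < pts.size(); ++i) grid[cellOf(pts[i])].push_back(i);

    std::size_t count = 0;
    const float r2 = radius * radius;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const Cell c = cellOf(pts[i]);
        for (long long dx = -1; dx <= 1; ++dx)
            for (long long dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(Cell{c.first + dx, c.second + dy});
                if (it == grid.end()) continue;
                for (std::size_t j : it->second) {
                    if (j <= i) continue;  // count each pair only once
                    float ex = pts[i].x - pts[j].x, ey = pts[i].y - pts[j].y;
                    if (ex * ex + ey * ey <= r2) ++count;
                }
            }
    }
    return count;
}
```

Both functions return the same count; the grid version simply skips pairs that are obviously too far apart.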

Bump/update: I can't seem to replicate this in a new project, where the native class actually wins with 30 ms against BP's 36 ms, which is really a huge difference… How odd is this…

How about handling all the fish in a single class? Would that be feasible?

Thanks for helping. But really, I'm not (that) worried about the amount of CPU time it takes. The thing is, I've got two identical classes, one native C++ class and one BP class, and the BP class uses only half the CPU time of the C++ class…

As you can see, the C++ code could easily be converted into a BP graph.

Also, I have no idea whatsoever about your points 2 to 4… All I did was set the target in UProjectileMovementComponent and tell it that it's a homing projectile. How it follows the target is beyond me. What I do know is that I have collision enabled, and I was hoping to see them bump into each other.

They are instances of the same class…

If you mean a class that has a lot of (fish) meshes and controls them, I haven't the slightest clue how to make that look good…

Yeah, sorry, what I mean is: control the flocking behavior of all the fish in one place, all at once. You might save a lot of function calls. To be honest I'm not sure how exactly that would work (I'm pretty new to Unreal here), but lots of small calls are usually something you try to avoid in performance-oriented code, especially since those are all virtual calls.
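A minimal sketch of that idea in plain C++, assuming a very simplified fish (a position and a speed along one axis, no homing): instead of one object per fish receiving its own virtual Tick, a single hypothetical FishSchool keeps all fish data in contiguous arrays and updates it in one loop. The names are made up for illustration; in UE, drawing such a school would probably go through something like an instanced static mesh component, which this sketch doesn't cover.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Per-object style: every fish is its own object with a virtual Tick,
// roughly like having one actor per fish.
struct Tickable {
    virtual ~Tickable() = default;
    virtual void Tick(float dt) = 0;
};

struct FishObject : Tickable {
    float x = 0.f, vx = 0.f;
    void Tick(float dt) override { x += vx * dt; }
};

// Batched style: one hypothetical manager owns all fish data in
// contiguous arrays and updates everything in a single loop.
struct FishSchool {
    std::vector<float> x, vx;

    void Add(float startX, float speed) {
        x.push_back(startX);
        vx.push_back(speed);
    }

    void TickAll(float dt) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
        }
    }
};
```

The batched loop makes one call per frame instead of one virtual call per fish, and walks memory linearly; whether that matters in practice is something only profiling can tell.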

Thanks for the help. I'm pretty new to programming… So do you mean virtual calls are expensive and should be avoided when possible?

Wouldn’t say virtual calls are expensive (as in “OMG that’s expensive”), but they do require an extra look-up in the v-table when called. I’m pretty sure you’d have to do thousands of calls to find a measurable difference in the context of a gameloop. See: http://stackoverflow.com/questions/8776507/difference-between-calling-of-virtual-function-and-non-virtual-function

Note for instance that Tick() is a virtual method. If virtual calls were prohibitively costly, the UE architecture probably wouldn’t look the way it does.
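To make the "extra look-up in the v-table" concrete, here is a conceptual model in plain C++: each object carries a pointer to a table of function pointers for its type, which is roughly what mainstream compilers generate for virtual functions. The real layout is implementation-defined, and all names here are hypothetical. The cost is loading the table pointer, loading the slot, and making an indirect call that the compiler usually can't inline.

```cpp
#include <cassert>

struct Creature;  // forward declaration for the table's function signature

// Hypothetical hand-rolled "v-table": one table of function pointers per type.
struct CreatureVTable {
    void (*tick)(Creature&, float);
};

// Every object carries a pointer to its type's table (the "vptr").
struct Creature {
    const CreatureVTable* vtable;
    float x = 0.f;
};

void FishTick(Creature& c, float dt) { c.x += 600.f * dt; }
void CrabTick(Creature& c, float dt) { c.x += 50.f * dt; }

const CreatureVTable kFishVTable{&FishTick};
const CreatureVTable kCrabVTable{&CrabTick};

// A "virtual call": load the table pointer, load the slot, call indirectly.
void CallTick(Creature& c, float dt) { c.vtable->tick(c, dt); }
```

Two loads and an indirect jump per call is cheap on its own, which fits the point above that you'd need a lot of calls before it shows up in a frame.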

Exactly what I was worried about, your last sentence…

Do you have any idea why BP is running way faster than C++? They really are exactly the same… Could it be something in the project settings?

No idea, though I’ve only glanced at it, and it’s sort of difficult to do “remote performance analysis” without having the stuff available for hands-on experimentation.

The first thing that popped into my mind, however, was that I believe Blueprints do some clever caching of stuff when applicable, so there's more going on behind the scenes than your BP graph might suggest. And without knowing exactly what that is, there's no way you could replicate it in your native code. I'm not an expert on the inner workings of the blueprint VM, so I can only guess, and that wouldn't be helpful.

Just as a general tip, you most likely don't want to put function calls inside code that is iterative.



void ATestFish::Tick( float DeltaTime )
{
	Super::Tick( DeltaTime );
	if (ProjectileMovement->Velocity.Size() <= 0.f)  // <- the call in question
	{
		ProjectileMovement->SetVelocityInLocalSpace(FVector(200.f, 0.f, 0.f));
	}
}


The Tick function calls that Velocity.Size() method every frame. It's as inefficient as this code:



std::vector<double> vec{1.3, 34.3, 0.2, 12 /* ... */};  // suppose vec size is 10000

for (std::size_t i = 0; i < vec.size(); ++i)
{
     // process the vector element vec[i]
}


which can be optimized into



std::vector<double> vec{1.3, 34.3, 0.2, 12 /* ... */};  // suppose vec size is 10000

const std::size_t size = vec.size();

for (std::size_t i = 0; i < size; ++i)
{
     // process the vector element vec[i]
}

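For completeness, a runnable version of the two loops above, with summing as a stand-in for "process". Both compute the same thing. Note that std::vector::size() is a small inline member function, so an optimizing build can often hoist it out of a read-only loop by itself; the manual version tends to matter less than it looks, and more in unoptimized builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// size() is evaluated in the loop condition on every iteration.
double SumCallingSize(const std::vector<double>& vec) {
    double sum = 0.0;
    for (std::size_t i = 0; i < vec.size(); ++i) {
        sum += vec[i];
    }
    return sum;
}

// size() is read once before the loop.
double SumCachedSize(const std::vector<double>& vec) {
    double sum = 0.0;
    const std::size_t size = vec.size();
    for (std::size_t i = 0; i < size; ++i) {
        sum += vec[i];
    }
    return sum;
}
```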

I see what you mean but I don’t believe it’s quite the same thing here.

I'm doing this because the velocity changes every tick, and it's an if, so it executes once per tick (as far as my understanding goes).

What configuration are you building your native class for: Debug, Development, or Shipping? This affects the optimizations applied during the build.
The same question goes for the engine you're using: is it a custom build from the GitHub source or a binary version from the Epic Games Launcher?

One thing you could optimize is the square root in Velocity.Size(): using Velocity.SizeSquared() instead saves the “expensive” square root, and since you're comparing against 0.0f you don't even have to change the other side. Note that this wouldn't be exactly equivalent to your BP anymore.
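A small plain-C++ illustration of that suggestion; Vec3 and its Size/SizeSquared members are hypothetical stand-ins mirroring FVector's. Against 0 the two tests agree directly; against a non-zero threshold, such as the 10.f mentioned in the first post, you compare against the squared threshold instead, which is valid because both sides are non-negative.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for FVector.
struct Vec3 {
    float x, y, z;
    float SizeSquared() const { return x * x + y * y + z * z; }
    float Size() const { return std::sqrt(SizeSquared()); }
};

// Original style: needs a square root.
bool IsSlowWithSqrt(const Vec3& v, float threshold) {
    return v.Size() <= threshold;
}

// Same test without the square root. Assumes threshold >= 0, so that
// comparing squared length with the squared threshold is equivalent.
bool IsSlowSquared(const Vec3& v, float threshold) {
    return v.SizeSquared() <= threshold * threshold;
}
```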

Also http://stackoverflow.com/questions/449827/virtual-functions-and-performance-c

I doubt the virtual function overhead is the problem. As UnrealEverything suggested, maybe try different build targets; after that you may need to profile to find the culprit.

Thanks for solving that mystery that lingered for so long in my head: wtf is SizeSquared() for…

Anyway, I was able to find the problem: it turns out I had been so “clever” when making the BP that I set CollisionEnabled to PhysicsOnly. Once I did the same in the native class, the tables turned and the native class won by 5 ms.