No performance benefit from instanced meshes?

I’ve been tinkering with instanced static meshes (ISMs), as I intend to later spawn procedurally generated environment content at run time using a large series of tiles. I’d have a number of different blueprint objects (one to ten or more), each containing a number of mesh tiles (ranging from around 30 to 300 tiles). Since a large number of these tiles are identical, it makes sense to use ISMs.

I tested today the following:

A blueprint that spawns 700 static mesh actors.

A blueprint that spawns 700 actors with an ISM component.

A blueprint that creates 700 SM components for itself at run time.

A blueprint that creates 700 ISM components for itself at run time.

All of the above where every mesh has a unique MID.

All eight tests yielded approximately the same performance; the same framerate of around 60fps and approximately 1000 draw calls (300 base calls with 1 draw call per visible mesh component).

What gives, is it broken? Otherwise, what needs to be done to actually take advantage of mesh instancing?

In fact, ISMs seem to be completely broken:

On the left are over 4000 static mesh actors spawned at run time - 40fps.
On the right is one actor with over 4000 instanced mesh components created at run time - 2.6fps.

Both sets of meshes have the same properties. Something is very wrong in the uniform buffer: its update time skyrockets from 0.04ms to nearly 10ms when using instanced meshes!

Heya Luke,

Going off the wording you’re using, and not having seen any of your Blueprints/code: you need to “add instance” to a single instanced static mesh component, not create more instanced static mesh components, e.g.:

[Image: addinstance.png]

Each instanced static mesh component can only use one mesh, but all of its instances are drawn in one draw call.
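To make the difference concrete, here is a minimal sketch in plain C++. It is a toy model, not engine code: `ToyInstancedComponent`, `CountDrawCalls`, `BuildComponentPerTile` and `BuildSingleComponent` are hypothetical names, and the cost model (a fixed number of base calls plus one call per non-empty component) is a simplifying assumption based on the figures in the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instanced mesh component: one mesh, many instances.
// This is a toy model, not the engine's class.
struct ToyInstancedComponent {
    std::vector<int> Instances;  // each entry stands in for one instance transform

    void AddInstance(int TransformId) { Instances.push_back(TransformId); }
};

// Assumed cost model: a fixed number of base draw calls, plus one per
// component that has at least one instance to draw.
std::size_t CountDrawCalls(std::size_t BaseCalls,
                           const std::vector<ToyInstancedComponent>& Components) {
    std::size_t Calls = BaseCalls;
    for (const ToyInstancedComponent& Component : Components) {
        if (!Component.Instances.empty()) {
            ++Calls;
        }
    }
    return Calls;
}

// The mistake: a new component per tile, each holding a single instance.
std::vector<ToyInstancedComponent> BuildComponentPerTile(std::size_t TileCount) {
    std::vector<ToyInstancedComponent> Components(TileCount);
    for (std::size_t i = 0; i < TileCount; ++i) {
        Components[i].AddInstance(static_cast<int>(i));
    }
    return Components;
}

// The fix: a single component, with every tile added to it as an instance.
std::vector<ToyInstancedComponent> BuildSingleComponent(std::size_t TileCount) {
    std::vector<ToyInstancedComponent> Components(1);
    for (std::size_t i = 0; i < TileCount; ++i) {
        Components[0].AddInstance(static_cast<int>(i));
    }
    return Components;
}
```

Under this model, `CountDrawCalls(300, BuildComponentPerTile(700))` comes to 1000, which lines up with the numbers reported in the question, while `CountDrawCalls(300, BuildSingleComponent(700))` comes to 301. The real renderer does more than this, so treat it only as an illustration of why the component count, not the instance count, drives the draw calls.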

Thanks Ian - I was using ‘AddInstance’, but I’ve clocked now that I was creating a new mesh component each time, then adding a corresponding instance, when I should only have one component and then add the instances.

I now have over 4000 meshes and only 300 draw calls, which was my aim.

I am still not getting a performance benefit though. The frame rate is a lot better, but still noticeably poorer than just using individual static mesh actors, when it should be either the same or better?

Have you tried doing this from C++?

I got a very noticeable fps improvement using an instanced static mesh component in C++.

When I was doing my vertex drawing experiments, drawing 4000 separate static meshes was slowing my computer way, way down, but using instanced meshes I was still basically near max fps.

#Material With Translucency?

Does the material you are testing with have any goofy opacity or translucency or masked properties?

When those overlap on screen I get a lot of slowdown.

#C++ side of things

you just add an instanced static mesh component to a class of your choosing, and then use the function

UFUNCTION(BlueprintCallable, Category="Mesh")
virtual void AddInstance(const FTransform& InstanceTransform);

same as the BP node

you can then move an instance along with the original using:

virtual void ApplyWorldOffset(const FVector& InOffset, bool bWorldShift) OVERRIDE;

something not available from BP and very necessary for my vertex drawing tests

Try running gpuprofile on both configurations (instanced and normal meshes) and see if you can narrow down the bottleneck that way.

I’m honestly not sure if that is normal behaviour. Instancing is really used for only two things:

  1. Reducing draw calls.
  2. Reducing memory footprint (as you don’t need to store the entire mesh information; instances only need transform information).

The decreased framerate might occur because, while the number of draw calls is reduced, the CPU must spend more time preparing batches of static meshes before sending them to the GPU.

4000 might be just too small a number of meshes to get a huge benefit from it. (I know it sounds ridiculous.)

But keep one thing in mind: you can now add more draw-call-heavy elements to your level, and performance should still be good. With non-instanced meshes you would be far more limited, as you would have fewer free draw calls to use.

Take it with a grain of salt, as I’m not sure how instancing in UE4 is supposed to work.

“4000 might be just too small a number of meshes to get a huge benefit from it”

but,

in my picture just above,

I am saying that I did get a huge benefit when comparing just 4000 instanced vs individual static meshes.

I don’t have the numbers any more, but it was basically the difference between unusably laggy and entirely playable.

I am still wondering if materials factor into this at all

Rama

Hey Lukasz, where can I find gpuprofile? I did search the UDN but only found two mentions of it, both by you.

Start the game either in the editor, as a standalone process, or from the editor in a separate window (standalone is the best option), then press ~ to open the console and type gpuprofile.

the actual command is profilegpu, not gpuprofile

in game,

you go to console

and type

profilegpu

it opens a nice awesome amazing menu in game

I bound mine to CTRL + SHIFT + Period in player controller class

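// GPU profiler on CTRL + SHIFT + Period.
// CTRLDown() and SHIFTDown() are helper functions assumed to be defined elsewhere in this class.
if (CTRLDown() && SHIFTDown() && WasInputKeyJustPressed(EKeys::Period))
{
	ConsoleCommand(TEXT("profilegpu"));
}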

I get the feeling it’s a quirk of the system more than anything else: some sort of additional overhead in batching instanced meshes that incurs a small rendering penalty. I did successfully render 200,000 cones (using the default material, FYI) with instancing and the frame rate barely dropped, even though we’re talking about 35 million triangles, so it must be doing something right!