
Understanding performance bottlenecks?

I’m struggling to understand where the bottlenecks are when my game is under heavy load and running slowly.

On a map with thousands of player-placed structures, the frame rate drops. However, when I run MSI Afterburner and monitor my system resources, it doesn’t show any of the hardware under particularly heavy load:

https://content.screencast.com/users/coldscooter/folders/Jing/media/0bafbbf0-7a70-4a1a-a893-dbe8d56c99e7/2020-02-14_1226.png

This was captured on a very crowded map, looking out over a large distance with a lot of visible meshes.

However, the GPU is only at 30% utilization and the CPU threads seem to have plenty of headroom. Yet, as you can see, the game is running at 35 fps (it would be >100 fps if I deleted all the player-placed structures on the map).

So what is causing the slowdown in the engine? Any ideas on how to better inspect where the bottlenecks might be?

Is that 10 GB of RAM what the game is currently using, or what the machine has in total? (How much RAM do you have?)
Does “stat engine” show anything strange?

A little off topic, but it would be cool if we could spawn or use dynamic cull distance volumes; the default UDK one is static. I tried it before and failed.

@O_and_N I’ve tried running “stat engine”, but the results don’t show anything obvious:

This is when the game is running slowly at 30 fps (with lots of meshes placed on the map):

https://content.screencast.com/users/coldscooter/folders/Jing/media/e6b1c254-3393-46ba-a200-49b82eaeb345/2020-02-17_1150.png

And this is with all of the player-placed meshes removed (the game then runs fine at 60+ fps):

https://content.screencast.com/users/coldscooter/folders/Jing/media/fd19439f-4036-42d3-8db8-014c59cc565a/2020-02-17_1151.png

I just can’t figure out exactly where the performance is being eaten up, or why my CPU and GPU don’t seem to be under much stress when the game is running slowly.

Here is a side-by-side comparison of the hardware stats when the game is running slowly and when it’s running normally:

https://content.screencast.com/users/coldscooter/folders/Jing/media/d9c90a5d-b45d-48df-b363-e70d111cd473/2020-02-17_1203.png

As you can see, there is basically no difference in hardware stress between the two, except that one is running at half the speed.

So the performance bottleneck is happening somewhere in the engine.

Any ideas?

I haven’t played your game (I’ve been meaning to :wink: and will).
By “player-placed meshes”, are we talking about items or buildings? Are they skeletal meshes or InterpActors?

I remember once making a building composed of 100 InterpActors (stupid of me) that was animated to lift up from the ground like an elevator. Performance was killed, as UDK couldn’t handle so many InterpActors animated at once. Then I exported the building to Maya, combined everything, and ended up with a building of 5 big chunks, and the performance was fixed. Not sure if this relates here, but it’s good to share the experience, just to rule out one possible cause.

So that stunt hit an engine limitation.
Why is the skel mesh tri count so much higher in the scene with the meshes removed than in the populated scene? Is it a different camera angle looking at other things?

@O_and_N Yes, they are building pieces (walls, ceilings, foundations, etc.). They are essentially Actors with a static mesh component.

You can see an example of the types of modular structures you can build in this video I posted last night: https://youtu.be/Z_OWNkiorQI

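UE3 is not fully multithreaded (it uses some threads, but not all; no game engine really is, AFAIK), and UnrealScript is single-threaded, so don’t rely on your CPU thread graphs for guidance: a single saturated thread (such as the game or rendering thread) can cap your frame rate even while overall CPU usage looks low.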

Start by running ‘stat unit’ and see whether your game is bottlenecked by the CPU (Draw is the CPU rendering thread, Game is the CPU game thread) or by the GPU (simply GPU).
If the GPU is the main problem, run ‘profilegpu’ and check the log; you’ll see the timings of your GPU passes for that frame.
If it’s the CPU, check the other stats such as stat scenerendering, stat octree, stat physics and so on (there are many of them; you can easily find and toggle them if you run the game with RemoteControl), and then obviously stat game. If the CPU is the bottleneck, you’ll need to work out whether it’s your game code (stat game) or some other engine CPU work like physics, occlusion, particle processing, etc.
If you find that your game code is a major blocker, you can dive deeper using the Gameplay Profiler (look in the docs for that; basically ‘profilegame 3’ will capture 3 seconds of gameplay into a profile, which you then open with the Gameplay Profiler tool) and check specifically the functions that might cause problems (you might even get to optimize some things you didn’t know were slowing down your game a bit).
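The triage above boils down to “find the biggest of Game, Draw and GPU, then use the matching tool”. Here is a minimal sketch of that decision in Python, assuming you copy the millisecond values off the ‘stat unit’ overlay by hand; the class, function names and numbers are hypothetical, and “the largest of the three drives the frame time” is only a rough heuristic since the threads overlap.

```python
from dataclasses import dataclass


@dataclass
class StatUnit:
    """One frame of 'stat unit' readings, in milliseconds."""
    frame: float
    game: float  # CPU game thread
    draw: float  # CPU rendering thread
    gpu: float


def likely_bottleneck(s: StatUnit) -> str:
    """Return the largest of Game/Draw/GPU.

    Rough heuristic: these run largely in parallel, so the frame time is
    driven mostly by whichever of them is slowest.
    """
    timings = {"Game": s.game, "Draw": s.draw, "GPU": s.gpu}
    return max(timings, key=timings.get)


def next_step(bottleneck: str) -> str:
    """Map a bottleneck to the follow-up tools suggested in this thread."""
    return {
        "GPU": "run 'profilegpu' and read the log",
        "Draw": "check stat scenerendering / stat octree",
        "Game": "check stat game, then capture with the gameplay profiler",
    }[bottleneck]


# Hypothetical readings, not taken from the screenshots in this thread.
slow = StatUnit(frame=33.0, game=12.0, draw=31.0, gpu=14.0)
print(likely_bottleneck(slow), "->", next_step(likely_bottleneck(slow)))
# prints: Draw -> check stat scenerendering / stat octree
```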

Given that your game is slowed down by player structures, my guess is that your bottleneck will be on the rendering thread, but there’s no better way to know than to find out for sure :slight_smile:

Thanks for the input @Chosker. I’ve run some further comparisons between the game world with lots of player-built structures and one with none:

https://content.screencast.com/users/coldscooter/folders/Jing/media/ba42a43b-ad87-4712-a51d-6dbc4c81e102/2020-02-18_0937.png

With “stat unit”, while all the stats got worse, the most significant increase appears to be in “Draw” (the CPU rendering thread).

stat scenerendering:

https://content.screencast.com/users/coldscooter/folders/Jing/media/236f84bd-4706-4e79-895e-03681975eed4/2020-02-18_1008.png

stat octree:

https://content.screencast.com/users/coldscooter/folders/Jing/media/81a7db71-5773-4c15-b686-e69598ec5e2d/2020-02-18_1011.png

stat game:

https://content.screencast.com/users/coldscooter/folders/Jing/media/db002976-7004-46ca-b235-9c9fa5c22cab/2020-02-18_1013.png

So I’m still analysing the stats, but it does appear to be the number of draw calls on the CPU, coupled with the amount of shadow and lighting drawing.

There also seems to be a drastic increase in “Dynamic path draw calls” in the scene rendering stats. What is this?

Posting these stats in the hope someone may be able to shed more light (no pun intended) on them, and perhaps suggest some ideas of how to optimise.

Cheers

Today I tested voice acting in my game with 100 AI, and the Sound Cue system took a lot of resources.

Seems you have a few different things adding up. Simply put, Unreal isn’t super efficient at rendering large numbers of dynamic actors (which seems reasonable).
One obvious solution would be simplifying what the player can build. Finer granularity means more freedom for players, but also more rendered meshes. I don’t know whether your granularity is at the wall/door level, entire rooms, or even entire houses/structures, but in general making the pieces coarser would reduce the problem.
In my game with spawned dungeons I had fully modular level assets, but for some very common cases I had versions of those meshes merged at the art pipeline level. For example, I would replace 4 tiles of 1x1 floors with a pre-merged 4x4 floor. In your case, with user-built structures, it’s much trickier, but it might give you some ideas.
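For user-built structures, one way to get some of that merging at runtime would be to detect complete blocks of small tiles and swap each block for a single pre-merged mesh. A minimal Python sketch of the detection step, assuming tiles sit on an integer grid and that a pre-merged n x n mesh exists; `merge_tiles` is a hypothetical helper, not something from UDK or from this thread.

```python
def merge_tiles(cells, n=2):
    """Greedily replace fully occupied, grid-aligned n x n blocks of 1x1
    floor tiles with one pre-merged piece each.

    cells: set of (x, y) integer tile coordinates.
    Returns (origins of merged blocks, tiles left as individual pieces).
    """
    remaining = set(cells)
    merged = []
    # Align candidate blocks to multiples of n so they can never overlap.
    origins = sorted({(x - x % n, y - y % n) for x, y in cells})
    for ox, oy in origins:
        block = {(ox + dx, oy + dy) for dx in range(n) for dy in range(n)}
        if block <= remaining:
            merged.append((ox, oy))
            remaining -= block
    return merged, remaining


# A hypothetical 3x2 floor: six 1x1 tiles become one 2x2 piece plus two tiles.
floor = {(x, y) for x in range(3) for y in range(2)}
merged, leftover = merge_tiles(floor)
print(len(merged) + len(leftover), "meshes instead of", len(floor))
# prints: 3 meshes instead of 6
```

Aligning blocks to the grid keeps the check trivial and avoids overlapping merges, at the cost of missing blocks that straddle a grid boundary.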

The dynamic rendering path (movable actors) has a higher cost than the static rendering path (static actors). Spawning DynamicSMActor static meshes means they get drawn on the dynamic rendering path.
In my spawned dungeons I had a StaticMeshActorSpawnable class, a child of DynamicSMActor_Spawnable, and in its defaultproperties I had:


```
defaultproperties
{
    bWorldGeometry=true
    bTickIsDisabled=true
    bMovable=false
}
```

Give it a try; it might help make your spawned meshes a bit lighter in a few categories.

You also have a lot more shadow and lighting drawing. That’s probably normal for shadows if your objects cast dynamic shadows, but the lighting part seems strange. If you have dynamic shadow-casting lights spawned on your player structures, you probably want to disable their shadows by distance.
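A minimal Python sketch of “disable by distance”, assuming you run it periodically (e.g. on a timer) over your spawned lights. `ShadowLight` and the 3000/2500 thresholds are hypothetical; in UDK you would flip whatever shadow-casting setting your light setup uses. The two thresholds add hysteresis so a light doesn’t toggle every frame when the camera sits right at the cutoff.

```python
import math
from dataclasses import dataclass


@dataclass
class ShadowLight:
    """Hypothetical stand-in for a spawned dynamic light."""
    pos: tuple
    casts_shadows: bool = True


def update_shadow_casting(lights, cam_pos, disable_at=3000.0, enable_at=2500.0):
    """Disable dynamic shadows on lights farther than disable_at and
    re-enable them once closer than enable_at (distances in Unreal units).

    Using two thresholds (hysteresis) keeps a light from toggling every frame
    when the camera hovers right at a single cutoff distance.
    """
    if enable_at >= disable_at:
        raise ValueError("enable_at must be smaller than disable_at")
    for light in lights:
        d = math.dist(light.pos, cam_pos)
        if light.casts_shadows and d > disable_at:
            light.casts_shadows = False
        elif not light.casts_shadows and d < enable_at:
            light.casts_shadows = True
```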

Lastly, there’s always the option to distance-cull your spawned meshes, decals, particles, etc. Not sure how much of that you’re already doing, but distance culling would lighten the occlusion culling pass.
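In UDK you would normally set a cull distance (e.g. MaxDrawDistance) per component, but the underlying test is simple. A minimal Python sketch with a different cull distance per category, assuming the distances below, which are made-up values. Comparing squared distances avoids a square root per object.

```python
# Hypothetical per-category cull distances in Unreal units.
CULL_DISTANCE = {"mesh": 8000.0, "particle": 4000.0, "decal": 2000.0}


def visible_objects(objects, cam_pos):
    """Keep only objects within their category's cull distance.

    objects: iterable of (category, (x, y, z)).
    Comparing squared distances avoids a square root per object.
    """
    cx, cy, cz = cam_pos
    keep = []
    for category, (x, y, z) in objects:
        limit = CULL_DISTANCE[category]
        d2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        if d2 <= limit * limit:
            keep.append((category, (x, y, z)))
    return keep
```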

@Chosker My building system is similar to Fortnite’s (although it’s a very different game). I wonder how they implemented their system so that hundreds (or thousands) of dynamic building pieces can be rendered on screen, even on mobile hardware, with no obvious performance hit. I know it’s the new engine, but whatever techniques they’re using are probably still applicable. Same for Rust (which is a Unity game).

UE4 has a lot more optimizations in that regard. UE4 has HZB occlusion, which can speed up occlusion queries (it’s disabled by default, so I don’t know for certain whether they use it, but they probably do). UE4 also has some DX11 rendering optimizations to the way static mesh draw calls are queued into the render thread, and on top of that it has automatic instancing (since 4.22), which greatly reduces draw calls when using modular meshes.
Basically, all the engine heavy lifting in this regard exists due to significant UE4-specific and DX11-specific improvements, which sadly aren’t applicable to UDK. You can’t even spawn foliage mesh components, which would be the path to take towards mesh instancing.
Unity also has automatic instancing AFAIK.
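Why instancing helps so much with modular pieces can be shown with a simplified draw-call count. A minimal Python sketch, assuming each material section of a mesh costs one draw call per pass; it models a single pass only (real counts multiply by depth, shadow and lighting passes) and the piece counts are hypothetical.

```python
from collections import Counter


def draw_calls(placed, sections_per_mesh, instanced):
    """Count draw calls for one pass over the placed pieces.

    Simplified model: every material section of a mesh costs one draw call.
    Without instancing that is paid per placed piece; with instancing it is
    paid once per unique mesh, however many copies are placed.
    """
    counts = Counter(placed)
    if instanced:
        return sum(sections_per_mesh[mesh] for mesh in counts)
    return sum(n * sections_per_mesh[mesh] for mesh, n in counts.items())


# Hypothetical base: 1000 walls (2 material sections) and 500 floors (1 section).
base = ["wall"] * 1000 + ["floor"] * 500
sections = {"wall": 2, "floor": 1}
print(draw_calls(base, sections, instanced=False))  # prints 2500
print(draw_calls(base, sections, instanced=True))   # prints 3
```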

@Chosker I’ve figured out a way to use static mesh instancing by leveraging the InstancedFoliageActor.

Any mesh you want to instance has to be added to the foliage tool on the map where you want to instance it, and at least one instance of it must be placed somewhere in the world (so somewhere hidden on the map).

Although you can’t seem to add new InstancedStaticMeshComponents to the InstancedFoliageActor.InstancedStaticMeshComponents array (or at least not in a way that makes them actually show during gameplay), you can add/remove instances on existing components.

I have it working pretty well. For each of my building components, once it is spawned, I call my custom instancer class to render that building component’s mesh as an instanced mesh, then call SetHidden on its own static mesh component.

I have custom damage skins for my building pieces, so if they take damage they simply ask the instancer class to remove their instance from the instanced component, then unhide their own static mesh component. This way they can still have their own individual damage skins (they just won’t use instancing while they’re damaged).
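The add/remove bookkeeping such an instancer needs can be sketched independently of UDK. A minimal Python sketch, assuming each piece remembers its slot in the component’s per-instance array (PerInstanceSMData in this thread); `Instancer`, `BuildingPiece`, `on_spawned` and `on_damaged` are hypothetical names, not the author’s actual class. It uses swap-remove, so removing one instance moves at most one other entry. After changing the real array you would still need to refresh the component, e.g. with ForceUpdateComponents() or the SetHidden toggle mentioned later in the thread.

```python
from dataclasses import dataclass


class Instancer:
    """Hypothetical bookkeeping for one instanced mesh component.

    'transforms' stands in for the component's per-instance array;
    the other two fields remember which building piece owns which slot.
    """

    def __init__(self):
        self.transforms = []
        self.owner_of = []   # slot index -> piece id
        self.slot_of = {}    # piece id -> slot index

    def add(self, piece_id, transform):
        self.slot_of[piece_id] = len(self.transforms)
        self.transforms.append(transform)
        self.owner_of.append(piece_id)

    def remove(self, piece_id):
        slot = self.slot_of.pop(piece_id)
        last = len(self.transforms) - 1
        if slot != last:
            # Swap-remove: move the last instance into the freed slot, so only
            # one other piece needs its stored slot index updated.
            moved = self.owner_of[last]
            self.transforms[slot] = self.transforms[last]
            self.owner_of[slot] = moved
            self.slot_of[moved] = slot
        self.transforms.pop()
        self.owner_of.pop()


@dataclass
class BuildingPiece:
    piece_id: str
    transform: tuple
    own_mesh_hidden: bool = False  # the piece's own static mesh component


def on_spawned(piece, inst):
    inst.add(piece.piece_id, piece.transform)
    piece.own_mesh_hidden = True   # drawn via the instanced mesh instead


def on_damaged(piece, inst):
    inst.remove(piece.piece_id)
    piece.own_mesh_hidden = False  # show own mesh so the damage skin is visible
```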

I am seeing a massive boost to performance in early testing. I’m kinda amazed that I’ve not seen any other cases of people doing this, as adding instanced meshes via script during gameplay is a very powerful tool for many games, even if it is a bit of a hack using the foliage tool.

@Coldscooter welp, that makes things interesting.
I looked around and couldn’t find any info about what you’re doing. The closest is CobaltUDK hiding/unhiding pre-placed foliage clusters.
Curious as I am, I tried what you’re suggesting. I pre-placed a foliage mesh in the level, then through code found the level’s InstancedFoliageActor, got InstancedStaticMeshComponents[0] (which prints its StaticMesh reference correctly), and inserted a new InstancedStaticMeshInstanceData element into the PerInstanceSMData array with its Transform set via MakeRotationTranslationMatrix().
[edit] OK, I got it to work. I just needed to call ForceUpdateComponents() after adding to the PerInstanceSMData array. Whoa.
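For anyone unsure what goes into that Transform: it is a 4x4 matrix combining rotation and location. Here is a yaw-only Python sketch of such a matrix, assuming UE3’s rotator units (65536 per full turn) and what I believe is UE3’s layout (rows 0 to 2 are the rotated X/Y/Z axes, row 3 is the translation, points are multiplied as row vectors). It is a simplified illustration, not MakeRotationTranslationMatrix() itself, and ignores pitch and roll.

```python
import math

UNREAL_ROT_UNITS_PER_TURN = 65536  # UE3 rotators: 65536 units = 360 degrees


def yaw_translation_matrix(yaw_units, tx, ty, tz):
    """Yaw-only rotation + translation matrix (row-vector convention).

    Rows 0-2 are the rotated X/Y/Z axes; row 3 holds the translation.
    """
    a = yaw_units * 2.0 * math.pi / UNREAL_ROT_UNITS_PER_TURN
    c, s = math.cos(a), math.sin(a)
    return [
        [c, s, 0.0, 0.0],
        [-s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [tx, ty, tz, 1.0],
    ]


def transform_point(m, p):
    """Apply the matrix to a point as the row vector (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(v[i] * m[i][j] for i in range(4)) for j in range(3))


# A quarter turn (16384 units) maps local +X onto world +Y, then offsets by 100 in X.
m = yaw_translation_matrix(16384, 100.0, 0.0, 0.0)
print([round(v, 6) for v in transform_point(m, (1.0, 0.0, 0.0))])
```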

But foliage doesn’t seem to collide with rigid bodies. Any luck with that?

Glad to hear it’s giving you good results. In UE4 it’s easier than ever. On UE3, however, only licensees are likely to have used instanced meshes in the past, through native code. Among pure UDK users this is totally unheard of, as far as I know. You might have stumbled onto a true hidden gem here :slight_smile:

@Chosker I’m still attaching the static mesh component to the building pieces and using it for collision, but now I set that mesh to hidden and render an instanced version of it.

I’m now using instanced meshes for the majority of my building pieces and the performance boost is pretty crazy. My CPU draw calls are a tiny fraction of what they were.

The only annoying thing is having to manually add each of the meshes to the foliage tool and place them in a hidden spot on the map (rather than being able to do it all from script), but it’s a small price for the extra performance I’m getting.

https://content.screencast.com/users/coldscooter/folders/Jing/media/d88ee0f7-6c7b-49ee-b376-5e4da75da65b/2020-02-20_1516.png

I agree, I may have stumbled onto a bit of a gem here. Now I’m starting to wonder how else I could use this. I’m thinking of using it to render distant SpeedTree billboards, as they are also pretty heavy on draw calls.

Edit: Instead of calling ForceUpdateComponents() on the foliage actor, I’m just calling


```
instSM.SetHidden(true);
instSM.SetHidden(false);
```

on the individual InstancedStaticMeshComponents. I have a lot of components in my foliage actor, so this seems a little lighter.

Well, foliage meshes can have collision. In my test, regular collision (zero and non-zero extent) works fine. If there’s a way to make it work with rigid body collision, you’d be able to save on the actor+component count, which would probably be more beneficial for you since you can potentially have so many spawned actors. I’ll try digging into the code a bit to see if I can make it work with RB collision (but first I have to compare the performance to see how much this impacts my game and whether it’s worth it).

Yeah, having this hidden area with pre-placed foliage meshes can be a bit annoying. In my case the game has 9 levels, so if I move forward with this I’d need to add almost every mesh 9 times (I really should try a shared streaming level). It’s not the first time I’ve needed a hidden placeholder area for some purpose, though. At work we used to call such an area the “parking lot” :smiley:

I gave this some more testing. I haven’t even added all mesh types as foliage yet, and sadly for me the results are already counter-productive, leading to worse performance.

My game’s levels are made up of small to medium-sized rooms and connecting corridors. There’s also a large number of point lights placed throughout the entire level (all with dynamic shadows). I make heavy use of distance culling for far rooms/corridors, occlusion culling is then efficient enough for the remaining rooms/corridors near the player, and I also cull lights by distance.
By using instancing I’m losing all that optimization, which leads to the entire level’s geometry being rendered, once for the player camera and once per active light.

I also found that only the editor is able to reallocate foliage clusters, so this effectively leaves you with one giant cluster per unique mesh, which loses some of the efficiency of UE3’s instancing.
Since cull distance in UE3 is applied per cluster, there’s no way to break the instancing down to render only what’s needed.
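Why one giant cluster defeats per-cluster culling can be seen with a small model: a cluster is culled only when its whole bounding box is beyond the cull distance, so a cluster spanning the level is always drawn in full. A minimal 2D Python sketch, assuming axis-aligned bounds and a row of hypothetical instances; the real engine tests 3D bounds and has more going on (occlusion, LODs).

```python
import math


def bounds(points):
    """2D axis-aligned bounding box (min_x, min_y, max_x, max_y)."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def distance_to_box(box, p):
    """Distance from point p to the nearest point of the box (0 if inside)."""
    min_x, min_y, max_x, max_y = box
    dx = max(min_x - p[0], 0.0, p[0] - max_x)
    dy = max(min_y - p[1], 0.0, p[1] - max_y)
    return math.hypot(dx, dy)


def instances_drawn(clusters, cam, cull_distance):
    """Per-cluster culling: a cluster is drawn whole or skipped whole."""
    return sum(len(c) for c in clusters
               if distance_to_box(bounds(c), cam) <= cull_distance)


# Hypothetical row of 10 identical pieces, 1000 units apart.
pieces = [(1000.0 * i, 0.0) for i in range(10)]
camera = (0.0, 0.0)
print(instances_drawn([pieces], camera, 2000.0))               # one cluster: 10
print(instances_drawn([[p] for p in pieces], camera, 2000.0))  # ten clusters: 3
```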

Those are only my observations on what might be going on, but somehow things get worse in many respects. With instancing I have almost 3x the number of draw calls (wtf) and almost 2x the number of triangles drawn (expected). Traces seem to take longer to process, which wastes around 2.5 ms more on my game code. Shadowing takes slightly more time as well, apparently because it determines that more shadows need to be rendered (probably due to “bigger meshes” being inside the bounds of more lights). Culling seems to be the only thing that’s faster, but the gains are almost insignificant.
I’ll try adding the rest of the meshes to my foliage pool but I doubt it’ll turn things around.

Now, your game looks like it’s all outdoors with a sun light casting dynamic shadows, which is likely re-rendering all geometry in the shadow pass anyway. Unless you do some aggressive distance culling (with fog covering the view in the distance), I can see how instancing can give you a significant boost.

I also peeked at the native code and found that UInstancedStaticMeshComponent::InitComponentRBPhys() has a TODO comment and does nothing :rolleyes:
So sadly, if you want rigid body physics with instanced foliage, you need to keep spawning collision separately.

If the problem is the modular pieces, perhaps it’s the lack of a MaxDrawDistance and of a no-collision setting on LOD3. Also, do you have active dynamic point lights in the buildings? Those cost a lot of performance. I haven’t solved that yet, but you can mitigate it by disabling shadows in the distance.

Also, I see you have a lot of skel mesh draw calls. One way to reduce them is to hide (and untick) actors by distance.

I remember you use the foliage tool for grass. I think that’s too much for a big map. My solution was to use it only for trees (with hide/unhide by distance, which is fundamental), and to use a dynamic system for the grass.

I ran stat engine in my game. The bottleneck here is the GPU, but I hope to improve that a bit, because most of the static meshes here have no LODs yet (only the trees do). Most of the small houses are placed using the foliage tool. With them I have a problem with RB collisions: ragdolls do a trace in the velocity direction to detect foliage tool instances, and then stop moving if there’s a collision. It works badly for now.
My hardware is 4790K and GTX970.
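A trace like the one described for ragdolls can be approximated outside the engine with a segment-vs-box (“slab”) test against each instance’s bounding box. A minimal Python sketch, assuming each foliage instance is represented by an axis-aligned box and the trace covers one step of `vel * dt`; this is a swapped-in illustration of the idea, not how UDK’s trace functions work internally, and all names are hypothetical.

```python
def segment_box_hit(p0, p1, box_min, box_max):
    """Slab test: fraction t in [0, 1] where segment p0->p1 first touches the
    axis-aligned box, or None if it misses."""
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < 1e-9:
            # Parallel to this slab: must already be between its planes.
            if not box_min[axis] <= p0[axis] <= box_max[axis]:
                return None
            continue
        t0 = (box_min[axis] - p0[axis]) / d
        t1 = (box_max[axis] - p0[axis]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return None
    return t_enter


def first_blocking_instance(pos, vel, dt, boxes):
    """Trace from pos along vel for one step of dt seconds against instance
    bounds; return (t, index) of the earliest hit, or None."""
    end = tuple(pos[i] + vel[i] * dt for i in range(3))
    hits = []
    for index, (lo, hi) in enumerate(boxes):
        t = segment_box_hit(pos, end, lo, hi)
        if t is not None:
            hits.append((t, index))
    return min(hits) if hits else None
```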

@Chosker I also considered the fact that you can’t add additional clusters. One way I think you can deal with this is to copy the platform you’ve placed your “parking lot” meshes on, then paste it as many times as you like (and set instances per cluster to 1). Then, as you create new instances, batch them into clusters based on their location. So for a massive map like mine, a 4-by-4 grid (so 16 placeholder instances per mesh) would probably be enough. I haven’t needed to do this yet, as performance seems fine, but it would be a way to cull far-away instances or use their low-poly LODs if you’re concerned about rendering too much complex geometry.
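The “batch by location” step amounts to mapping each new instance’s position to a grid cell and adding it to that cell’s cluster. A minimal Python sketch for a 4-by-4 grid, assuming known map bounds (the values here are hypothetical) and one pre-placed cluster per cell per mesh; `cluster_index` and `bucket_instances` are illustrative names only.

```python
from collections import defaultdict


def cluster_index(x, y, map_min, map_max, cells_per_side=4):
    """Map a world XY location to one of cells_per_side**2 grid clusters
    (row-major), e.g. 16 clusters for a 4-by-4 grid."""
    (min_x, min_y), (max_x, max_y) = map_min, map_max
    cx = int((x - min_x) / (max_x - min_x) * cells_per_side)
    cy = int((y - min_y) / (max_y - min_y) * cells_per_side)
    # Clamp so locations on (or just past) the map edge land in an edge cell.
    cx = min(max(cx, 0), cells_per_side - 1)
    cy = min(max(cy, 0), cells_per_side - 1)
    return cy * cells_per_side + cx


def bucket_instances(locations, map_min, map_max, cells_per_side=4):
    """Group instance locations by cluster index."""
    buckets = defaultdict(list)
    for x, y in locations:
        buckets[cluster_index(x, y, map_min, map_max, cells_per_side)].append((x, y))
    return dict(buckets)


# Hypothetical 16000 x 16000 unit map centred on the origin.
MAP = ((-8000.0, -8000.0), (8000.0, 8000.0))
print(cluster_index(0.0, 0.0, *MAP))  # prints 10
```

Per-cell clusters keep each cluster’s bounds small, so per-cluster cull distances (and LODs) can actually drop far-away cells.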

Based on how you’re describing your game and levels, I imagine this whole instancing trick won’t give you much benefit. It’s really for when you’re rendering hundreds or thousands of the same mesh on screen at a time. In the case of my game, I can’t prevent players from building whatever they feel like, so I have to try to handle the worst-case scenario of them building huge cities on the map.

@CobaltUDK I’ve done a lot of testing with collision, and even when I have over 10,000 dynamic actors with collision in my world, I don’t see much performance gain when I remove the collision.

The actors themselves, though, seem to cost performance even when they’re hidden (and using no draw calls). I’m still not sure why this is. Any ideas?

Update: Regarding “Skel Mesh Draw Calls”, it seems that if I add one actor with a single skeletal mesh component to the world (in my view), Skel Mesh Draw Calls increases by around 30. I really don’t understand this number…???

In addition, the draw calls don’t respect MaxDrawDistance culling, or at least not exactly. When the skel mesh first gets culled as I move away from it, it is still making draw calls. Moving much further away stops the draw calls. If I call SetHidden() on the component, the draw calls stop immediately. Do you know why draw calls are still happening when the component has been culled? I’d hate to have to add my own culling logic to hide the component.

This is out of my league, but doesn’t the procedural building feature have similar capabilities, rendering many instances of the same mesh and reducing the draw calls?