
Understanding performance bottlenecks?


    I'm struggling to understand where the bottlenecks are when my game is under heavy load and running slowly.

    On a map with thousands of player-placed structures, the frame rate drops. However, when I run MSI Afterburner and monitor my system resources, it doesn't show any of the hardware under particularly heavy load:

    This was captured on a very crowded map, looking out over a large distance with a lot of visible meshes.

    However, the GPU is only running at 30% and the CPU threads seem to have a lot of headroom. But as you can see, the game is running at 35fps (it would be >100fps if I deleted all the player-placed structures on the map).

    So what is causing the slowdown in the engine? Any ideas on how to better inspect where the bottlenecks might be?

    Is that 10 GB of RAM what the game is currently using, or what the machine has in total? (How much RAM do you have?)
    Does stat engine show anything strange?

    A little off topic, but it would be cool if we could spawn or use dynamic cull distance volumes. The default UDK one is static. I tried it before and failed.
    Last edited by O_and_N; 02-14-2020, 09:35 PM.


      O_and_N I've tried running "stat engine", but the results don't show anything obvious:

      This is when the game is running slow at 30fps (with lots of meshes placed on the map):

      Then this is with all of the player-placed meshes removed (and then the game runs fine at 60+fps):

      I just want to figure out exactly where the performance is being eaten up, and why my CPU and GPU don't seem to be under much stress when the game is running slow.

      Here is a side-by-side comparison of the hardware stats when the game is running slow and when it's running normal:

      As you can see there is basically no difference in terms of stress on the hardware between the two, except one is running at half the speed.

      So the performance bottleneck is happening somewhere in the engine.

      Any ideas?


        I haven't played your game (I have been meaning to, and will).
        By "player-placed meshes", are we talking about items or buildings? Are they skeletal meshes or InterpActors?

        I remember once making a building composed of 100 InterpActors (stupid of me), animated to lift up from the ground like an elevator. Performance was killed, as UDK couldn't handle so many InterpActors animated at once. Then I exported the building to Maya, combined everything, and ended up with a building of 5 big chunks, which fixed the performance. Not sure if this relates here, but it's good to share the experience just to rule one thing out.

        So that stunt was an engine limitation.
        Why is the skeletal mesh tri count so much higher in the scene with the meshes removed than in the populated scene? Is it a different camera angle looking at other things?


          O_and_N Yes, they are building pieces (walls, ceilings, foundations, etc). They are essentially Actors with a static mesh component.

          You can see an example of the types of modular structures you can build in this video I posted last night:


            UE3 is not fully multithreaded (it uses some threads but not all; no game engine really is, AFAIK), and UnrealScript is single-threaded, so don't rely on your CPU thread usage for guidance.

            start by running 'stat unit' and see if your game is bottlenecked by the cpu (Draw means cpu rendering thread[s], Game means the cpu game thread[s]) or the gpu (simply GPU)
            if the gpu is the main problem then run a 'profilegpu' and check the log, you'll see the timings of your gpu processes on that frame
            if it's the cpu then check the other stats like stat scenerendering, stat octree, stat physics and so on (there are many of them; you can easily find and toggle them if you run the game with the remotecontrol) and then obviously stat game. if the cpu is the bottleneck you'll need to work out whether it's your game code (stat game) or some other engine cpu cost like physics, occlusion, particle processing, etc.
            if you find that your game code is a major blocker you can dive deeper using the gameplay profiler (look in the docs for that, basically 'profilegame 3' will capture 3 seconds of gameplay into a profile, then open it with the gameplay profiler tool), and check specifically the code functions that might cause problems (and even get to optimize some things you didn't know could be slowing down your game a bit)

            given that your game is slowed down by player structures my guess is your bottleneck will be on the rendering thread, but no better way to know until you find out for sure
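A rough sketch of the triage described above, written in Python rather than UnrealScript purely for illustration. The timing values are made up, and `UnitTimes`, `likely_bottleneck`, and `whole_cpu_percent` are hypothetical names, not engine APIs. It simply picks the largest of the Game, Draw, and GPU times from 'stat unit', and shows why one saturated thread barely registers in a whole-CPU usage figure.

```python
from dataclasses import dataclass


@dataclass
class UnitTimes:
    """Millisecond timings as read off 'stat unit' (hypothetical values)."""
    frame: float
    game: float
    draw: float
    gpu: float


def likely_bottleneck(t: UnitTimes) -> str:
    """Return whichever of Game, Draw, or GPU takes the longest."""
    candidates = {
        "game thread (Game)": t.game,
        "render thread (Draw)": t.draw,
        "GPU": t.gpu,
    }
    return max(candidates, key=candidates.get)


def whole_cpu_percent(busy_threads: int, logical_cores: int) -> float:
    """Overall CPU usage if `busy_threads` threads are saturated and the rest idle."""
    return 100.0 * busy_threads / logical_cores


# Made-up numbers for a frame of roughly 35 fps.
sample = UnitTimes(frame=28.6, game=12.0, draw=28.1, gpu=9.5)
print(likely_bottleneck(sample))
print(whole_cpu_percent(1, 8))
```

With numbers like these, the render thread is the limit even though the GPU and the overall CPU figure both look idle, which would match the symptoms in the first post.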
            Follow me on Twitter!
            Developer of Elium - Prison Escape
            Local Image-Based Lighting for UE4


              Thanks for the input Chosker. I've run some further comparisons between the game world with lots of player-built structures and one with none:

              With "stat unit", while all the stats have dropped, the most significant deficit appears to be "Draw" (the CPU rendering thread).

              stat scenerendering:

              stat octree:

              stat game:

              So I'm still analysing the stats, but it does appear to be the number of draw calls on the CPU, coupled with the amount of shadow and lighting drawing.

              There also seems to be a drastic increase in "Dynamic path draw calls" in the scene rendering stats. What is this?

              Posting these stats in the hope someone may be able to shed more light (no pun intended) on them, and perhaps suggest some ideas of how to optimise.

              Last edited by Coldscooter; 02-18-2020, 02:15 PM.


                Today I tested voice acting for 100 AI in my game, and the Sound Cue system took a lot of resources.
                Some cool unreal projects you may like!

                Developer of GOTA - Survivalism Gladiatorial game Buy the game google gladiators of the arena steam



                  seems you have a few different things adding up. simply put, Unreal isn't super efficient at rendering large numbers of dynamic actors (which seems reasonable).
                  one of the obvious solutions would be simplifying what the player can build. more granularity means more freedom for players but also means more rendered meshes. I don't know if your granularity is at walls/doors/etc level, entire rooms, or even entire houses/structures. but in general reducing the granularity would reduce the problem.
                  in my game with spawned dungeons I had fully modular level assets, but for some very recurrent cases I had pre-merged versions of those meshes made at the art pipeline level. for example I would replace 4 tiles of 1x1 floors with a pre-merged 4x4 floor. in your case, being user-built, it's much more tricky but it might give you some ideas.
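The tile-merging idea above could be sketched like this. It is a hypothetical illustration in Python, not code from either game: it assumes floor tiles sit on an integer grid and that a pre-merged mesh covering a `block` x `block` area exists (2x2 is used here for illustration). `merge_floor_tiles` is an invented name.

```python
def merge_floor_tiles(tiles, block=2):
    """Greedily swap full block x block groups of 1x1 tiles for one merged piece.

    tiles: iterable of (x, y) grid cells holding a 1x1 floor tile.
    Returns (merged_origins, leftover_tiles).
    """
    remaining = set(tiles)
    merged = []
    for (x, y) in sorted(remaining):
        cells = {(x + dx, y + dy) for dx in range(block) for dy in range(block)}
        # Only merge when every cell of the block is still an unmerged tile.
        if cells <= remaining:
            merged.append((x, y))
            remaining -= cells
    return merged, remaining


# A 4x4 patch of floor: 16 separate meshes before merging.
patch = [(x, y) for x in range(4) for y in range(4)]
merged, leftover = merge_floor_tiles(patch)
print(len(patch), "meshes before;", len(merged) + len(leftover), "after")
```

For player-built structures the merge would have to be redone (or undone) whenever a piece is added, removed, or damaged, which is the tricky part mentioned above.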

                  the dynamic rendering path (movable actors) has a higher cost than the static rendering path (static actors). spawning DynamicSMActor static meshes causes them to be drawn on the dynamic rendering path.
                  in my spawned dungeons I had a StaticMeshActorSpawnable class child of DynamicSMActor_Spawnable, and in the defaultproperties I had
                  give it a try, it might help making your spawned meshes a bit lighter in a few categories.

                  you also have a lot more shadow and lighting drawing. probably normal for shadows if your objects cast dynamic shadows, but the lighting part seems strange. if you have dynamic lights with shadows spawned on your player structures you probably want to disable them by distance.

                  lastly there's always the option to distance-cull your spawned meshes, decals, particles, etc. not sure how much of that you're already doing, but distance-cull would lighten the occlusion culling pass.
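As a minimal sketch of the distance-cull idea, assuming each spawned piece has a world position and a single cull distance applies to all of them: the function name and the data layout are hypothetical, and in the engine this would normally be a per-component draw distance setting rather than script-side filtering.

```python
import math


def pieces_within_cull_distance(camera_pos, pieces, cull_distance):
    """Return the names of pieces no farther than cull_distance from the camera.

    pieces: list of (name, (x, y, z)) tuples.
    """
    return [
        name
        for name, pos in pieces
        if math.dist(camera_pos, pos) <= cull_distance
    ]


pieces = [
    ("wall_near", (100.0, 0.0, 0.0)),
    ("torch_light", (0.0, 3000.0, 0.0)),
    ("wall_far", (9000.0, 0.0, 0.0)),
]
print(pieces_within_cull_distance((0.0, 0.0, 0.0), pieces, 5000.0))
```

The same check could be used to switch off shadow-casting lights on distant structures.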


                    Chosker My building system is similar to the one in Fortnite (although it's a very different game). I wonder how they have implemented their system such that hundreds (or thousands) of dynamic building pieces can be rendered on screen, even on mobile hardware, with no obvious performance hit. I know it's the new engine, but whatever techniques they're using are probably still applicable. Same for Rust (which is a Unity game).


                      UE4 has a lot more optimizations in that regard. UE4 has HZB occlusion which can speed up occlusion queries (UE4 has it disabled by default so I don't know for certain if they use it, but they probably do). UE4 also has some DX11 rendering optimizations to the way staticmesh drawcalls are queued into the render thread, and on top of that it has automatic instancing (since 4.22) which greatly reduces drawcalls when using modular meshes.
                      basically all the engine heavy lifting done in this regard exists due to significant UE4-specific and DX11-specific improvements, which sadly won't be applicable to UDK. you can't even spawn foliage mesh components, which would be the path to take towards mesh instancing.
                      Unity also has automatic instancing AFAIK.


                        Chosker I've figured out a way to utilize static mesh instancing, leveraging the instanced foliage actor.

                        Any meshes you need to instance need to be added to the foliage tool on the map you want to instance them on. And at least one needs to be placed somewhere in the world (so somewhere hidden on the map).

                        Although you can't seem to add new InstancedStaticMeshComponents to the InstancedFoliageActor.InstancedStaticMeshComponents array (or at least, they don't then show up during gameplay), you can add and remove instances on existing components.

                        I have it working pretty well. For each of my building components, once they are spawned, I call my custom instancer class to render that building component's mesh as an instanced mesh, then call SetHidden on its own static mesh component.

                        I have custom damage skins for my building pieces, so if they take damage they simply ask the instancer class to remove their instanced mesh from the instanced component, then unhide their own static mesh component. This way they can still have their own individual damage skins (but just won't use instancing while they're damaged).
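The bookkeeping described above might look roughly like the following. This is a Python sketch of the idea only, not the actual UnrealScript: `Instancer` and `BuildingPiece` are hypothetical stand-ins, and counting one draw per mesh type with instances is a simplification of how the engine batches instanced components.

```python
class Instancer:
    """Hypothetical stand-in for the custom instancer class."""

    def __init__(self):
        # mesh name -> list of (piece_id, transform) instance entries
        self.instances = {}

    def add(self, mesh, piece_id, transform):
        self.instances.setdefault(mesh, []).append((piece_id, transform))

    def remove(self, mesh, piece_id):
        items = self.instances.get(mesh, [])
        for i, (pid, _) in enumerate(items):
            if pid == piece_id:
                items[i] = items[-1]  # swap-remove: order does not matter
                items.pop()
                return True
        return False

    def batches(self):
        """Simplified draw count: one per mesh type that has any instances."""
        return sum(1 for items in self.instances.values() if items)


class BuildingPiece:
    def __init__(self, piece_id, mesh, transform, instancer):
        self.piece_id, self.mesh, self.instancer = piece_id, mesh, instancer
        instancer.add(mesh, piece_id, transform)
        self.own_mesh_hidden = True  # collision stays, rendering is instanced

    def take_damage(self):
        # Drop out of the instanced batch so the damage skin can be shown.
        if self.instancer.remove(self.mesh, self.piece_id):
            self.own_mesh_hidden = False


def total_draws(instancer, pieces):
    return instancer.batches() + sum(1 for p in pieces if not p.own_mesh_hidden)


inst = Instancer()
pieces = [BuildingPiece(i, "wall", (i, 0, 0), inst) for i in range(100)]
pieces += [BuildingPiece(100 + i, "floor", (i, 1, 0), inst) for i in range(50)]
print(total_draws(inst, pieces))  # undamaged: two mesh types
pieces[0].take_damage()
print(total_draws(inst, pieces))  # one piece now draws on its own
```

Under this simplification, 150 pieces cost about as many draws as there are distinct mesh types, plus one per damaged piece.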

                        I am seeing a massive boost to performance in early testing. I'm kinda amazed that I've not seen any other cases of people doing this, as adding instanced meshes via script during gameplay is a very powerful tool for many games. Even if it is a bit of a hack using the foliage tool.
                        Last edited by Coldscooter; 02-20-2020, 02:58 AM.


                          Coldscooter welp, that makes things interesting.
                          I looked around and couldn't find any info referring to what you're doing. the closest is CobaltUDK hiding/unhiding pre-placed foliage clusters.
                          curious as I am, I tried what you're suggesting. I pre-placed a foliage mesh in the level, then through code found the level's InstancedFoliageActor, got InstancedStaticMeshComponents[0] (which prints its StaticMesh reference correctly), and inserted a new InstancedStaticMeshInstanceData element into the PerInstanceSMData array with its Transform set to a constructed MakeRotationTranslationMatrix().
                          [edit] ok I got it to work. I just needed to call ForceUpdateComponents() after adding into the PerInstanceSMData array. whoa.

                          but foliage doesn't seem to collide with rigid bodies. any luck with that?

                          glad to hear it's giving you good results. in UE4 it's easier than ever. however on UE3 only licensees are likely to have used instanced meshes in the past, through native code. for me, in the realm of pure UDK users, this is totally unheard of. you might have stumbled onto a true hidden gem here
                          Last edited by Chosker; 02-20-2020, 06:06 PM.


                            Chosker I'm still attaching the static mesh component in the building pieces, and using that for my collision, but now I set that mesh to hidden and render an instanced version of the mesh.

                            I'm now using instanced meshes for the majority of my building pieces and the performance boost is pretty crazy. My CPU draw calls are a tiny fraction of what they were.

                            The only annoying thing is having to manually add each of the meshes to the foliage tool and place them in a hidden spot on the map (rather than being able to do it all from script), but it's a small price for the extra performance I'm getting.

                            I agree I may have stumbled onto a bit of a gem here. Now I'm starting to wonder how else I could utilise this. I'm thinking of using it for rendering distant SpeedTree billboards, as they are also pretty heavy on draw calls.

                            Edit: Instead of using ForceUpdateComponents() on the foliage actor, I'm just calling
                            on the InstancedStaticMeshComponents individually. I have a lot of components in my foliage actor, so this seems a little less heavy.
                            Last edited by Coldscooter; 02-20-2020, 07:27 PM.


                              well foliage meshes *can* have collision. in my test the regular collision (zero and non-zero extent) works fine. if there's a way to make it work with rigid body collision you'd be able to save on the actor+component count, which would probably be more beneficial for you since you can potentially have so many spawned actors. I'll try digging into the code a bit to see if I can make it work with RB collision (but first I have to compare the performance to see how much this impacts my game and if it's worth it)

                              yeah having this hidden area with pre-placed foliage meshes can be a bit annoying. in my case the game has 9 levels so if I move forward with this I'd need to add almost every mesh 9 times (I really should try a shared streaming level). it's not the first time I've needed a hidden placeholder area for some purpose though. at work we used to call such an area the "parking lot"
