Understanding performance bottlenecks?

I’ve tried adding more of the level meshes as foliage, and the more I add, the worse it gets. I’m still at a loss as to why I get so many more draw calls, though.
I suppose I could make multiple parking lots and match them by location, but I’d need a very elaborate setup to make the clusters match my room and corridor structure and still get the most out of occlusion and distance culling. In the best case it would merge all modular pieces of the same type within a given room, and with multiple lights per room I don’t think the benefits would be substantial enough.
Indeed, it seems a solution much better suited to your type of game.

As for the actors that still cost performance, check their tick costs with what I posted a few posts above: disable the tick, set them to not movable, and run a profilegame.

The fact that you’re getting more draw calls sounds like something isn’t set up correctly. Are you sure you’re hiding the original components that the instanced meshes are replacing?

Today I’ve moved even more of my building components over to use instances and my draw calls are way down (and fps way up on maps with lots of big buildings).

https://content.screencast.com/users/coldscooter/folders/Jing/media/fa4491ed-1fdc-4cee-b3a0-36311989645d/2020-02-22_1710.png

Earlier I was rendering over 10,000 individual building pieces in a scene with the frame-rate not seeming to flinch.

I’m planning to switch all of my SpeedTree billboards over to the same system.

Something I noticed a while ago is that hiding actors with maxdrawdistance is not the same as calling setHidden on them.

For example, the foliage tool has its own maxdrawdistance option, but meshes hidden that way still cast shadows. Why? I don’t know. Even when the hidden clusters were on the other side of the map, outside the sun’s cascaded shadow radius, they were wasting a lot of resources.

For that reason I wrote a function that calls SetHidden(true) on the far clusters, and the performance difference is big.
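A minimal sketch of that distance-based hiding, written in C++ with hypothetical stand-in types rather than UnrealScript: `Cluster`, `center` and `UpdateClusterVisibility` are illustrative names, and the `hidden` flag stands for whatever SetHidden call the real code makes. It assumes each cluster can be represented by a single position, and it only counts a change when a cluster's state actually flips, so SetHidden does not have to be called on every cluster every update.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for engine types; these are not UDK classes.
struct Vec3 { float x, y, z; };

struct Cluster {
    Vec3 center;          // representative position of the cluster's instances
    bool hidden = false;  // mirrors the state set via SetHidden(true/false)
};

// Squared distance avoids a sqrt per cluster.
float DistSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Hides clusters farther than hideDistance from the viewer and shows the rest.
// Returns how many clusters changed state (i.e. how many SetHidden calls were needed).
int UpdateClusterVisibility(std::vector<Cluster>& clusters, const Vec3& viewer, float hideDistance) {
    assert(hideDistance >= 0.0f);
    const float limitSq = hideDistance * hideDistance;
    int changed = 0;
    for (Cluster& c : clusters) {
        const bool shouldHide = DistSq(c.center, viewer) > limitSq;
        if (shouldHide != c.hidden) {  // only touch clusters whose state flips
            c.hidden = shouldHide;
            ++changed;
        }
    }
    return changed;
}
```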

For skeletal meshes I do the same, since maxdrawdistance on meshes has the same problem. I send far pawns to a “sleeping” state in the controller, where I call sethidden on the pawn.mesh, pawn.setcollision(false) and pawn.settickisdisabled(true). That improves things a lot, but it still costs about 0.1 ms per 100 sleeping pawns (and I have about 1000 in the map). Controllers don’t seem to have a significant performance impact in this state.
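A rough C++ sketch of the sleeping state, assuming a pawn can be reduced to the three flags the post toggles. All names are hypothetical. The separate sleep and wake radii (hysteresis) are an addition, not part of the post: they keep a pawn standing near a single boundary from being put to sleep and woken on alternate ticks.

```cpp
#include <cassert>

// Hypothetical stand-in for a pawn: only the flags the sleeping state touches.
struct PawnState {
    bool meshHidden = false;    // pawn.mesh hidden via sethidden
    bool collision = true;      // pawn.setcollision(...)
    bool tickDisabled = false;  // pawn.settickisdisabled(...)
    bool sleeping = false;
};

void Sleep(PawnState& p) {
    p.meshHidden = true;
    p.collision = false;
    p.tickDisabled = true;
    p.sleeping = true;
}

void Wake(PawnState& p) {
    p.meshHidden = false;
    p.collision = true;
    p.tickDisabled = false;
    p.sleeping = false;
}

// Sleeps beyond sleepRadius and wakes only inside the smaller wakeRadius.
void UpdateSleep(PawnState& p, float distance, float sleepRadius, float wakeRadius) {
    assert(wakeRadius <= sleepRadius);
    if (!p.sleeping && distance > sleepRadius) {
        Sleep(p);
    } else if (p.sleeping && distance < wakeRadius) {
        Wake(p);
    }
}
```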

I also wrote a function that stores the pawn properties (a struct) in an array, destroys the far pawns, and spawns them again when they come near. Each tick, a few array entries are checked to spawn the near ones or hide the far ones. It’s faster than the sleeping state, but I don’t know yet whether constantly destroying and spawning is a good idea. With the grass it causes some crashes, so there I will need to reuse the meshes instead of destroying and spawning them.
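The store/destroy/respawn approach with a few checks per tick could look roughly like this C++ sketch. `PawnRecord`, its fields and `StepSpawnManager` are hypothetical, and the comments mark where the game would actually spawn or destroy the pawn. The cursor lets the scan resume where the previous tick stopped, so the per-tick cost is bounded by `budget` no matter how many records exist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical saved-pawn record; a real one would hold whatever properties matter.
struct PawnRecord {
    float x, y, z;  // last known location
    int health;     // example of a saved property (hypothetical)
    bool spawned;   // true while a live pawn exists for this record
};

// Checks at most `budget` records starting at `cursor` and returns where the next
// tick should continue. Near records are (re)spawned, far ones are despawned.
std::size_t StepSpawnManager(std::vector<PawnRecord>& records, std::size_t cursor,
                             std::size_t budget, float px, float py, float pz,
                             float spawnRadius) {
    assert(spawnRadius > 0.0f);
    if (records.empty()) return 0;
    cursor %= records.size();  // the array may have shrunk since the last tick
    const float r2 = spawnRadius * spawnRadius;
    for (std::size_t n = 0; n < budget && n < records.size(); ++n) {
        PawnRecord& rec = records[cursor];
        const float dx = rec.x - px, dy = rec.y - py, dz = rec.z - pz;
        const bool inRange = dx * dx + dy * dy + dz * dz <= r2;
        if (inRange && !rec.spawned) {
            rec.spawned = true;   // here the game would spawn and copy the record in
        } else if (!inRange && rec.spawned) {
            rec.spawned = false;  // here the game would copy state out and destroy
        }
        cursor = (cursor + 1) % records.size();
    }
    return cursor;
}
```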

Other ideas: regularly check all dynamic actors to enable or disable their tick and call sethidden based on distance, and use streaming levels.

I’m not even spawning the original components at all. At this point my “foliage spawn code” is hooked directly into (and only into) my random dungeon spawner, which spawns prefabs by spawning each actor that makes up each prefab. It checks whether the prefab’s actor-to-be-spawned has a mesh that’s in the foliage list; if so, it skips spawning the actor and simply adds the mesh to the foliage PerInstanceSMData. At the moment this means I still don’t get RB collisions, but the foliage spawning code works: if I use ‘show instancedstaticmeshes’, it hides only the foliage meshes and no other meshes remain in their place. Before I call it a dead end, I’ll investigate my increased draw calls further, because something definitely feels off and I at least want a real comparison.
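The routing step, where a prefab piece whose mesh is in the foliage list becomes instance data instead of an actor, can be sketched as below. This is a plain C++ illustration with hypothetical types (`PrefabPiece`, `SpawnPlan`, `RoutePrefab`); in the real code, the `instances` entries would be appended to the foliage component's PerInstanceSMData and the `actors` entries would be spawned as usual.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one actor inside a prefab.
struct PrefabPiece {
    std::string mesh;  // static mesh name
    float x, y, z;     // world location after placing the prefab
};

struct SpawnPlan {
    std::map<std::string, std::vector<PrefabPiece>> instances;  // per-mesh instance data
    std::vector<PrefabPiece> actors;                            // pieces still spawned as actors
};

// Pieces whose mesh is in the foliage list become instance data instead of actors.
SpawnPlan RoutePrefab(const std::vector<PrefabPiece>& pieces,
                      const std::set<std::string>& foliageMeshes) {
    SpawnPlan plan;
    for (const PrefabPiece& piece : pieces) {
        assert(!piece.mesh.empty());  // every piece is expected to name a mesh
        if (foliageMeshes.count(piece.mesh) != 0) {
            plan.instances[piece.mesh].push_back(piece);  // skip spawning the actor
        } else {
            plan.actors.push_back(piece);
        }
    }
    return plan;
}
```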

I also tried adding multiple parking lots with the maximum cluster size (a 6x6 grid in the editor). While the editor tells me I have 36 clusters, when I add the meshes through code they all seem to go into the same cluster: despite being far apart in the level, they all get culled together (while I can see the parking-lot meshes get culled independently). As far as I can tell, there’s no way to specify which cluster a mesh is assigned to. The system doesn’t seem to do it automatically based on location, and the only way to update clusters is through native code that only gets called from the editor. Or did you find a way to assign meshes to specific clusters?

Yes, the foliage seems to have a bug (also in UE4) where hiding it still makes it cast shadows. Yours seems like a good way to get some performance back :slight_smile:

I had been under the impression that a cluster is simply an InstancedStaticMeshComponent in the foliage actor’s InstancedStaticMeshComponents array. So it shouldn’t matter where on the map the parking lots are placed; it should only matter how you add the SMData structs, which should follow your own logic for which location-based grid cell they belong to. Are you saying that if you add the same mesh to two different InstancedStaticMeshComponents, they are culled at the same time? That would seem very weird.
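If each cluster is one component in that array, assigning an instance to a cluster reduces to computing a grid cell from its location. A C++ sketch under that assumption, using a row-major `cols x rows` grid such as the 6x6 one mentioned above; `ClusterIndexFor` and its parameters are hypothetical, and locations outside the grid are clamped to the nearest edge cell.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Maps a world XY location to a cluster index in a cols x rows grid whose first
// cell starts at (originX, originY).
int ClusterIndexFor(float x, float y, float originX, float originY,
                    float cellSize, int cols, int rows) {
    assert(cellSize > 0.0f && cols > 0 && rows > 0);
    int cx = static_cast<int>(std::floor((x - originX) / cellSize));
    int cy = static_cast<int>(std::floor((y - originY) / cellSize));
    cx = std::min(std::max(cx, 0), cols - 1);  // clamp to the grid
    cy = std::min(std::max(cy, 0), rows - 1);
    return cy * cols + cx;  // row-major index into the component array
}
```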

I got around this by simply giving LOD1 of my foliage meshes no shadows. I have several million foliage meshes on my map, and setting hidden on all of them gives very little noticeable performance gain.

One question I have: I know how to get and set Location and Rotation on the Transform matrix of an InstancedStaticMeshInstanceData struct. But how do I get and set the size?

My trees have no shadows (or collision) in LOD3, and with only the foliage tool’s maxdrawdistance the performance is bad. Hiding the far clusters improves it a lot.

Here you can see the difference, from 36 fps to 76.

@CobaltUDK Interesting… How are you checking the location of a cluster, given that each cluster’s translation is always vect(0,0,0)?

Ah, I didn’t realize that different clusters correspond to different entries in the InstancedStaticMeshComponents array.

I made an updated test with a parking-lot grid and some logic to assign clusters based on my room/corridor structure. I’m only using instancing for a few meshes, but at this point I can see that my initial assumption was more or less correct: while I have fewer draw calls, I still have a slightly higher overall performance cost due to more costly dynamic shadows.
So here ends my adventure in UDK foliage hacking.

Some of the older threads suggest modifying the ZPlane component of the transform matrix to alter the scale, but I don’t know exactly how that works (maybe ZPlane.XYZ directly corresponds to the XYZ scale?)

With MatrixGetOrigin(component.PerInstanceSMData[n].Transform)
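As far as I understand UE3's matrix layout, MatrixGetOrigin works because the translation is stored in the fourth row (WPlane), while XPlane, YPlane and ZPlane each hold one rotated local axis multiplied by the scale along that axis. That also means ZPlane on its own is not the X/Y/Z scale. The C++ sketch below mirrors the struct names but is otherwise hypothetical; it builds a yaw-only transform with a uniform scale and reads the origin back.

```cpp
#include <cassert>
#include <cmath>

// Minimal mirror of the row-based matrix layout (names follow the UnrealScript structs).
struct Plane { float X, Y, Z, W; };
struct Matrix { Plane XPlane, YPlane, ZPlane, WPlane; };
struct Vec3 { float X, Y, Z; };

// Builds a yaw rotation with a uniform scale, then a translation. Each of the first
// three rows is one rotated local axis times the scale; the fourth row is the origin.
Matrix MakeTransform(float yawRadians, float scale, const Vec3& origin) {
    assert(scale != 0.0f);  // a zero scale would collapse the axes
    const float c = std::cos(yawRadians), s = std::sin(yawRadians);
    Matrix m{};
    m.XPlane = {c * scale, s * scale, 0.0f, 0.0f};
    m.YPlane = {-s * scale, c * scale, 0.0f, 0.0f};
    m.ZPlane = {0.0f, 0.0f, scale, 0.0f};
    m.WPlane = {origin.X, origin.Y, origin.Z, 1.0f};
    return m;
}

// Same idea as MatrixGetOrigin: read the translation row.
Vec3 GetOrigin(const Matrix& m) { return {m.WPlane.X, m.WPlane.Y, m.WPlane.Z}; }
```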

I wrote a simple script that shows the process here: https://forums.unrealengine.com/legacy-tools-unreal-engine-3-udk/udk-programming-and-unrealscript/1344053-improving-the-foliage-tool-performance

Also, the “InstanceEndCullDistance” property of the InstancedStaticMeshComponent can be useful for knowing when you can start hiding the cluster.
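One way to turn InstanceEndCullDistance into a hide test is to bound the cluster with a sphere and hide it once even the nearest possible instance is beyond the cull distance. The C++ sketch below uses a loose sphere (box center, radius to the farthest instance) rather than a minimal one; `ClusterBounds`, `CanHideCluster` and the extra `margin` parameter are assumptions, not UDK properties.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

// Loose bounding sphere: center of the axis-aligned box, radius to the farthest instance.
Sphere ClusterBounds(const std::vector<Vec3>& instances) {
    assert(!instances.empty());
    Vec3 lo = instances[0], hi = instances[0];
    for (const Vec3& p : instances) {
        lo = {std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z)};
        hi = {std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z)};
    }
    const Vec3 c{(lo.x + hi.x) * 0.5f, (lo.y + hi.y) * 0.5f, (lo.z + hi.z) * 0.5f};
    float r = 0.0f;
    for (const Vec3& p : instances) {
        const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
        r = std::max(r, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
    return {c, r};
}

// True once even the nearest possible instance is beyond the cull distance plus a margin.
bool CanHideCluster(const Sphere& bounds, const Vec3& viewer,
                    float instanceEndCullDistance, float margin) {
    const float dx = bounds.center.x - viewer.x;
    const float dy = bounds.center.y - viewer.y;
    const float dz = bounds.center.z - viewer.z;
    const float distToCenter = std::sqrt(dx * dx + dy * dy + dz * dz);
    return distToCenter - bounds.radius > instanceEndCullDistance + margin;
}
```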

Finally, I made a system that checks all the clusters in a distributed way over several ticks, taking the player’s speed into account. It works quite well; it’s one of the few things I’m happy with.
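The post doesn't say how the player speed is used, so here is one possible rule, as an assumption: check enough clusters per tick that a full sweep finishes before the player can cover a chosen safety margin. One sweep takes `total / perTick` ticks and the player moves `speed * dt` per tick, which gives `perTick >= total * speed * dt / margin`. `ClustersPerTick` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of clusters to check this tick, from the sweep-time rule above,
// never fewer than minPerTick and never more than totalClusters.
int ClustersPerTick(int totalClusters, float playerSpeed, float deltaSeconds,
                    float safetyMargin, int minPerTick) {
    assert(totalClusters >= 0 && safetyMargin > 0.0f && minPerTick >= 1);
    if (totalClusters == 0) return 0;
    const float travelPerTick = playerSpeed * deltaSeconds;
    const float needed = std::ceil(totalClusters * travelPerTick / safetyMargin);
    // Clamp in float before converting, so a huge speed cannot overflow the int cast.
    const float clamped = std::min(std::max(needed, static_cast<float>(minPerTick)),
                                   static_cast<float>(totalClusters));
    return static_cast<int>(clamped);
}
```

A standing player then costs only `minPerTick` checks per tick, while a fast one gets more checks so that distant clusters are re-evaluated before they come into view.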

I was able to write that script thanks to some code I saw that placed collision boxes on each mesh to make the RB collisions work.

It was terribly slow on a large map like mine (and probably yours too), but it showed me how reading the position of each cluster works. I don’t have the script, but if you find it there, you can see how the location and rotation of each mesh are found.

@CobaltUDK Thanks for the example. This is very similar to what I’ve just added. I’m not bothering to iterate through the whole PerInstanceData array, though; I just check a single entry at the midpoint of the array. It’s a rough approximation, but as long as your cluster size isn’t too large, it will generally be fine.

In your example you check the PerInstanceData entries until you find one that is outside the range, then hide the cluster. It seems unnecessary to check more than one.
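The two strategies can be compared in a small C++ sketch with hypothetical names. `FarByMidpoint` is the single-sample approximation described above; `FarByAllInstances` is a conservative variant that only reports the cluster as far when every instance is out of range (it is not a transcription of either poster's code). The midpoint test assumes instances were added roughly in spatial order, since the array order is otherwise arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

float DistSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Approximation: judge the whole cluster by the instance in the middle of its array.
bool FarByMidpoint(const std::vector<Vec3>& instances, const Vec3& viewer, float range) {
    assert(!instances.empty());
    const Vec3& mid = instances[instances.size() / 2];
    return DistSq(mid, viewer) > range * range;
}

// Conservative: the cluster counts as far only if every instance is out of range.
bool FarByAllInstances(const std::vector<Vec3>& instances, const Vec3& viewer, float range) {
    for (const Vec3& p : instances) {
        if (DistSq(p, viewer) <= range * range) return false;  // early out on the first near one
    }
    return true;
}
```

The midpoint check is one distance test per cluster; the conservative one can touch every instance, which is the cost the post is trying to avoid, but it never hides a cluster that still has an instance in range.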

Also, regarding my previous post on how to get the scale from the transform matrix, here’s what I’m now using:



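//-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
//Gets the scale from a matrix transform.
//XPlane, YPlane and ZPlane each hold one rotated axis multiplied by the
//scale along that axis, so each scale component is the length of a row.
//-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
final function vector MatrixGetScale(Matrix TM)
{
    local Vector s;
    s.x = sqrt(TM.XPlane.X**2 + TM.XPlane.Y**2 + TM.XPlane.Z**2);
    s.y = sqrt(TM.YPlane.X**2 + TM.YPlane.Y**2 + TM.YPlane.Z**2);
    s.z = sqrt(TM.ZPlane.X**2 + TM.ZPlane.Y**2 + TM.ZPlane.Z**2);
    return s;
}

//-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
//Sets a uniform scale on a matrix transform.
//Each axis row is divided by its current length first, so the result has
//exactly _scale on every axis instead of multiplying the existing scale.
//Zero-length rows are left untouched to avoid dividing by zero.
//-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
final function Matrix MatrixSetScale(Matrix transform, Float _scale)
{
    local Vector s;

    s = MatrixGetScale(transform);

    if (s.x > 0.0)
    {
        transform.XPlane.X *= _scale / s.x;
        transform.XPlane.Y *= _scale / s.x;
        transform.XPlane.Z *= _scale / s.x;
    }
    if (s.y > 0.0)
    {
        transform.YPlane.X *= _scale / s.y;
        transform.YPlane.Y *= _scale / s.y;
        transform.YPlane.Z *= _scale / s.y;
    }
    if (s.z > 0.0)
    {
        transform.ZPlane.X *= _scale / s.z;
        transform.ZPlane.Y *= _scale / s.z;
        transform.ZPlane.Z *= _scale / s.z;
    }
    return transform;
}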


I’ve now added a system that swaps out all of my SpeedTree billboards (beyond a given radius from the player) with instanced meshes, using the same system I used for the modular building pieces.

An example save game I was using for testing, which was running at ~25 fps last week, now runs at around 140 fps! The performance increase from using the foliage tool for mesh instancing has been insane.

I’m now going to take it even further with the foliage culling as @CobaltUDK has suggested, and see what kinds of results that brings :slight_smile:

It’s a big improvement. I think hiding the distant clusters will improve it even more.

I was testing procedural buildings last week. The problem for me is that they are volumes, which work well for buildings in the background, but not for buildings the pawns move around in.

Anyway, using them for large buildings can be counterproductive: they reduce draw calls, but the remaining draw calls always render the whole building, so occlusion gets worse.

I made 3 basic pieces, which don’t slow things down too much. I’m thinking of making some more pieces that group 2 or 3 of these, to reduce the draw calls a little without making occlusion much worse.

Thought I’d show a before and after when using my new system using the foliage tool to render the individual modular building pieces as instanced meshes:
Before:

After:

As you can see, this experiment has been a major success, with around a 10x performance increase. This scene has nearly 6000 individual building pieces. It doesn’t seem to matter how many I add; the performance holds steady. My players with very large bases will be happy with this update :slight_smile:

Hah, very nice. You even have better shadows :slight_smile: