Seriously can't figure out where the FPS drop comes from

Hello, we have a serious frame rate problem in our game that we cannot resolve.

Our game is an open world. We are using landscape grass to spawn grass and other small foliage such as flowers. Grass and flowers don’t cast shadows, except for some taller instances.
Then we have other foliage placed with the classic foliage tool (bushes, small trees, mushrooms, etc.).
Trees are spawned with the procedural foliage tool.
Small meshes don’t cast shadows. We are using distance field ray-traced shadows for shadows beyond about 100 m from our character, so the cascaded shadow maps cover only a small area around the character.
Distance field ray-traced shadows are only on tall trees; foliage does not affect the distance field.
Occlusion culling is disabled in project settings; enabling it costs 2 fps.
HZB is disabled.

Here are some screenshots to show you our main problem:

  1. Looking at the center of the screen.

    38 fps.
    I am on an NVIDIA GTX 970 at the moment; fullscreen in standalone gives the same fps. Nothing changes.
    Let’s take a look at the GPU profile.

This is the base pass, which is the only thing taking serious time (11.76 ms) on our GPU.
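To put that number in context, a frame rate can be converted into a per-frame time budget. This is only a minimal sketch using the 38 fps and 11.76 ms figures from this post; the function names are made up for illustration, and it assumes the frame is GPU-bound so that the frame time and the GPU time are comparable.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def pass_share(pass_ms: float, fps: float) -> float:
    """Fraction of the whole frame time spent in one GPU pass.

    Assumes the frame is GPU-bound (an assumption, not something
    the profile in this thread proves).
    """
    return pass_ms / frame_time_ms(fps)


if __name__ == "__main__":
    fps = 38.0            # frame rate reported in the post
    base_pass_ms = 11.76  # base pass cost read from the GPU profile
    print(f"frame time: {frame_time_ms(fps):.2f} ms")
    print(f"base pass share: {pass_share(base_pass_ms, fps):.1%}")
```

At 38 fps a frame takes about 26.3 ms, so under that assumption the base pass alone accounts for a bit under half of it.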

The two main passes taking time are StaticOpaqueNoLightmap and Dynamic.
Should I assume that shadows are costing a lot? OK, so I tried to look at the shadow cost:

Above is the result of the stat ShadowRendering command. It seems shadows are not costing much in my scene, so I supposed that maybe the number of assets could be the problem, and I checked with the “stat initViews” command.
These are the results.

Now, I have 373 primitives. Is that too many? What are frustum culled primitives?

Then I took a look at sceneRendering.


What’s going on? Do you have any suggestions on how I can improve my fps or figure out what is taking so long to render? Thank you very much.

Does nobody know anything?

It’s really no secret that UE4 is badly optimized when it comes to performance.
Upgrading your rig is indeed your best option. I have the same troubles.
No, level streaming won’t fix it.

That will not solve bad performance on users’ PCs…
I could develop it on a $5000 PC, but on an average user’s machine it will always have this bad performance.

Yeah, but you can’t do anything else :smiley: Epic is working on that issue, and that’s good :3

Have you checked your CPU-side performance, too (stat unit)? Just to exclude a few more possibilities. Also, how many vertices are rendered in that scene? If the base pass is taking that long, it can be the number of vertices or shader complexity (normal setups, for example).

The CPU is not affecting FPS at all… The number of vertices is high, but not that high.
The frame rate dropping with such a limited number of instances really makes me think that UE4 is not made for open worlds.

The engine has a long way to go.
If you recreate the same scene with the exact same assets in CryEngine, you’d probably still have something around or above 120 fps, even with dynamic lighting and a really far shadow casting range.
I used to put so much pressure on myself trying to figure out where all the performance was being spent, but I came to realize that we just have to give it years until the engine is fully optimized, because after all I’m not a programmer who can find the root of the problems and fix them.

For example, I just took a look at EaaS last night and was blown away when I saw POM uses 256 steps by default per each layer that has POM enabled. Not sure why my PC should explode if I have 256 steps on a single POM layer in UE4. The only thing I can think of is cryengine has been around much longer and so I believe UE4 will get there over time as well.

Well, it certainly does. If that is true in your case may be a different story though.

Well, how high is it exactly?

And guys, stop blaming some piece of tech without knowing all the details. In 95% of cases it’s the content creator who caused the slowdown, be it by placing the vegetation “the wrong way” (separately instead of painting it), by missing LODs, or by a scene with some 6 million polys. Even with today’s hardware, and depending on the lighting tech used, you can still kill your scene quickly.

I have a comparable scene set up, with a bit less variety, and it runs at above 80 fps (>100 without grass). So it is possible, but you need to know your tools.

EchelonV, have you tried doing a test scene in 2 or 3 different engines with the exact same setup?
I assume not, because you’d have noticed that if you’re getting 80 fps here, in another engine you might be getting 120. For various reasons that I can’t pinpoint, as I’m not a programmer.

The answer is in the first few words of your question:

Landscape grass is HIGHLY expensive: you should expect a halving of your performance just for using it. It procedurally generates a random distribution for your meshes all over the world. This feature was used in the Kite Demo, but it’s NOT a mid-spec or even a high-spec feature: it’s just insane, and very poorly optimized. I tried using it, and it slowed my 60 FPS project down to 25. The number of meshes didn’t seem to matter so much, as there was always a HUGE overhead with that feature. It’s demo-ready, not game-ready.

The easier way to render foliage meshes is to use the Foliage tool. Knock down the lightmap resolution to something sensible, like 4 or 8, and make sure your lightmap follows best practices. Use dynamic cascaded shadow maps from the dominant directional light and ambient occlusion to provide a nice dynamic shadow and shade to the foliage. Lightmass seems to work really well with foliage right now, so definitely take advantage of it. You can use the Landscape layer to limit the foliage settings, and with one click you can paint-bucket all of the foliage on that layer. Of course, there are a few other tools to help you paint on foliage if you need a specific look, including the ability to manipulate individual instances. Otherwise, paint-bucket that layer and you’re good to go!

If you used foliage in the past, the new foliage tool (as of 4.7/4.8) is really impressive: each mesh handles its own culling and LOD, yet it actually runs significantly faster than the old clustered method.

Echelon, obviously I am not saying that the engine is bad and that this fps drop isn’t my fault. I started by asking “where am I going wrong”, not with “this engine sucks”.
I described every step I took, and I think that I followed every single step that Epic Games suggests to get nice results and high fps.
Every asset has 2 or 3 LODs, and I can assure you that my scene is really sparse compared to an Ark scene, which runs at 30 fps on my PC (GTX 970) with a really massive number of instances.

I will try grass with the foliage tool; I didn’t know that landscape grass was a high-end feature.

OK, I tried spawning foliage with just the foliage tool.
I gained about 15 fps.
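Since the profiler reports milliseconds, it can help to express an fps gain as frame time saved. This is only a rough sketch: it assumes the baseline was the 38 fps reported in the first post (the test scene may have differed), and the helper name is made up for illustration.

```python
def ms_saved(fps_before: float, fps_after: float) -> float:
    """Frame time saved (in ms) when going from fps_before to fps_after."""
    return 1000.0 / fps_before - 1000.0 / fps_after


if __name__ == "__main__":
    # Assumed baseline of 38 fps plus the reported gain of about 15 fps.
    print(f"38 -> 53 fps saves {ms_saved(38.0, 53.0):.2f} ms per frame")
    # The same +15 fps from a higher baseline is worth far fewer ms.
    print(f"60 -> 75 fps saves {ms_saved(60.0, 75.0):.2f} ms per frame")
```

Under that assumption the switch saved roughly 7.4 ms per frame, which is easier to compare against the individual pass timings than a raw fps delta.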

Now I am looking at my profiler, and I see these entries:

“RenderQuery Result” and “RenderView Family” are taking most of the ms in my profiler.

Is it related to the number of instances?
How can I get more details?

If your highest cost is in the base pass, then I suggest you look at the material complexity; in particular, masked materials and overdraw will be big contributors there. The Shader Complexity view mode is a good starting point.

>>For example, I just took a look at EaaS last night and was blown away when I saw POM uses 256 steps by default per each layer that has POM enabled. Not sure why my PC should explode if I have 256 steps on a single POM layer in UE4.

Prior to 4.11, UE4 was compiling shaders with the custom node inefficiently. It was actually recalculating the POM result for every single gbuffer input. So a standard shader that used basecolor, roughness, spec, normal and possibly other textures was performing the POM for each input. This has since been fixed. You should be able to see if your version has this optimization by checking how much the instruction count climbs when you compile a POM material that has only 1 texture hooked up (i.e. basecolor) and then hook up that same texture to several inputs. With the optimization, the instruction count should stay pretty much the same. Without it, the instruction count may double or more after a few hookups.
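As a toy illustration of why that recalculation hurts, the sketch below counts ray-march steps per pixel when the POM result is shared versus recomputed for each connected input. It is a deliberately simplified model, not how the shader compiler actually works, and the function and parameter names are hypothetical. The 256 steps come from the quoted post; the 4 inputs are the ones named above.

```python
def pom_march_steps(steps_per_eval: int, connected_inputs: int, shared: bool) -> int:
    """Toy count of POM ray-march steps per pixel.

    shared=True  models the POM result being computed once and reused.
    shared=False models it being recomputed for every connected input.
    """
    if steps_per_eval < 1 or connected_inputs < 1:
        raise ValueError("steps_per_eval and connected_inputs must be >= 1")
    evaluations = 1 if shared else connected_inputs
    return steps_per_eval * evaluations


if __name__ == "__main__":
    steps = 256   # default step count mentioned in the quoted post
    inputs = 4    # basecolor, roughness, spec, normal
    print("recomputed per input:", pom_march_steps(steps, inputs, shared=False))
    print("computed once:       ", pom_march_steps(steps, inputs, shared=True))
```

In this model the unshared case does four times the march work for four inputs, which matches the direction of the instruction count test described above, though not its exact numbers.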

Occlusion culling is where the engine tries to figure out which foliage should still be rendered and which should not. So, as RyanB stated, you’ll have to look into your foliage shader and opacity. If it’s masked, one step would be to reduce the number of grass instances rendering behind each other. If you used translucency without a mask, try to reduce the amount of “space” on your grass mesh that is not actually covered by grass texture but empty, as it probably still costs time to calculate that the empty parts are “see-through” and that stuff behind them shouldn’t be culled. (If I am not mistaken :slight_smile: )

Off-topic, but yes, I actually have: the same level set up in both UE4 and CE. UE4 won, which is why I am here now, and probably the reason why you post here now, too. I left CE out of the discussion because the OP wanted to get to the bottom of the performance drop in his UE4 level. Besides, the stock foliage in CE that most people use to show their “levels” is highly optimized, along with the foliage system, which has been updated over the years. So yes, that aspect of the engine is probably faster than its counterpart in UE4. It wasn’t enough to keep me there, though.

Disable tessellation
Don’t use 4k textures; 1k is enough, or 512 for small meshes
Add a cull distance volume to your entire scene
Disable collision for grass stuff
Disable landscape shadows
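To give a feel for the texture size tip in the list above, here is a rough estimate of the memory one texture takes at each resolution. It is a sketch under a stated assumption: uncompressed RGBA8 at 4 bytes per texel. Real game textures are usually block-compressed and streamed, so the absolute numbers will be smaller, but the ratio between resolutions stays roughly the same. The function name is made up for illustration.

```python
def texture_bytes(size: int, bytes_per_texel: int = 4, mips: bool = True) -> int:
    """Approximate memory for a square texture of size x size texels.

    Assumes an uncompressed format (default 4 bytes per texel, i.e. RGBA8).
    With mips=True the full mip chain down to 1x1 is included.
    """
    total = 0
    s = size
    while True:
        total += s * s * bytes_per_texel
        if not mips or s == 1:
            break
        s //= 2  # each mip level halves the side length
    return total


if __name__ == "__main__":
    for size in (4096, 1024, 512):
        mib = texture_bytes(size) / (1024 * 1024)
        print(f"{size}x{size}: {mib:.2f} MiB with mips")
```

Without mips, a 4k texture takes 16 times the memory of a 1k one and 64 times that of a 512 one, so the saving adds up quickly across many foliage materials.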

In front of the view is a village with 700 meshes; the landscape is 4000x4000 and has 200 meshes so far.

With the player in the village it’s around 40-50 fps, depending on lighting settings (dynamic lighting eats a lot).

An issue I encountered: when I change lighting settings, sometimes I have to restart the editor to get what appears to be the real fps.

Using 4.10.4

Inside the village it’s 70+ fps. You can see aggressive culling: the wall in the distance has a missing part. Later I will merge all the walls, so that large structures stay in place. I have not tested merging trees; maybe that has some impact too, provided the wind motion still looks realistic.