Memory and GPU Optimization

Hello,

I have a few questions about how Unreal Engine uses and optimizes memory and GPU resources. Maybe they are simple questions, or maybe the answers are case by case, but I have little idea of how to test them:

  1. Consider a simple mesh + material (textures & shader instructions) copied hundreds of times in a level (exact copies). Because there is only one mesh and one material, loading from disk should be fast even with many copies (this doesn’t really matter, I think). BUT is there any optimization inside the engine regarding memory usage for duplicated objects, OR do 100x copies of an object use 100x the RAM and GPU memory? Surely it is not 1x the RAM, but maybe some middle ground?

  2. How efficient (performance-wise) is it to use material instances (materials with different parameters)? I understand that sharing the same base resources (textures) is good for reducing overall disk size (and loading times), but when different instances of a material are applied to objects, they are effectively using multiple times the memory, right? Or is there some optimization for this?

  3. How important is it to optimize objects that are not shown on screen? For example: it is easier for me to build a mountain, cave or building using modular parts even if I only see part of them from the player’s perspective; that is, some parts are never shown in the game. Do these “useless” geometries impact my performance? Also: consider a long street with several objects with decent textures visible simultaneously. Could I improve performance by adding corners and bends so that less is visible at any given time?

Thank you!

Totally offtopic, but are you by any chance my old teacher from GLR?

Considering I don’t know what GLR is, it is very unlikely… :smiley:

In regards to Mesh optimization.

Yes, it is loaded once, and the vertices are stored on the card in something like a Vertex Buffer Object (VBO).

When you render a mesh, the transform is sent to the card, and the mesh is rasterized using the verts (that are already on the card). This is the Draw Call.

Hardware instancing lets you store the transforms on the card as well, so many copies of a mesh can be drawn with a single draw call instead of one draw call per copy.
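To put rough numbers on the "100x copies" question, here is a back-of-the-envelope sketch of the model described above: one shared vertex buffer plus one transform per copy. The byte sizes and function names are assumptions chosen for illustration, not values measured in Unreal Engine, and real actors carry additional CPU-side overhead that is ignored here.

```python
# Toy memory model: one shared vertex buffer plus one transform per copy.
# All sizes are illustrative assumptions, not Unreal Engine measurements.

BYTES_PER_VERTEX = 32      # assumed layout: position + normal + UV
BYTES_PER_TRANSFORM = 64   # a 4x4 matrix of 32-bit floats


def shared_mesh_bytes(vertex_count: int, copies: int) -> int:
    """Memory when every copy references the same vertex buffer."""
    return vertex_count * BYTES_PER_VERTEX + copies * BYTES_PER_TRANSFORM


def duplicated_mesh_bytes(vertex_count: int, copies: int) -> int:
    """Worst case: every copy carries its own vertex buffer."""
    return copies * (vertex_count * BYTES_PER_VERTEX + BYTES_PER_TRANSFORM)


if __name__ == "__main__":
    verts, copies = 10_000, 100
    print("shared:    ", shared_mesh_bytes(verts, copies), "bytes")
    print("duplicated:", duplicated_mesh_bytes(verts, copies), "bytes")
```

Under these assumptions, 100 copies of a 10,000-vertex mesh cost only slightly more than a single copy, because the per-copy cost is just the transform.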

With regard to visible geometry, frustum culling means that only meshes determined to be inside the view frustum are rendered.
Occlusion culling further eliminates meshes that are hidden from view by other meshes.

So if your camera always faces forward, it will never render the stuff behind it.
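A minimal sketch of the frustum test, assuming each mesh is approximated by a bounding sphere and the frustum is given as a list of inward-facing planes with unit normals. The `Plane`, `Sphere` and `is_visible` names are made up for this example and are not engine API; occlusion culling is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Plane:
    # Points p with dot(normal, p) + d >= 0 lie on the inside of the plane.
    # The normal is assumed to be unit length.
    normal: Vec3
    d: float


@dataclass(frozen=True)
class Sphere:
    center: Vec3
    radius: float


def signed_distance(plane: Plane, point: Vec3) -> float:
    nx, ny, nz = plane.normal
    x, y, z = point
    return nx * x + ny * y + nz * z + plane.d


def is_visible(sphere: Sphere, frustum: List[Plane]) -> bool:
    """Keep the sphere unless it lies entirely outside at least one plane."""
    return all(signed_distance(p, sphere.center) >= -sphere.radius for p in frustum)


# Hypothetical camera at the origin looking along +X with a 90 degree
# horizontal field of view and a far plane at x = 1000.
S = 1.0 / math.sqrt(2.0)
FRUSTUM = [
    Plane((1.0, 0.0, 0.0), 0.0),      # near: x >= 0
    Plane((-1.0, 0.0, 0.0), 1000.0),  # far:  x <= 1000
    Plane((S, -S, 0.0), 0.0),         # left:  y <= x
    Plane((S, S, 0.0), 0.0),          # right: y >= -x
]

if __name__ == "__main__":
    print(is_visible(Sphere((10.0, 0.0, 0.0), 1.0), FRUSTUM))   # in front of the camera
    print(is_visible(Sphere((-10.0, 0.0, 0.0), 1.0), FRUSTUM))  # behind the camera
```

The test is conservative: a sphere near a frustum corner can pass even though it is just outside, which is the usual trade-off for a check this cheap.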

As far as material instances go: you save memory by reusing textures, and I think a material instance has a smaller performance impact than a full copy of the material would. The main benefit, though, is being able to change a parameter in the material for one object without changing it on every object that uses that material. It also helps to reduce the number of materials used. For example, if I’m using solid colors in a material without any special textures, I can put all those colors into a single swatch texture, use that one material for many objects, and UV map each object to the appropriate color in the texture. The swatch texture can be kept small, so it has little impact.
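The swatch idea can be sketched as a small lookup, assuming the colors are laid out in a regular grid and each object's UVs are collapsed to the center of its color cell. The `swatch_uv` helper and the row-major layout are assumptions for illustration; in practice this mapping is normally done in the modeling tool rather than in code.

```python
from typing import Tuple


def swatch_uv(index: int, columns: int, rows: int) -> Tuple[float, float]:
    """Return the UV at the center of swatch cell `index` (row-major order).

    Sampling the center of the cell keeps texture filtering from bleeding
    in the neighboring colors.
    """
    if not 0 <= index < columns * rows:
        raise ValueError("swatch index out of range")
    col = index % columns
    row = index // columns
    return ((col + 0.5) / columns, (row + 0.5) / rows)


if __name__ == "__main__":
    # A 4x4 swatch holds 16 solid colors in one tiny texture.
    for i in (0, 5, 15):
        print(i, swatch_uv(i, 4, 4))
```

With this layout, sixteen solid-color objects can share one material and one small texture instead of sixteen separate materials.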

Mat-instance wise.

Probably not 100% accurate, but it seems to hold up quite well. TLDR version at 26:00

Also gonna attach this about drawcalls:


My professor told me there is no need to merge 96 fence objects, because if they share the same material they will get batched anyway. I told him he is a doofus and that you still get 96 draw calls inside the batch instead of 1. Should I be scared for my future?

Also,
2) From my understanding, there is no runtime performance boost from using Material Instances. The shader code still has to run separately for each object. Imagine 2 meshes with different vertices and a shader that uses the vertices to offset the mesh: what exactly would you share in that shader code to optimize it? Those are my thoughts.
3) Usually you should prioritize optimizing what is on your screen. Having some extra meshes loaded shouldn’t affect you that much. If you are using culling, these objects still need to be checked to see whether they are on screen, but the checks are based on object bounds, so they are pretty fast. But you are right, many level designers build corners and bends into their levels to reduce complexity. I also see many mobile games that bend their planes so that you don’t see objects spawning, and objects are rendered only when they are needed.

AFAIK Unity implements static batching similar to what he is referring to. It creates one huge mesh instead of several smaller ones.
Note that static batching has drawbacks of its own: it typically merges the objects into one big index buffer on the CPU and cannot keep the result on the card, since reusing it would lose the benefits of culling.

Because of these drawbacks, AFAIK UE does not implement the kind of static batching your professor described.

Exactly what I told him, then he said he could prove me wrong, but it would take too much time for him…
Also sorry if offtopic <3

I would fire the bleeper.
You can batch/instance a multitude of meshes together in UE4, but it needs some Blueprint work or whatever, and that’s outside my VFX zone.

Nothing special here: the kind of batching we are talking about is under Developer Tools -> Merge Actors; for instancing, add an ISM or HISM component in your BP. Just to end the offtopic <3
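For the 96-fence argument, a toy count of draw calls under the three approaches mentioned in this thread. It assumes every mesh has exactly one material slot and that nothing is culled; the function names are made up for illustration and are not engine API.

```python
from typing import List, Tuple

# Each placed object is described as a (mesh_name, material_name) pair.
SceneObject = Tuple[str, str]


def draw_calls_individual(objects: List[SceneObject]) -> int:
    """One draw call per placed object."""
    return len(objects)


def draw_calls_instanced(objects: List[SceneObject]) -> int:
    """One draw call per unique mesh + material pair (ISM/HISM style)."""
    return len(set(objects))


def draw_calls_merged(objects: List[SceneObject]) -> int:
    """Everything merged into one mesh: one draw call per unique material."""
    return len({material for _, material in objects})


if __name__ == "__main__":
    fences = [("fence", "wood")] * 96
    print(draw_calls_individual(fences))
    print(draw_calls_instanced(fences))
    print(draw_calls_merged(fences))
```

In this model the 96 fences drop from 96 draw calls to 1 with either merging or instancing. As noted above, a merged mesh is culled as a single object, so the lower draw call count is traded against culling.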
