Is it impossible to keep Draw Calls low?

I’m having a horrible time grasping how draw calls work in Unreal Engine. It’s all so different from Unity! Something that I could do in 10 draw calls in Unity would take over a thousand in Unreal. In Unity you can batch different meshes with different vertex colors into the same draw call. But that’s far from true in Unreal, and I just can’t understand how it’s possible to end up with an optimized game if my draw calls are going to increase a hundredfold. Are draw calls just not that much of an issue nowadays? Are Unreal games so GPU-bound that it never really matters? I’m just so confused.

So let’s say I wanna use Lumen, so I’ll have to avoid combining meshes together. But I’m not going to use Nanite, as it’s around 3ms slower in my use case. That’s really weird to me, as a lot of people seem to imply that any game should use it regardless of target poly count. In my tests the performance is really bad, but maybe that’s a topic for another day.

If every instance of a mesh in my scene adds an additional draw call, then once that is multiplied by cascaded shadows, additional lights, or whatever else, it will easily reach the thousands of draw calls. How are mobile games even viable in this case?
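To make that multiplication concrete, here is a rough back-of-the-envelope estimator. The pass names and default counts in the hypothetical `PassSetup` struct (one base pass, one depth prepass, four shadow cascades) are illustrative assumptions, not Unreal's actual settings. Real counts depend on project settings, per-pass culling, and the shadowing method, and the automatic batching from 4.22 can merge many of these draws.

```cpp
#include <cassert>

// Hypothetical pass counts for illustration only; real values depend on
// project settings and on how shadows are rendered.
struct PassSetup {
    int basePass = 1;             // main color/GBuffer pass
    int depthPrepass = 1;         // early-Z prepass (0 if disabled)
    int shadowCascades = 4;       // directional-light shadow cascades
    int shadowedLocalLights = 0;  // each shadowed local light re-renders its casters
                                  // at least once (point lights may need more)
};

// Rough upper bound: every visible mesh section is submitted once per pass.
// Ignores per-pass culling, instancing, and automatic batching.
int estimateDrawCalls(int meshSections, const PassSetup& p) {
    assert(meshSections >= 0);
    int passes = p.basePass + p.depthPrepass + p.shadowCascades + p.shadowedLocalLights;
    return meshSections * passes;
}
```

With the defaults, 200 visible mesh sections come out to 200 × 6 = 1,200 draws before any batching.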

There are the changes that came with 4.22, but I can’t find any information on how they actually work other than the highly technical 40-minute video. What’s combined and what isn’t? What breaks this batching and what doesn’t?

My biggest issue is that I would like to rely on vertex painting, but given these draw call limitations, I’m guessing it’s completely unviable? I had the impression the average AAA game in UE was definitely using vertex painting everywhere. Am I wrong, is it more like tessellation where no one sane would ever touch it?

But even if I completely forgo the idea of vertex painting, I’m still very concerned about my draw call count. Should I not be? I know this sounds like worthless premature optimization, but it heavily affects how my asset creation workflow will be handled, as well as how I’ll atlas/pack my textures. I wish I could just turn on Nanite and not care haha.

There’s also the concept of Instanced Static Meshes (hierarchical…?), and I don’t really understand how they’re different from the automatic merging 4.22 brings.

I’d highly appreciate some insight on the matter. Unreal is mindblowingly amazing so far, but oh boy is it difficult to learn. Thanks!

Here’s something about it

It’s quite out of date, but still fairly relevant.

Also, draw calls are not where one would typically look to optimize, unless you have more than, say, 1-2 million.

Instanced SMs will get the draw calls right down, and so will reducing the number of materials. But you need a lot of the same meshes for this to have any impact.
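Instancing needs "a lot of the same meshes" because draws can only be merged when they share the same mesh and material. Here is a minimal sketch of that grouping. It assumes each `Placement` (a hypothetical name) stands for one mesh section with a single material slot, and it counts draws for a single pass.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One placed mesh section (hypothetical model: one material slot each).
struct Placement {
    std::string mesh;
    std::string material;
};

// Without instancing, every placement is its own draw.
std::size_t individualDraws(const std::vector<Placement>& placements) {
    return placements.size();
}

// With instancing, placements sharing mesh + material collapse into one draw.
std::size_t instancedDraws(const std::vector<Placement>& placements) {
    std::map<std::pair<std::string, std::string>, std::size_t> groups;
    for (const auto& p : placements) {
        ++groups[{p.mesh, p.material}];  // count instances per (mesh, material) key
    }
    return groups.size();
}
```

Three identical rocks plus one tree with separate bark and leaf materials give 5 individual draws but only 3 instanced ones. A scene where every mesh is unique gains nothing.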

Yes, it’s possible. Unreal offers quite a lot of tools for this, most of which you’ve already mentioned: the engine does automatic batching, you can use ISMs, HISMs, and of course Nanite.

Additionally:

  • Packed Level Instances
  • HLODs
  • Custom primitive data
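Custom primitive data is relevant to the vertex-color concern. Instead of a separate material for every color variant, each instance carries a few floats that one shared material reads, so the instances can stay in a single batch. The class below is a standalone model of that storage layout, with N floats per instance packed into one flat array. `InstanceCustomData` and its methods are hypothetical names, not Unreal's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone model of per-instance custom data: a fixed number of floats per
// instance, stored in one flat array. All instances share one mesh + material;
// a shader would index this data by instance ID (e.g. to fetch a tint color).
class InstanceCustomData {
public:
    explicit InstanceCustomData(std::size_t floatsPerInstance)
        : stride_(floatsPerInstance) {}

    // Appends an instance with all custom floats set to zero; returns its index.
    std::size_t addInstance() {
        data_.resize(data_.size() + stride_, 0.0f);
        return count_++;
    }

    void set(std::size_t instance, std::size_t slot, float value) {
        assert(instance < count_ && slot < stride_);
        data_[instance * stride_ + slot] = value;
    }

    float get(std::size_t instance, std::size_t slot) const {
        assert(instance < count_ && slot < stride_);
        return data_[instance * stride_ + slot];
    }

    std::size_t instanceCount() const { return count_; }

private:
    std::size_t stride_;       // floats per instance
    std::size_t count_ = 0;    // number of instances
    std::vector<float> data_;  // instance-major flat storage
};
```

For example, with three floats per instance used as an RGB tint, a red crate and a blue crate differ only in their custom data, not in their material.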

Optimization is a huge topic and not all hardware/projects have the same optimization requirements. It’s entirely possible your project is not going to have a problem with draw calls.

Automatic batching has more overhead than explicit instancing. It’s faster than issuing tons of individual draw calls, but if draw calls are a serious problem for your game, you will probably want to use ISM/HISM.

Thanks a lot for the replies, guys!

I’m still a bit uncertain. From what I’ve gathered, ISM doesn’t support LODs and HISM doesn’t support per instance culling. This renders both options completely worthless to me, at least for this use case.

Is leaving everything to automatic batching (behind-the-scenes magic) all I can do?

And on the topic of vertex painting, would you guys consider it a definite no-no?

For example, say I wanna have this huge bridge made from modular parts

LODs are a must, and so is culling. If there are maybe 5 materials in each modular piece, and 100 pieces… the bridge alone could potentially be over a thousand draw calls. Is there no way around it?
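The bridge numbers can be worked through as plain arithmetic. In this sketch, a "draw group" is either one placed piece (no instancing) or one distinct module type (instanced, e.g. via HISM). `bridgeDraws` is a hypothetical helper. It ignores culling, and it ignores the fact that instances at different LOD levels generally need separate draws.

```cpp
#include <cassert>

// Hypothetical helper: draws = draw groups x material slots x passes.
// A draw group is one placed piece without instancing, or one distinct
// module type when pieces of the same type are instanced together.
// Ignores culling and LOD splitting.
int bridgeDraws(int drawGroups, int materialsPerGroup, int passes) {
    assert(drawGroups >= 0 && materialsPerGroup >= 0 && passes >= 0);
    return drawGroups * materialsPerGroup * passes;
}
```

Here, 100 pieces × 5 materials is 500 draws per pass, or 1,500 with a base pass plus two shadow passes. If those 100 pieces were built from, say, 8 module types (a made-up number), instancing brings the total to 8 × 5 × 3 = 120.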

The bridge could be

  • rebuilt as a hierarchical instanced static mesh (HISM).
  • simplified with manual batching:
    Actor => Merge Actors => Batch, which will try to replace similar meshes with instances (back up your level before merging, though).

Also make sure that unmovable objects have their mobility set to “Static”

You can also consider using Lightweight Instances for some objects.

Back when “hardware transform and lighting” was a new thing, “number of draw calls” was a real concern, because the API overhead was high, and so was the shader-switching cost on the hardware. The suggestion from graphics card vendors was to stay around 1,000 draw calls per frame and never go over 10,000.

These days, especially when using Direct3D12, Vulkan, or Metal, “draw calls” are much less of a problem. 100,000 draw calls is fine.

That being said, it’s always a good idea to use hierarchical instanced static meshes, to bake your assets to a single material rather than a dozen sub-materials, and to keep the number of shader combinations small. (Always use material instances, and only use a small number of master materials.)
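The advice about few master materials and many material instances comes down to shader permutations. Instances that only change non-static parameters (scalars, vectors, textures) can share the parent's compiled shaders, while each distinct combination of static switches needs its own. Below is a simplified model of that counting. `SwitchSet` and the helper functions are hypothetical, and real permutation counts are further multiplied by factors such as vertex factories and platform features.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Static switch settings of one material instance: one bool per switch.
using SwitchSet = std::vector<bool>;

// A master material with n static switches has at most 2^n combinations.
std::size_t permutationUpperBound(unsigned n) {
    assert(n < 8 * sizeof(std::size_t));
    return std::size_t{1} << n;
}

// Only combinations that some instance actually uses need their own shaders.
std::size_t usedPermutations(const std::vector<SwitchSet>& instances) {
    std::set<SwitchSet> unique(instances.begin(), instances.end());
    return unique.size();
}
```

Three switches allow up to 8 combinations. Three instances that use only two distinct combinations need just 2.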

In the end, time per frame is what matters. If your time per frame on your target hardware is lower than 1000/(target frame rate) milliseconds, then you’re good.
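That budget rule is simple arithmetic. Here is a tiny sketch with hypothetical helper names:

```cpp
#include <cassert>

// Frame budget in milliseconds for a given target frame rate.
double frameBudgetMs(double targetFps) {
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

// True if a measured frame time fits strictly inside the budget.
bool meetsTarget(double measuredFrameMs, double targetFps) {
    return measuredFrameMs < frameBudgetMs(targetFps);
}
```

At 60 fps the budget is about 16.7 ms, and at 30 fps it is about 33.3 ms.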

That many?

Is there any source/set of metrics on what games run what number of calls, texture overhead, etc? Any good place to get a sense of bang/buck and a flavor of the state-of-the-art?

The best I’ve found is to install the debug Direct3D runtime, and then instrument the particular app I care about.

You can also get profiling data from your own game, of course, inside the engine.

Note that “draw calls” also aren’t all the same. Sometimes, a single “draw call” (or a pair) will end up doing render-to-geometry-buffer type work with massive amplification; other times, it’s a thousand calls that each change only a single dynamic shader parameter. These can differ in performance impact by several orders of magnitude.

Thanks a lot guys!

The 100k number also surprised me! I was getting itchy about having a thousand.

It still seems absurd to me that Unity is so far ahead of Unreal as far as optimizing draw calls goes; it’s been the opposite for just about everything else, haha. Super curious as to why they’d differ so much.

Taking the opportunity to expand the topic, let me ask another question:

Do draw calls affect both CPU and GPU frame times? Technically a draw call is issued on the CPU in order to pass information to the GPU, so I always assumed it was strictly CPU overhead. But having recently watched An In-Depth Look at Real-Time Rendering, I got the impression that it directly impacts the GPU frame time. If it does, the whole approach really changes for me… we could call 10k draw calls “fine”, but if that adds 1ms of overhead to the GPU frame time, that’s a big problem for me haha. Or, as long as my game is strongly GPU-bound, could (and should) I completely ignore the draw call count?
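One way to frame this question: in a pipelined renderer, the game thread, the render thread, and the GPU largely work in parallel on different frames, so frame time is roughly set by the slowest of them. Unreal's `stat unit` command reports Game, Draw, and GPU times in this spirit. The sketch below is a simplified model under that assumption. The struct and functions are hypothetical, and real frames also involve sync points and other threads such as the RHI thread. In this model, CPU-side submission cost shows up mainly in the Draw time, and it is hidden while the GPU is the slowest stage unless the per-draw work also costs GPU time.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Simplified per-frame timings (milliseconds), loosely mirroring stat unit.
struct FrameTimings {
    double gameMs;  // game thread
    double drawMs;  // render thread: where draw-call submission cost usually lands
    double gpuMs;   // GPU execution
};

// Names the slowest stage, which (in this pipelined model) sets the frame time.
std::string bottleneck(const FrameTimings& t) {
    assert(t.gameMs >= 0.0 && t.drawMs >= 0.0 && t.gpuMs >= 0.0);
    if (t.gpuMs >= t.gameMs && t.gpuMs >= t.drawMs) return "GPU";
    if (t.drawMs >= t.gameMs) return "Draw (render thread)";
    return "Game thread";
}

// Approximate frame time: the slowest of the three pipelined stages.
double approxFrameMs(const FrameTimings& t) {
    return std::max({t.gameMs, t.drawMs, t.gpuMs});
}
```

For example, with Game 8 ms, Draw 10 ms, and GPU 14 ms, the frame is GPU-bound at about 14 ms. Trimming a few milliseconds of draw submission would not speed it up unless those draws also cost GPU time.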

Thanks!

This is because of how the renderer works. The more features you have in your shading pipeline, the more draw calls will be needed for things like virtual textures, shadows, real-time light propagation, and so on. While there certainly are cases where you can optimize draw calls (by using instancing, for example), it seems to me that the reason Unity has fewer draw calls is not that it’s “optimized more,” it’s that it “does less.”

(Which, if you’re targeting lighter weight clients, phones, and so on, might be exactly what you want. Not every player in the world has a D3D12 capable PC or console …)

Yes, “draw calls” can take GPU time too, typically because of shader/state switching and setup time. So, as suggested above, if all you do is poke in another scalar shader parameter, the switching cost on the GPU will be minimal. If you need to set up different shaders, render targets, and texture formats, the switching cost on the GPU will be higher, potentially by many orders of magnitude.
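That switching cost is also why renderers commonly sort draws so that identical state ends up adjacent. Below is a minimal model of the idea: it counts shader and material binding changes in a draw list, then sorts the list by (shader, material). `DrawCmd` and the helpers are hypothetical. This shows the general technique and does not describe how Unreal sorts its mesh draw commands.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical draw command: the state it needs plus the mesh it draws.
struct DrawCmd {
    std::string shader;
    std::string material;  // textures/parameters bound for the draw
    int mesh;
};

// Counts shader and material binding changes between consecutive draws
// (the first draw counts as one bind of each).
std::size_t stateChanges(const std::vector<DrawCmd>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].shader != draws[i - 1].shader) ++changes;
        if (i == 0 || draws[i].material != draws[i - 1].material) ++changes;
    }
    return changes;
}

// Sorting by (shader, material) groups identical state together, so each
// distinct shader is bound only once per sorted list.
void sortByState(std::vector<DrawCmd>& draws) {
    std::sort(draws.begin(), draws.end(), [](const DrawCmd& a, const DrawCmd& b) {
        return std::tie(a.shader, a.material) < std::tie(b.shader, b.material);
    });
}
```

Four draws that alternate between two shader/material pairs cause 8 binding changes; after sorting, there are only 4.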

“draw calls” isn’t a particularly helpful metric all on its own. You have to know what those calls are doing!

Thanks a lot for clarifying!! Cheers 😄

So, apologies for the random necro, but regarding the above: how much better is an MPC (Material Parameter Collection) in this respect?

If I update a scalar value, a call goes out to do so. But (and forgive my ignorance here) is an MPC operationally any different? Does it need a call to update? Does the GPU keep a local copy?

Just rambling here: I’m wondering whether MPCs are always sent(?), in which case pushing data through them would be a ‘free’ ride.

I’ve always been a low-level, assembler, hardware type of guy so things like bin-passing on materials for nanite is ‘cool-stuff’ for me…

Thanks for anything you can provide.