Rendering 5000 sprites in UE4 10x slower than Unity5?

Thanks for the info Nate, I’ll post back here with before/after results when I get started and will also look into the memory usage you’re reporting.

In the meantime, you can run stat memory (displays some brief high-level info) and memreport (writes out a file with more detailed stats) to see roughly where the memory is going.

Cheers,
Michael Noland

@Michael Noland, I wonder what the status of batching support is. I have been following your commits on the master branch, and it seems a lot of work has been done on the editor side lately. Thanks!

It’s in progress right now and still planned for 4.8, but there are no numbers to report yet. I’m doing some refactoring to let me provide clean workflows in the editor to merge/split batches, etc…

Cheers,
Michael Noland

That’s interesting. If I understand correctly, instanced meshes fall back to a single draw call per mesh, is that right? (Plus one call per material on top, unless it’s two-sided, in which case two calls per material?) Or is it actually one call per material, with the meshes handled some other way?

Is it even possible to batch loose meshes into fewer draws? I know at DICE they took some higher-level approaches and attached all of their static meshes together to reduce draw calls, since that gave them more performance benefit than culling them out, I imagine.

EDIT: Sorry, slight hijack, just interesting stuff :stuck_out_tongue:

The grouped sprite component is in 4.8 now and out of experimental, so you should be able to check it out in 4.8 preview 3. Here are some numbers with a test that I think approximates yours.

I have two test cases: 5K elements using a large sprite, and 100K using a (spatially) smaller sprite, since 100K of the large sprite ended up GPU bound due to overdraw.

100K grouped instances:

  • 5K large separate components - 85 ms render thread time, 5001 draw calls
  • 5K large instances on a grouped sprite component - 1.9 ms render thread time, 2 draw calls
  • 100K small separate components - Got tired of waiting for stats after a few minutes :slight_smile:
  • 100K small instances on a grouped sprite component - 1.85 ms render thread time, 2 draw calls (as long as the group isn’t dynamic, RT time is now constant; only GPU time increases as the # of sprites increases)

Note: Unfortunately I wasn’t able to do much about the cache uniform expressions overhead when using loose sprites, since it isn’t possible to safely make a persistent override proxy if the game code is using a MID (material instance dynamic). Grouped sprites dramatically reduce the # of components typically necessary, though, and should help quite a bit there.

Working with groups is pretty simple:

  • You can build them programmatically, much like a UInstancedStaticMeshComponent, but they have fewer limitations (you can mix/match sprites and materials in one; it will generate additional draw calls as necessary)
  • You can convert selections in the level editor into a grouped component using the Merge button in the details panel if all selected items are sprite actors, or the right-click context menu if the selection is mixed (it will leave non-sprite objects alone but delete sprite actors, replacing them with a merged actor).
  • You can split a sprite group back into separate actors if you need to move/reposition something, and can then re-merge them.
  • You can sort them based on the rendering project settings TranslucencySortAxis, so that batches that contain translucent sprites render as expected.

Note: All sprites in a group will be drawn as one or a few draw calls (the minimum required given the materials and textures). This means that:

  • Culling will be done as a whole unit. Either all instances are drawn or none of the instances are drawn. You probably don’t want to group sprites from opposite ends of the map together.
  • Sorting will be done as a whole unit. If you have some translucent foreground sprites and some translucent background sprites, you probably don’t want to group them together. With the sort button they’ll sort correctly relative to each other, but a translucent player in the mid-ground can’t pass in between the two of them; it’ll either draw in front of both or behind both. These kinds of sorting issues only apply to Translucent materials. Masked materials don’t have the same issues, but they only work for binary (0 or 1) opacity.

Cheers,
Michael Noland
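
To make the draw call and culling notes above a little more concrete, here is a minimal standalone C++ sketch. It is not engine code: SpriteInstance, Bounds, countDrawCalls, groupBounds and groupIsDrawn are hypothetical names, and it simply assumes one draw call per distinct material/texture pair and a single bounding box that is tested once for the whole group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for one sprite in a group (not a UE4 type).
struct SpriteInstance {
    int materialId;
    int textureId;
    float x, y;      // center position
    float halfSize;  // half the sprite's width and height
};

struct Bounds {
    float minX, minY, maxX, maxY;
};

// Assumption: one draw call per distinct (material, texture) pair in the group.
std::size_t countDrawCalls(const std::vector<SpriteInstance>& group) {
    std::set<std::pair<int, int>> batches;
    for (const SpriteInstance& s : group) {
        batches.insert(std::make_pair(s.materialId, s.textureId));
    }
    return batches.size();
}

// One box enclosing every instance; the group is culled against this box only.
Bounds groupBounds(const std::vector<SpriteInstance>& group) {
    assert(!group.empty());
    const SpriteInstance& f = group.front();
    Bounds b{f.x - f.halfSize, f.y - f.halfSize, f.x + f.halfSize, f.y + f.halfSize};
    for (const SpriteInstance& s : group) {
        b.minX = std::min(b.minX, s.x - s.halfSize);
        b.minY = std::min(b.minY, s.y - s.halfSize);
        b.maxX = std::max(b.maxX, s.x + s.halfSize);
        b.maxY = std::max(b.maxY, s.y + s.halfSize);
    }
    return b;
}

// True if the group's box touches the view box, in which case every instance is drawn.
bool groupIsDrawn(const Bounds& group, const Bounds& view) {
    return group.minX <= view.maxX && group.maxX >= view.minX &&
           group.minY <= view.maxY && group.maxY >= view.minY;
}
```

In this model, two sprites from opposite ends of the map in one group produce a box that spans the whole map, so the group is drawn whenever any part of that box is in view.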

BTW, I did that test with t.MaxFPS set to 0, so it was capping at 60 Hz. Changing to t.MaxFPS 1000 for the 100K grouped test shows a frame time around 4ms / 250 fps, but that’s not a very meaningful number; you need to keep adding work until you’re back in the target range.

Cheers,
Michael Noland
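
For reference, the 4 ms / 250 fps figure above is just the reciprocal relationship between frame time and frame rate. A small standalone C++ sketch of that arithmetic follows; the function names are made up for illustration, and budgetUsed is one assumed way to express how much of a target frame a measured time consumes.

```cpp
#include <cassert>

// Convert a frame time in milliseconds to frames per second.
double fpsFromFrameMs(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}

// Convert a frame rate to the time available per frame, in milliseconds.
double frameMsFromFps(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Fraction of the frame budget at targetFps that a frame of frameMs consumes.
double budgetUsed(double frameMs, double targetFps) {
    return frameMs / frameMsFromFps(targetFps);
}
```

With these definitions, a 4 ms frame uses a bit under a quarter of a 60 fps budget (about 16.7 ms), which is why the raw 250 fps number says little until more work is added to the scene.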

Awesome work Michael!

Very cool!

So you’re telling me that you also exceeded the Unity implementation by an extra 16 fps :wink: Stunning!

Michael, could you share the map/scene or the project, or give an example of how the scene is set up?

I love the ‘fighting talk’ nature of this thread - I wonder what I could persuade one of the engine developers to do if I could come up with something that another app could do better. How about… In Unity you can output a signal with an alpha channel to a dedicated broadcast output card (I don’t know if that’s true), so why can’t UE? (That’s a highly personalised problem I face.)

What about moving them? Having low-cost static sprites is nice, but having them not kill performance when being moved would be even better.

Moving the entire component is basically free (e.g., all instances at the same rate for a parallax layer), but moving individual instances requires a rebuild of the vertex buffer. You will not be able to move 100k unique instances every frame at a reasonable frame rate, but if you split out dynamic and static instances and group only things that are all likely to be changing every frame together, it should be comparable to leaving them as loose components. Disabling collision will also increase the speed of rebuilds, since it avoids having to talk to PhysX, for example.

Cheers,
Michael Noland
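
The difference between moving the whole component and moving one instance can be sketched with a dirty flag. This is a simplified standalone C++ model, not the engine's implementation: SpriteGroup and all of its members are hypothetical, and it assumes the group offset is applied as a single transform at draw time while instance positions are baked into the vertex data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 {
    float x, y;
};

// Hypothetical sprite group (not a UE4 class): local-space quads plus one group offset.
class SpriteGroup {
public:
    std::size_t addInstance(Vec2 localPos) {
        locals_.push_back(localPos);
        dirty_ = true;
        return locals_.size() - 1;
    }

    // Moving the whole group only changes one offset; the vertex data stays valid.
    void setGroupOffset(Vec2 offset) { offset_ = offset; }

    // Moving one instance invalidates the baked vertex data.
    void moveInstance(std::size_t index, Vec2 localPos) {
        assert(index < locals_.size());
        locals_[index] = localPos;
        dirty_ = true;
    }

    // Call once per frame before drawing; rebuilds only if an instance changed.
    void prepareForDraw() {
        if (!dirty_) return;
        vertices_.clear();
        for (const Vec2& p : locals_) {
            // Four corners of a unit quad centered on the instance.
            vertices_.push_back({p.x - 0.5f, p.y - 0.5f});
            vertices_.push_back({p.x + 0.5f, p.y - 0.5f});
            vertices_.push_back({p.x + 0.5f, p.y + 0.5f});
            vertices_.push_back({p.x - 0.5f, p.y + 0.5f});
        }
        dirty_ = false;
        ++rebuilds_;
    }

    int rebuildCount() const { return rebuilds_; }
    std::size_t vertexCount() const { return vertices_.size(); }
    Vec2 groupOffset() const { return offset_; }

private:
    std::vector<Vec2> locals_;
    std::vector<Vec2> vertices_;
    Vec2 offset_{0.f, 0.f};
    bool dirty_ = false;
    int rebuilds_ = 0;
};
```

Under this model, keeping static sprites in one group and frequently moving sprites in another means only the second group pays the rebuild cost each frame.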

To be clear, that’s not what I’m saying at all. I haven’t done any benchmarking against Unity (I’m not legally allowed to by their EULA), I don’t know for sure that this benchmark is in any way comparable to Nate’s, or if my machine is comparable to his. It’s also generally unsound to use FPS for any sort of benchmark at all (time elapsed per thread is more accurate and less prone to measurement errors, unrelated things on the PC, etc…).

Cheers,
Michael Noland

Yeah I’ll either post pictures of the project settings and test Blueprint or a zip of the project. However, I built it in main (version # is already bumped to 4.9.0) and it’s saved on a programmer build, not a promoted build, so it probably won’t load directly in 4.8.0.

Cheers,
Michael Noland

I gave the grouped sprite component a try, and the FPS went from 24 to 216 on my PC. Thanks Michael, that’s an awesome job.

But then I tried updating the transform every frame in both the Unity and Unreal tests, and I got:

Unity 5: 46fps
UE4: 2fps

Here’s the full comparison chart:

Also, please note that in Unity’s case, all sprites are individual game objects, each of which can have its own scripts and components; in Unreal, there’s one actor with one grouped sprite component, which makes it much more difficult to assign different behavior to individual sprites.

Another thing Unity does differently is that all the batching happens behind the scenes: the user only marks a game object as “static”, and it’ll be batched automatically without further action.

Thank you! :slight_smile:
I like that because it lets me check the engines (Urho3D - UnrealEngine4 - Unity5) against each other with more or less the same scene & config, compare the results, and post them here.

Bha here is 10 million

Hi, I am a beginner with UE4. This is just what I want for rendering a massive number (say, at least 1 million) of 2D objects. Can you tell me more about the grouped sprite component? Is it UPaperGroupedSpriteComponent?

Hi, can you tell me more about how to group sprites? Is it done with UPaperGroupedSpriteComponent in C++?

Having the CPU generate 100,000 2D sprite billboards into a streaming vertex buffer every frame should be fast.
I think the main problem here is that Unreal doesn’t optimize for animated 2D sprites using that mechanism.
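
As a rough picture of the mechanism described above, here is a standalone C++ sketch of refilling a CPU-side vertex array with one quad per sprite each frame. Vertex, Sprite and fillQuads are hypothetical names, the UV layout is an assumption, and the upload to a GPU streaming buffer is left out entirely, so this makes no claim about actual frame times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    float x, y;  // position
    float u, v;  // texture coordinates (assumed layout)
};

struct Sprite {
    float x, y;          // center position
    float halfW, halfH;  // half extents
};

// Rewrites `out` with 4 vertices per sprite. Reusing the same vector every
// frame avoids reallocating once it has grown to the needed capacity.
void fillQuads(const std::vector<Sprite>& sprites, std::vector<Vertex>& out) {
    out.clear();
    out.reserve(sprites.size() * 4);
    for (const Sprite& s : sprites) {
        out.push_back({s.x - s.halfW, s.y - s.halfH, 0.f, 1.f});
        out.push_back({s.x + s.halfW, s.y - s.halfH, 1.f, 1.f});
        out.push_back({s.x + s.halfW, s.y + s.halfH, 1.f, 0.f});
        out.push_back({s.x - s.halfW, s.y + s.halfH, 0.f, 0.f});
    }
    assert(out.size() == sprites.size() * 4);
}
```

A real renderer would follow this with a copy into a dynamic vertex buffer and a single draw per material/texture batch; how that compares with the grouped sprite rebuild path would need to be measured.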