People say that SceneCapture2D is expensive “because it renders the scene twice,” and while that checks out on the surface, I’m finding that there’s quite a bit of additional overhead when using SceneCapture2Ds to render portions of the scene, above and beyond the base rendering cost.
Specific case: I am trying to use a SceneCapture2D on character actors, configured to render only the character’s skeletal mesh to the depth channel (which should, in theory, be CHEAPER than rendering the entire skeletal mesh with base color and everything), so that I can generate stencil shadow decals on the ground. Note that this is using the ShowFlag method: the environment, lighting, post-processing, etc. are all disabled. The SC2D is only concerned with the skeletal mesh (I’ve verified that), and it’s only rendering depth info to a 256x256 R16f render target.
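For anyone wanting to reproduce: the setup boils down to roughly the following (a simplified C++ sketch; mine is actually configured on the character Blueprint, and the function/variable names here are just placeholders):

```cpp
// Simplified sketch of the capture setup described above (my actual setup
// lives on the character Blueprint; names here are placeholders).
#include "Components/SceneCaptureComponent2D.h"
#include "Components/SkeletalMeshComponent.h"
#include "Engine/TextureRenderTarget2D.h"
#include "Kismet/KismetRenderingLibrary.h"

void SetupShadowDepthCapture(USceneCaptureComponent2D* Capture,
                             USkeletalMeshComponent* CharacterMesh)
{
    // 256x256 single-channel 16-bit float target, depth only.
    UTextureRenderTarget2D* DepthTarget =
        UKismetRenderingLibrary::CreateRenderTarget2D(CharacterMesh, 256, 256, RTF_R16f);

    Capture->TextureTarget = DepthTarget;
    Capture->CaptureSource = SCS_SceneDepth;   // scene depth, no scene color

    // Only the character's skeletal mesh is captured; everything else is excluded.
    Capture->PrimitiveRenderMode = ESceneCapturePrimitiveRenderMode::PRM_UseShowOnlyList;
    Capture->ShowOnlyComponent(CharacterMesh);

    Capture->bCaptureEveryFrame = true;
}
```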
In theory, that should incur “double rendering cost” because the skeletal mesh must be rendered twice: once to draw it to the viewport, and once again to draw it to the SceneCapture (a bit less than double, really, since the SceneCapture is depth-only and renders at a much lower resolution). But what I am finding in testing is that the cost of the SceneCapture is HIGHER than the cost of rendering the skeletal mesh itself!
If I place 3 of these actors in the scene with SceneCapture2D set to capture every frame, the render time is about 5ms LONGER than if I turn off the SceneCapture2D and render 6 of the actors on screen. Why should it be that drawing a skeletal mesh to the viewport 6 times is fast, but drawing a skeletal mesh to the viewport 3 times, and then drawing it to a RenderTarget 3 times, is so much slower? Shouldn’t the cost be, if not lower (due to the SC2D rendering less data at lower resolution), at least comparable?
SceneCapture2D rendering in Unreal Engine can have additional overhead for the following reasons:
The engine has to allocate a new render target and set it up, which takes time and memory.
The SceneCapture2D has to render the scene from its perspective, which can include additional rendering calculations and data transfer between CPU and GPU.
The engine has to do additional post-processing on the render target, such as filtering or encoding, which adds extra cost.
In your case, the reason the SceneCapture2D is slower than just rendering the skeletal mesh to the viewport multiple times could be due to the overhead of the additional calculations and data transfer required for the SceneCapture2D. The engine might not be optimized for the specific use case of only capturing the depth information of a skeletal mesh.
One thing you can try to optimize the performance is to adjust the Render Target settings to minimize the overhead. For example, try using a smaller render target size or lowering the depth precision. Additionally, you could try to minimize the number of SceneCapture2D instances in the scene, and pool the render targets to reduce the overhead of allocating new ones.
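For example, a pooling setup could look something like this (just a sketch; the class and everything in it is hypothetical, not engine API):

```cpp
// Hypothetical pooling manager: allocate a fixed set of render targets up
// front and hand them out to captures, instead of creating one per actor.
#include "GameFramework/Actor.h"
#include "Engine/TextureRenderTarget2D.h"
#include "Kismet/KismetRenderingLibrary.h"
#include "ShadowTargetPool.generated.h"

UCLASS()
class AShadowTargetPool : public AActor
{
    GENERATED_BODY()

public:
    void Init(int32 Count, int32 Size)
    {
        for (int32 i = 0; i < Count; ++i)
        {
            // R16f keeps the depth precision cheap; Size could be 128 or 256.
            FreeTargets.Add(UKismetRenderingLibrary::CreateRenderTarget2D(
                this, Size, Size, RTF_R16f));
        }
    }

    UTextureRenderTarget2D* Acquire()
    {
        // Caller handles the pool running dry.
        return FreeTargets.Num() > 0 ? FreeTargets.Pop() : nullptr;
    }

    void Release(UTextureRenderTarget2D* Target)
    {
        FreeTargets.Push(Target);
    }

private:
    // UPROPERTY keeps the targets referenced so the garbage collector
    // doesn't reclaim them while they sit in the pool.
    UPROPERTY()
    TArray<UTextureRenderTarget2D*> FreeTargets;
};
```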
You really need to use the GPU profiler… your render targets will show up as a separate bar chart and you can sift through the cost.
That being said, I find this is usually caused by not disabling all the post process and lighting effects. For some reason, even with a depth-only RT, Unreal tries to render everything, driving the cost up.
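Roughly what I mean, in code terms (a sketch only; the exact flag setters available may vary between engine versions, so double-check against yours):

```cpp
// Strip the capture's show flags down to what a depth-only pass needs.
// These correspond to the same flags exposed in the editor's ShowFlagSettings.
#include "Components/SceneCaptureComponent2D.h"

static void StripCaptureShowFlags(USceneCaptureComponent2D* Capture)
{
    Capture->ShowFlags.SetPostProcessing(false);
    Capture->ShowFlags.SetLighting(false);
    Capture->ShowFlags.SetDynamicShadows(false);
    Capture->ShowFlags.SetAtmosphere(false);
    Capture->ShowFlags.SetFog(false);
    Capture->ShowFlags.SetBloom(false);
    Capture->ShowFlags.SetAntiAliasing(false);

    // Don't blend any post-process settings into the capture either.
    Capture->PostProcessBlendWeight = 0.f;
}
```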
I appreciate the speedy reply; let me take these a piece at a time and see if I can come up with a solution:
The engine has to allocate a new render target and set it up, which takes time and memory
But that should only be happening at game start, when the render target is created, yes? Yet this is a performance hit that persists indefinitely, even when no new actors (meaning no new RenderTargets) are being created.
The SceneCapture2D has to render the scene from its perspective, which can include additional rendering calculations and data transfer between CPU and GPU.
True. I suppose that’s an unavoidable cost, though is the overhead of this really so high that it can add 2+ms of render time per SC2D?
The engine has to do additional post-processing on the render target, such as filtering or encoding, which adds extra cost.
I think this may be the cause of the problem… that there’s a cost associated with taking the rendered data, encoding/storing it as a texture, and then accessing that texture, whereas the normal rendering process doesn’t do this; it just keeps the data in the GBuffer.
One thing you can try to optimize the performance is to adjust the Render Target settings to minimize the overhead. For example, try using a smaller render target size or lowering the depth precision
This has no effect. I can set the Render Target size to 2x2 and there’s no (or negligible) impact on performance. It’s evidently a fixed overhead cost of the SC2D running: no matter how much you strip out of the SC2D, there’s a performance hit when it ticks, and whether you render base color to a 1024x1024 texture with alpha or depth-only to a 64x64 red-only target, that hit doesn’t change much.
I think you’ve got it figured out, though: the discrepancy is not in the rendering itself, but in taking that render and making an accessible texture from it.
I wonder if there’s a way to do what I’m doing directly from the GBuffer, without that extra step.
Example: a 256 depth-only RT; most of the cost is unaccounted for, which can probably be attributed to the overhead of the RT itself.
If yours is similar, then I’m not sure what else you could do to optimize the RT.
Edit: Also if you aren’t already, since this is just for a stencil shadow you should be able to use the show-only actor list. Add your skeletal mesh to the list; it might reduce some of the cost.
I followed the first one down for a ways, but I’m not seeing anything that explains much. It just looks like the “Scene” draw time for each individual actor is about half the Scene draw time for the entire actual Scene. That doesn’t seem right, but drilling in doesn’t show any particular problem step.
Also if you aren’t already, since this is just for a stencil shadow you should be able to use the show-only actor list.
Seems like it’s still trying to render shadow depths and a full base pass for the RT… can’t help but wonder if this is a bug (or maybe this is just how RTs behave in forward rendering?)
Sorry I’m not really sure what more can be done at this point.
Seems like it’s still trying to render shadow depths and a full base pass for the RT… can’t help but wonder if this is a bug (or maybe this is just how RTs behave in forward rendering?)
Even if it were true, the math is just wrong.
I mean, pause to consider:
it takes 5.81ms to render the ENTIRE scene, including all 4 actors being SceneCapture’d (plus all of the environment geometry, lighting, etc)
it takes 13.45ms to just render the 4 scene capture actors.
Even if it’s doing “superfluous rendering” (the shadow depths and base pass, even though it only needs depth), shouldn’t the total cost of rendering a subset of the components in a scene necessarily be smaller than the total cost of rendering the entire scene, including those components?
That’s why I made the thread: why is it so much MORE costly to render an asset to a render target via SceneCapture2D than it is to just render it?
Hopping back in to say that, after a brief abortive attempt to get this working with RVTs (which would have worked if RVTs allowed skeletal meshes to write to them, but I digress), I was able to hack together a fairly effective solution using a single SceneCapture and a bunch of cloned pose followers.
Basically, every time an enemy character is spawned into the level (up to a maximum of 64 actors), I have a master scene component get a report of it, copy its skeletal mesh to a grid-based location, set the copy to follow the master pose of the actor that reported it, and capture the whole grid with a single orthographic render at 4K. So I’ve got copies of the skeletal meshes that exist in the level, hidden from view (they render only in the SceneCapture), arranged in a tile grid, all getting captured at once. I then do a simple SubUV operation to pull the region of the master “shadow texture” render target specific to each actor.
It’s far more performant, since it only has to perform the base pass once, and it incurs no additional cost as actors are added (up to 64, and I shouldn’t ever have 64 enemies live at one time; it will take some additional logic to “reallocate” slots in the master grid when enemies are removed from the game, but as a basic proof of concept it works). This also allows me to force the shadow-generating mesh to render at the lowest LOD no matter what (which cuts triangle cost, since I don’t need high detail for shadow stencils).
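For anyone attempting something similar, the registration step boils down to roughly this (a rough sketch; every name here is mine, the visibility flag is a UE5 property, and SetMasterPoseComponent became SetLeaderPoseComponent on newer engine versions):

```cpp
// Sketch of the "one capture, many pose-following clones" approach described
// above. Names (RegisterShadowProxy, GridManager, etc.) are placeholders.
#include "GameFramework/Actor.h"
#include "Components/SkeletalMeshComponent.h"
#include "Components/SceneCaptureComponent2D.h"

static const int32 GridSize    = 8;      // 8x8 = 64 slots
static const float CellSpacing = 300.f;  // world-space spacing between grid cells

void RegisterShadowProxy(AActor* GridManager, USkeletalMeshComponent* SourceMesh,
                         USceneCaptureComponent2D* MasterCapture,
                         int32 SlotIndex, const FVector& GridOrigin)
{
    // Spawn a clone mesh component owned by the (static) grid manager actor.
    USkeletalMeshComponent* Clone = NewObject<USkeletalMeshComponent>(GridManager);
    Clone->RegisterComponent();
    Clone->AttachToComponent(GridManager->GetRootComponent(),
                             FAttachmentTransformRules::KeepWorldTransform);

    // Same mesh asset as the gameplay character (GetSkeletalMeshAsset is the
    // UE5 accessor; older versions expose the SkeletalMesh property directly).
    Clone->SetSkeletalMesh(SourceMesh->GetSkeletalMeshAsset());

    // Always render the cheapest LOD; shadow stencils don't need detail.
    Clone->SetForcedLOD(SourceMesh->GetNumLODs());

    // Follow the gameplay mesh's pose without running its own anim instance
    // (SetLeaderPoseComponent in UE 5.1+).
    Clone->SetMasterPoseComponent(SourceMesh);

    // Park the clone in its tile of the off-screen grid.
    const int32 Row = SlotIndex / GridSize;
    const int32 Col = SlotIndex % GridSize;
    Clone->SetWorldLocation(GridOrigin + FVector(Col * CellSpacing, Row * CellSpacing, 0.f));

    // Render only in the scene capture, never in the main view
    // (bVisibleInSceneCaptureOnly is a UE5 flag; older versions need another approach).
    Clone->bVisibleInSceneCaptureOnly = true;
    Clone->MarkRenderStateDirty();

    // The single master capture renders every clone in the grid at once.
    MasterCapture->ShowOnlyComponent(Clone);

    // Each actor samples its own tile of the master render target:
    // UV offset = (Col, Row) / GridSize, UV scale = 1 / GridSize.
}
```

The SubUV part is then just that per-slot UV offset and scale fed into the decal material, so each actor reads only its own tile of the shared render target.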
I’m necroing this thread because the more I dig into this with ProfileGPU, the more bewildered I become.
For a depth-only render pass on an object which, for what it’s worth, is not even set to visible, I’m spending almost half of the SC’s render time on SkyAtmosphere LUTs.
I also had to disable the atmospheric fog actor to get this result, because even though I’m using a ShowOnly list with all post-processing disabled, if I don’t do that, the time on the scene capture nearly doubles: the SceneCapture insists on rendering the atmospheric fog actor. That’s an actor which is not on the ShowOnly list, whose contribution is supposed to be disabled via the show flags on the SC, and which shouldn’t make any difference anyway for a depth-only render pass on an object that isn’t even visible in the scene.
I’m also seeing a lot of waste on recomputing custom depth and stencil values. Now, I will grant that because this is a “depth only” pass, there’s a certain logic in also computing the custom depth value. I wish I could bypass this, but I at least understand why it’s here.
But again, I’m still seeing setup time on sky atmosphere contributions, shadow projection computations, etc. I’ve managed to get my number of SceneCaptures down to 2 for the entire set of actors in the game (one for the player, one for all of the enemies), but I’m spending close to half my total frame time just rendering these lowest-possible-LOD skeletal meshes in a depth-only ortho plane with absolutely no environment present. My frame budget is tightening and this is by far the largest bottleneck of all; surely there must be some fix for all of these redundant and needless render steps?