We want to know how to reduce RHI thread time.
Steps to Reproduce
Enter our game and disable ParallelTranslate by setting r.RHICmd.ParallelTranslate.Enable to 0; the frame time increases by 1.6 ms. The trace was captured on a PC with a 13th Gen Intel(R) Core™ i7-13700 2.10 GHz CPU, an NVIDIA GeForce RTX 4060 graphics card, and 32 GB of RAM.
Hi there,
Just to confirm your reproduction steps: they say you are setting r.RHICmd.ParallelTranslate.Enable to 0 and seeing an increase in frame time. Did you mean that you are setting r.RHICmd.ParallelTranslate.Enable to 1 and seeing an increase in frame time?
It makes sense that setting r.RHICmd.ParallelTranslate.Enable to 0 might increase frame time, since that disables an optimization which translates RHI command lists in parallel across multiple CPU cores instead of on a single one.
However, looking at your Insights traces, your r.RHICmd.ParallelTranslate.Enable=1 case appears to have slightly higher GPU time, which makes me think you might have meant the opposite reproduction steps. If so, can you confirm that the two traces were captured from exactly the same camera view and level state, and whether r.vsync was 0 in both?
Regards,
Lance
I have managed to replicate a frame time difference of approximately 200us in a blank project by toggling r.RHICmd.ParallelTranslate.Enable on and off.
Can you try different values of r.RHICmd.ParallelTranslate.MaxCommandsPerTranslate for me? The default is 256. Can you try 512, 1024, and 2048 and tell me what impact each has on your frame time? I was able to reduce the GPU-time impact of r.RHICmd.ParallelTranslate.Enable 1 by increasing this CVar. Please make sure r.vsync is 0 (disabled) when testing.
I’m still investigating the root cause. It would help if you could tell me how much impact the above CVar has for you, since that might narrow down the issue. So far I haven’t been able to replicate a frame time difference as large as the one you’re seeing.
Regards,
Lance
Hi. I’ve taken a look at the before/after traces you provided in the original post. I think the root cause is that when parallel translate is enabled, there is significantly more work on the RHISubmissionThread. This is because the D3D12 RHI barrier tracker needs to generate many more barrier command lists, which inflates the cost of submitting work from ~400us to ~2ms. This extra work delays getting the commands to the GPU, so the GPU kicks off later, which has a knock-on effect on the following frame and makes the overall frame slower.
There’s not much I can suggest for UE 5.5 to improve this, aside from tuning the r.RHICmd.ParallelTranslate.MaxCommandsPerTranslate console variable, as mentioned above. Increasing that value reduces the number of separate RHI translate jobs, which results in fewer command list submissions and fewer barrier command lists for the submission thread to generate.
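To make the MaxCommandsPerTranslate trade-off concrete, here is a deliberately simplified Python model, assuming each translate job covers at most MaxCommandsPerTranslate commands and hands one command list to the submission thread. The function names, the command count, and the per-list cost parameters are hypothetical; the engine's real batching logic is more involved, and this is not its implementation.

```python
import math


def translate_job_count(num_commands: int, max_commands_per_translate: int) -> int:
    """Simplified model: split a frame's commands into fixed-size translate jobs.

    Assumption (not engine code): each job handles at most
    max_commands_per_translate commands and produces one command list.
    """
    if max_commands_per_translate <= 0:
        raise ValueError("max_commands_per_translate must be positive")
    return math.ceil(num_commands / max_commands_per_translate)


def estimated_submission_cost_us(num_jobs: int, base_us: float, per_list_us: float) -> float:
    """Toy cost model: a fixed submission cost plus a per-command-list cost
    (standing in for barrier command list generation). Both inputs are
    hypothetical, not measured engine values."""
    return base_us + per_list_us * num_jobs


if __name__ == "__main__":
    commands = 20_000  # hypothetical number of RHI commands in one frame
    for limit in (256, 512, 1024, 2048):
        jobs = translate_job_count(commands, limit)
        print(f"MaxCommandsPerTranslate={limit:5d} -> {jobs:3d} translate jobs")
```

Under this model, going from 256 to 2048 cuts the job count roughly eightfold, which is the direction of the effect described above; the actual savings depend on how the engine groups work and on how much each extra submission really costs.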
In UE 5.6, we removed the D3D12 RHI barrier tracker entirely, so if you upgrade your engine in the future you should see better performance when running with more parallel translate tasks.
Cheers,
Luke
Sorry for my mistake. I meant that the frame time decreased when setting r.RHICmd.ParallelTranslate.Enable to 0. We always set r.vsync to 0, and those two traces were captured at the same position and camera angle.
Hi, our colleague who did this testing is on a business trip. We will test it next week after he returns and will post the results here as soon as we have them. Thanks!
Thanks. A small reproduction project would be ideal; do you think you would be able to provide one? I understand, however, if you are unable to reproduce this issue in a small or different project.
Can you also report back on whether you have any changes to the r.RHICmdMinDrawsPerParallelCmdList CVar? It seems like your nanite base pass translation jobs are taking quite some time.
When you next test, can you measure the overhead using the output of: abtest "r.RHICmd.ParallelTranslate.Enable 0" "r.RHICmd.ParallelTranslate.Enable 1"
This should give a very accurate measure of the overhead you are getting. It’s a bit hard to measure this accurately from looking at two insights traces.
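As I understand it, abtest alternates between the two commands within a single run and does its own statistics, which is why it is more reliable than comparing two separate traces. The sketch below only illustrates the general idea of comparing two sets of frame-time samples, assuming you already have per-frame times in milliseconds for each configuration; `overhead_estimate` is a hypothetical helper, not abtest's implementation.

```python
import math
from statistics import mean, stdev


def overhead_estimate(frame_ms_a, frame_ms_b):
    """Estimate how much slower configuration B is than configuration A.

    Returns (mean difference B - A, approximate 95% half-width), both in ms.
    Uses the Welch standard error with a normal approximation (z = 1.96);
    with few samples a t-distribution would be more appropriate.
    """
    if len(frame_ms_a) < 2 or len(frame_ms_b) < 2:
        raise ValueError("need at least two samples per configuration")
    diff = mean(frame_ms_b) - mean(frame_ms_a)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(stdev(frame_ms_a) ** 2 / len(frame_ms_a)
                   + stdev(frame_ms_b) ** 2 / len(frame_ms_b))
    return diff, 1.96 * se
```

If the interval `diff ± half_width` excludes zero, the overhead is probably real rather than noise; interleaving the A and B samples, as abtest does, also helps guard against thermal or scene drift over time.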
Regards,
Lance
Hi, I think it's hard to prepare a small project that reproduces the problem, but would you mind downloading our game through Steam? We can provide you with a Steam key or gift for a test or dev branch. If you want it, please send me your Steam ID/account and your email address. We will also include the CVar in our next round of testing.
I’m not sure how much value it would add to test a Steam version of the game, since I wouldn’t be able to attach a debugger to it without the original source code. My recommendation would be to first try the above CVars, see whether they have any effect, and then report back with the results.
Just an FYI that cases are now beginning to be replicated to the Unreal Developer Community. So non-confidential posts here will be a lot more public now. Therefore, you might want to consider editing your post to remove your email address (I have saved it in my case notes), or consider making this post confidential.
Regards,
Lance
Thanks for the heads-up; I've edited the message. Our dev-version Steam package can be attached to by other tools such as RenderDoc or Nsight. The test results show that frame time drops as the MaxCommandsPerTranslate value increases; you can check this in the trace files, which were captured at the same position with the same camera angle.
It seems like r.RHICmd.ParallelTranslate.MaxCommandsPerTranslate 2048 is the sweet spot for you. It looks like the overhead you were experiencing is pretty much gone with this setting, while RHI parallelization still remains fairly good. I’m going to elevate this to a subject matter expert at Epic, with my notes included, for further investigation.
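One way to formalize picking a "sweet spot" like 2048 is to take the smallest MaxCommandsPerTranslate whose mean frame time is within a small tolerance of the best result, on the assumption that smaller values keep more translate jobs and therefore more parallelism. The helper name, the tolerance, and the selection rule are illustrative assumptions rather than an Epic recommendation; feed it your own measurements (for example, from abtest runs).

```python
def pick_sweet_spot(results, tolerance_ms=0.1):
    """Pick a MaxCommandsPerTranslate value from measured results.

    results: dict mapping MaxCommandsPerTranslate -> mean frame time in ms.
    Returns the smallest setting whose frame time is within tolerance_ms of
    the fastest one. Assumption: smaller settings keep more parallel translate
    jobs, so prefer them when they are effectively as fast as the best.
    """
    if not results:
        raise ValueError("results must not be empty")
    best = min(results.values())
    return min(k for k, v in results.items() if v <= best + tolerance_ms)
```

The tolerance expresses how much frame time you are willing to give up for extra parallelism headroom; with a tolerance of zero this simply returns the fastest setting measured.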
Regards,
Lance
We also went from 256 to 1024 on MaxCommandsPerTranslate and found a win.