Hello Aleksander, I am a colleague of Sakib. I have been digging a bit more into how all of this works, and I have a few findings I would like to validate:
After what Sakib did with the graphics queue, I tried increasing the pool size to 32 MB using d3d12.UploadHeap.BigBlock.PoolSize. Inside FD3D12DynamicRHI::RHIEndFrame, I also increased the BufferPoolDeletionFrameLag value so the buffers linger a bit longer and get reused. According to some Insights traces, this improved the ShaderTableCommit graph overall, but it did not really help our overall performance.
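The idea behind raising a deletion frame lag can be shown with a generic frame-lag pool: released buffers are kept for a number of frames before being destroyed, so a same-sized request inside that window reuses one instead of creating a new allocation. This is a minimal C++ sketch under that assumption; `FrameLagBufferPool`, `RunWorkload` and the reuse-by-exact-size policy are hypothetical and do not reproduce the D3D12 RHI's actual buffer pool.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a GPU buffer (not an Unreal type).
struct Buffer {
    uint32_t Id;
    uint32_t Size;
};

struct PoolStats {
    uint32_t Created = 0;
    uint32_t Reused = 0;
    uint32_t Destroyed = 0;
};

// Hypothetical pool for illustration: keeps released buffers for FrameLag frames
// before destroying them, so same-sized requests in that window reuse them.
class FrameLagBufferPool {
public:
    explicit FrameLagBufferPool(uint32_t InFrameLag) : FrameLag(InFrameLag) {}

    Buffer Allocate(uint32_t Size) {
        assert(Size > 0);
        // Reuse the oldest lingering buffer of the same size, if there is one.
        for (auto It = Released.begin(); It != Released.end(); ++It) {
            if (It->Buf.Size == Size) {
                Buffer Reused = It->Buf;
                Released.erase(It);
                ++Stats.Reused;
                return Reused;
            }
        }
        ++Stats.Created;
        return Buffer{NextId++, Size};
    }

    void Release(const Buffer& Buf) { Released.push_back({Buf, CurrentFrame}); }

    // Called once at the end of every frame.
    void EndFrame() {
        ++CurrentFrame;
        // Entries are ordered by release frame, so expired ones sit at the front.
        while (!Released.empty() &&
               CurrentFrame - Released.front().ReleasedFrame > FrameLag) {
            Released.pop_front();
            ++Stats.Destroyed;
        }
    }

    const PoolStats& GetStats() const { return Stats; }

private:
    struct Entry {
        Buffer Buf;
        uint64_t ReleasedFrame;
    };
    std::deque<Entry> Released;
    uint64_t CurrentFrame = 0;
    uint32_t FrameLag;
    uint32_t NextId = 0;
    PoolStats Stats;
};

// Hypothetical workload: one buffer is needed once every ReuseInterval frames.
PoolStats RunWorkload(uint32_t FrameLag, uint32_t NumFrames, uint32_t ReuseInterval) {
    FrameLagBufferPool Pool(FrameLag);
    for (uint32_t Frame = 0; Frame < NumFrames; ++Frame) {
        if (Frame % ReuseInterval == 0) {
            Buffer B = Pool.Allocate(1024);
            Pool.Release(B);
        }
        Pool.EndFrame();
    }
    return Pool.GetStats();
}
```

With a buffer needed every third frame over 9 frames, `RunWorkload(1, 9, 3)` creates a new buffer all 3 times, while `RunWorkload(4, 9, 3)` creates it once and reuses it twice. A longer lag trades some lingering memory for fewer allocations.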
Looking into it a bit more, I checked how many primitives we are processing. Using stat RayTracingGeometry, I got the following:
[Image Removed]

I see that Nanite needs to rebuild constantly and that our geometry count is high; with Nanite, almost everything behaves as dynamic. We knocked out a few categories using commands such as r.RayTracing.Geometry.Text=0 and r.RayTracing.Geometry.ProceduralMeshes=0, which reduced our memory footprint. We already had a fairly good configuration from before, so we don't really see a performance win yet.
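One reason geometry count matters for shader binding table cost: in DXR, the hit-group record used for a hit is selected as RayContributionToHitGroupIndex + MultiplierForGeometryContributionToHitGroupIndex × GeometryIndex + InstanceContributionToHitGroupIndex. A table that gives every instance its own records therefore needs roughly instances × segments × shader slots entries. This is a minimal C++ sketch of that counting, assuming each instance gets a contiguous block of records and the multiplier equals the number of slots per segment. `InstanceDesc`, `AssignInstanceContributions` and the slot count are hypothetical, and Unreal's own table layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-instance description: number of geometry segments
// (D3D12 geometry descs) in the instance's bottom-level acceleration structure.
struct InstanceDesc {
    uint32_t NumSegments;
};

// DXR hit-group indexing:
// RayContribution + Multiplier * GeometryIndex + InstanceContribution.
uint64_t HitGroupIndex(uint32_t RayContribution, uint32_t Multiplier,
                       uint32_t GeometryIndex, uint32_t InstanceContribution) {
    return uint64_t(RayContribution) + uint64_t(Multiplier) * GeometryIndex +
           InstanceContribution;
}

// Hypothetical layout: each instance gets a contiguous block of SlotsPerSegment
// records per segment (so Multiplier == SlotsPerSegment). Returns the total record count.
uint64_t AssignInstanceContributions(const std::vector<InstanceDesc>& Instances,
                                     uint32_t SlotsPerSegment,
                                     std::vector<uint32_t>& OutContributions) {
    OutContributions.clear();
    uint64_t NumRecords = 0;
    for (const InstanceDesc& Instance : Instances) {
        // InstanceContributionToHitGroupIndex is a 24-bit field in D3D12_RAYTRACING_INSTANCE_DESC.
        assert(NumRecords < (uint64_t(1) << 24));
        OutContributions.push_back(static_cast<uint32_t>(NumRecords));
        NumRecords += uint64_t(Instance.NumSegments) * SlotsPerSegment;
    }
    return NumRecords;
}
```

With three instances of 2, 1 and 3 segments and 2 slots, the contributions come out as 0, 4 and 6 and the table holds 12 records. Doubling the instances doubles the records, and any per-record binding cost grows with them.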
Finally, checking the stats with stat D3D12Raytracing, I found that the shader binding table has to record a lot of information every time, and setting the hit groups takes the most time. This happens in FD3D12CommandContext::RHISetBindingsOnShaderBindingTable, specifically in this code inside the lambda:
```cpp
if (BindingType == ERayTracingBindingType::HitGroup)
{
    if (Binding.BindingType != ERayTracingLocalShaderBindingType::Clear)
    {
        //UE_LOG(LogD3D12RHI, Log, TEXT("Set hit record data for RecordIndex %d on SBT %#016llx with mode: %d"), Binding.RecordIndex, ShaderTableForDevice, Binding.BindingType);
        const FD3D12RayTracingGeometry* Geometry = FD3D12DynamicRHI::ResourceCast(Binding.Geometry);
        SetRayTracingHitGroup(Device,
            ShaderTableForDevice, Binding.RecordIndex,
            Pipeline, Binding.ShaderIndexInPipeline,
            Geometry, Binding.SegmentIndex,
            Binding.NumUniformBuffers,
            Binding.UniformBuffers,
            Binding.LooseParameterDataSize,
            Binding.LooseParameterData,
            Binding.UserData,
            Binding.BindingType,
            Context.WorkerIndex);
    }
    // ...
}
```
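At the D3D12 level, each hit-group record filled by this code is a 32-byte shader identifier followed by the local root arguments, padded to a 32-byte record alignment. This is a minimal C++ sketch of that stride arithmetic and of a per-record copy, assuming the local root arguments are only 8-byte GPU addresses. `WriteHitGroupRecord` is a hypothetical helper for illustration, not Unreal's `SetRayTracingHitGroup`, which also takes geometry, uniform buffers and loose parameter data.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Values from the D3D12 headers, copied here so the sketch compiles without them.
constexpr uint32_t kShaderIdentifierSize = 32;  // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES
constexpr uint32_t kRecordAlignment = 32;       // D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT
constexpr uint32_t kMaxRecordStride = 4096;     // D3D12_RAYTRACING_MAX_SHADER_RECORD_STRIDE

constexpr uint32_t AlignUp(uint32_t Value, uint32_t Alignment) {
    return (Value + Alignment - 1) / Alignment * Alignment;
}

// Stride of one hit-group record: identifier plus local root arguments, aligned.
uint32_t RecordStride(uint32_t LocalRootArgBytes) {
    uint32_t Stride = AlignUp(kShaderIdentifierSize + LocalRootArgBytes, kRecordAlignment);
    assert(Stride <= kMaxRecordStride);
    return Stride;
}

// Hypothetical record writer: copies the shader identifier and a list of
// 8-byte GPU addresses (e.g. uniform buffer views) into record RecordIndex.
void WriteHitGroupRecord(std::vector<uint8_t>& Table, uint32_t Stride, uint32_t RecordIndex,
                         const uint8_t (&Identifier)[kShaderIdentifierSize],
                         const std::vector<uint64_t>& GpuAddresses) {
    const size_t Offset = size_t(RecordIndex) * Stride;
    assert(Offset + Stride <= Table.size());
    assert(kShaderIdentifierSize + GpuAddresses.size() * sizeof(uint64_t) <= Stride);
    std::memcpy(Table.data() + Offset, Identifier, kShaderIdentifierSize);
    if (!GpuAddresses.empty()) {
        std::memcpy(Table.data() + Offset + kShaderIdentifierSize, GpuAddresses.data(),
                    GpuAddresses.size() * sizeof(uint64_t));
    }
}
```

For example, 16 bytes of local root arguments give a 64-byte stride. The bytes written per commit are then roughly records × stride, so the work grows linearly with the record count even when each individual write is cheap.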
I added a couple of extra cycle counters and ran the ParallelFor single-threaded to measure it. The work is simply heavy: it seems to me that we have far too many individual ray tracing shaders, and the cost scales linearly. What strategies can we take to speed up this process?
My hardware specs are the following:
Processor: AMD Ryzen Threadripper PRO 5965WX
Mem: 65k MB
Graphics: RTX A5500