Something like RHICopyToStagingBuffer for structured buffers?

I’m trying to copy data from a structured buffer on the GPU back to the CPU. RHILockStructuredBuffer stalls while the copy takes place. I see that RHICopyToStagingBuffer was added in 4.22. Combined with RHILockStagingBuffer and a fence, it would be the perfect async solution to my problem. However, it only works with FRHIVertexBuffer. Is there any way to achieve this with FRHIStructuredBuffer?

I found a partial solution.

I created FRHIStructuredBufferReadBack, a subclass of FRHIGPUMemoryReadback. It creates an intermediate FRHIVertexBuffer with a UAV, plus an FStagingBufferRHI. Inside EnqueueCopy, I copy the structured buffer to the intermediate vertex buffer with a small compute shader that uses RWByteAddressBuffers. Then I queue the copy from the intermediate buffer to the staging buffer using CopyToStagingBuffer.
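To make the flow easier to follow, here is a rough CPU-only model of it in plain C++. It is only a sketch and does not use any Unreal API: StructuredReadbackModel, Particle, IsReady and Lock are names I made up for illustration, a worker thread stands in for the GPU, and an atomic flag stands in for the fence. The word-by-word loop plays the role of the compute shader that copies into the raw intermediate buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <future>
#include <thread>
#include <vector>

// Hypothetical element type of the structured buffer (an assumption).
struct Particle {
    float Position[3];
    float Age;
};
static_assert(sizeof(Particle) % sizeof(uint32_t) == 0,
              "elements must be a whole number of 32-bit words");

// CPU-side model of the readback flow. None of this is Unreal API.
class StructuredReadbackModel {
public:
    // Queues the copy and returns immediately; the caller is never stalled
    // by the copy itself (an earlier, unfinished copy is waited on first).
    void EnqueueCopy(const std::vector<Particle>& Source) {
        if (Pending.valid()) {
            Pending.wait();  // do not overlap two copies into Staging
        }
        Ready.store(false);
        Pending = std::async(std::launch::async, [this, Source] {
            // Step 1: copy the structured data into a raw intermediate
            // buffer one 32-bit word at a time (the compute shader's job).
            const std::size_t NumWords =
                Source.size() * sizeof(Particle) / sizeof(uint32_t);
            std::vector<uint32_t> Intermediate(NumWords);
            const unsigned char* Bytes =
                reinterpret_cast<const unsigned char*>(Source.data());
            for (std::size_t Word = 0; Word < NumWords; ++Word) {
                std::memcpy(&Intermediate[Word],
                            Bytes + Word * sizeof(uint32_t),
                            sizeof(uint32_t));
            }
            // Step 2: copy the intermediate buffer into staging memory.
            Staging = Intermediate;
            // Step 3: signal the stand-in fence.
            Ready.store(true);
        });
    }

    // Non-blocking poll of the stand-in fence.
    bool IsReady() const { return Ready.load(); }

    // Returns the staged data as typed elements. Only valid once IsReady().
    std::vector<Particle> Lock() const {
        assert(IsReady());
        std::vector<Particle> Out(Staging.size() * sizeof(uint32_t) / sizeof(Particle));
        if (!Out.empty()) {
            std::memcpy(Out.data(), Staging.data(), Out.size() * sizeof(Particle));
        }
        return Out;
    }

private:
    std::vector<uint32_t> Staging;
    std::atomic<bool> Ready{false};
    std::future<void> Pending;
};
```

The calling side would enqueue the copy, carry on with its frame work, poll IsReady each frame, and only call Lock once it returns true. In the real thing the poll is the fence check and Lock is the staging buffer lock.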

The difference compared to FRHIBufferReadBack is the extra intermediate FRHIVertexBuffer and the compute shader. Neither would be necessary if we could create vertex buffers with D3D11_USAGE_STAGING usage, but that’s not exposed in the RHI.

So, this solution is not bad. I can queue an async CPU read without flushing the render state. The extra copy is not ideal, but it’s way faster than RHILockStructuredBuffer. Hopefully Epic adds support for copying structured buffers to a staging buffer.