Faster way of reading RenderTarget data? Copy GPU data from compute shader?

TK-Master · February 10, 2016, 6:06pm

Hello everyone,

I have updated the VaOcean plugin to work in 4.10. Everything works fine, but one critical thing is missing: FFT displacement readback on the CPU.
I have tried using the ReadFloat16Pixels function to read the pixel data from the displacement render target, but it turns out to be way too expensive (it adds about 7ms per frame, essentially halving the fps).

Correct me if I’m wrong, but there should be a way to access the data more directly from GPU memory.
The problem is that I have zero experience with compute shader UAVs and that kind of stuff, so I have no idea how to go about doing that >.<

I could really use some help with this one.

Here is the GitHub link to the recent commit, which uses the ReadFloat16Pixels method.

I would greatly appreciate it if someone can take a look at it.

Thanks in advance!

TK-Master · February 14, 2016, 1:14am

Bump. Any ideas? Really hoping someone can lend a hand with this.

Syedhs · July 14, 2016, 11:29am

I realize this thread is a bit old (5 months)… but in DirectX 9 I implemented two RTTs, of which only one is updated per cycle (it is updated manually). Once an RTT has been updated, it is locked, and a separate thread reads the bits (copying them, or whatever else takes too much time). While that RTT is locked and being processed in the separate thread, the next rendering loop uses the second RTT, which is then locked in turn and handed to a separate thread, and so on (you get the drift). This way, the rendering loop doesn’t need to wait for the operations on an RTT to finish.

HTH.
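
For anyone trying this, here is a minimal sketch of the two-buffer idea described above, written in plain C++ with no engine or graphics API. Everything in it is an assumption made for illustration: `ReadbackSlot` stands in for a locked RTT, `ProcessPixels` stands in for the slow copy, and `PingPongReadback` is a hypothetical name. It is not the DirectX 9 code from the post, only the same alternation pattern.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one render target: a block of pixels plus the
// handle of the background job that is currently reading it, if any.
struct ReadbackSlot {
    std::vector<float> pixels;       // stands in for the locked RTT memory
    std::future<float> pendingRead;  // valid() while a worker owns the slot
};

// Hypothetical slow CPU-side work on the copied bits (here: a simple sum).
inline float ProcessPixels(const std::vector<float>& pixels) {
    return std::accumulate(pixels.begin(), pixels.end(), 0.0f);
}

class PingPongReadback {
public:
    explicit PingPongReadback(std::size_t pixelCount) {
        assert(pixelCount > 0);
        for (ReadbackSlot& slot : slots_) slot.pixels.resize(pixelCount);
    }

    // Called once per frame. `frameValue` stands in for whatever the GPU
    // would have rendered into the target this frame.
    void RenderFrame(float frameValue) {
        ReadbackSlot& slot = slots_[frame_ % 2];

        // If the worker started two frames ago still owns this slot, wait
        // for it before overwriting the pixels. The other slot's worker is
        // free to keep running in the meantime.
        Collect(slot);

        // "Render" into the slot, then hand it to a worker thread.
        std::fill(slot.pixels.begin(), slot.pixels.end(), frameValue);
        slot.pendingRead = std::async(
            std::launch::async, [&slot] { return ProcessPixels(slot.pixels); });
        ++frame_;
    }

    // Waits for outstanding workers and returns all results, oldest first.
    std::vector<float> Flush() {
        Collect(slots_[frame_ % 2]);
        Collect(slots_[(frame_ + 1) % 2]);
        return results_;
    }

private:
    void Collect(ReadbackSlot& slot) {
        if (slot.pendingRead.valid()) results_.push_back(slot.pendingRead.get());
    }

    std::array<ReadbackSlot, 2> slots_;
    std::vector<float> results_;
    std::size_t frame_ = 0;
};
```

The render loop would call `RenderFrame` once per frame and only ever blocks on a read that was started two frames earlier, so the CPU-side data lags the GPU by a frame or two. Whether that latency is acceptable depends on what the data is used for.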

TK-Master · July 15, 2016, 1:05pm

Thanks for the reply! Don’t worry about the date, the topic is still very relevant to me.

For now I have abandoned any attempts at copying data from the GPU. Instead, the plan is to use a smaller, downsampled FFT grid on the CPU (used for physics) and a larger one on the GPU for rendering.
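
As a rough illustration of how physics code might query such a small CPU-side grid, here is a sketch that samples a height grid with bilinear interpolation and wrap-around. It assumes the grid tiles seamlessly, as an FFT-generated patch does; `HeightGrid`, `patchSize` and `SampleHeight` are hypothetical names, not part of VaOcean.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side height field: an n x n grid covering a square patch
// of `patchSize` world units, assumed to repeat in both directions.
struct HeightGrid {
    std::size_t n;
    float patchSize;
    std::vector<float> heights;  // row-major, n * n entries

    // Fetches a grid point, wrapping indices so the patch repeats.
    float At(long ix, long iy) const {
        const long size = static_cast<long>(n);
        const std::size_t x = static_cast<std::size_t>(((ix % size) + size) % size);
        const std::size_t y = static_cast<std::size_t>(((iy % size) + size) % size);
        return heights[y * n + x];
    }
};

// Bilinearly interpolates the height at a world-space position.
inline float SampleHeight(const HeightGrid& grid, float worldX, float worldY) {
    assert(grid.n > 0 && grid.heights.size() == grid.n * grid.n);

    const float cell = grid.patchSize / static_cast<float>(grid.n);
    const float gx = worldX / cell;
    const float gy = worldY / cell;
    const float fx = std::floor(gx);
    const float fy = std::floor(gy);
    const float tx = gx - fx;  // fractional position inside the cell, 0..1
    const float ty = gy - fy;
    const long ix = static_cast<long>(fx);
    const long iy = static_cast<long>(fy);

    const float h00 = grid.At(ix, iy);
    const float h10 = grid.At(ix + 1, iy);
    const float h01 = grid.At(ix, iy + 1);
    const float h11 = grid.At(ix + 1, iy + 1);

    const float top = h00 + (h10 - h00) * tx;
    const float bottom = h01 + (h11 - h01) * tx;
    return top + (bottom - top) * ty;
}
```

A buoyancy test would call `SampleHeight` at each test point. Note that this only reads vertical height; an FFT ocean with horizontal (choppy) displacement would need an extra correction step that this sketch leaves out.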

Your method sounds like it might just work though! Hopefully I will be able to figure it out.

Syedhs · July 16, 2016, 6:54am

The code was written for my old engine and it is fast… except that sometimes the first RTT is still not finished in the separate thread when the second RTT is ready to be locked.
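
One way to deal with that case without stalling the render loop is to check whether the worker is done and skip the readback for that frame if it is not. The sketch below does this with standard C++ futures. The names `IsSlotFree` and `PickFreeSlot` are hypothetical, and it assumes each slot's background read is tracked by a `std::future` created from a promise or from `std::async` with `std::launch::async`.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Returns true if the background read that owns a slot has finished, or if
// no read was ever started. Never blocks the calling thread.
template <typename T>
bool IsSlotFree(const std::future<T>& pendingRead) {
    if (!pendingRead.valid()) return true;
    return pendingRead.wait_for(std::chrono::seconds(0)) ==
           std::future_status::ready;
}

// Picks the slot to render into this frame. Tries `preferred` first, then
// the other slot. Returns -1 if both are still busy, in which case the
// caller would skip the readback for this frame instead of waiting.
template <typename T>
int PickFreeSlot(const std::future<T> (&pending)[2], int preferred) {
    assert(preferred == 0 || preferred == 1);
    if (IsSlotFree(pending[preferred])) return preferred;
    if (IsSlotFree(pending[1 - preferred])) return 1 - preferred;
    return -1;
}
```

Skipping a frame keeps the frame time stable at the cost of slightly staler CPU data, which is often an acceptable trade for physics queries.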

I will probably pursue this approach in Unreal (50-50 though, because this requirement may or may not be needed). What I plan is to use two SceneCapture2D actors, both manually updated. In each rendering loop only one of them is updated, and then I pass the pointer to its texture bits to the thread (and this thread will copy and dump the bits into a memory map) etc etc…