I am trying to get data that I compute on the GPU with CUDA into a vertex buffer for rendering, without copying it back to the host in between.
The standard way would be:
// The game thread does some computation and copies the result to CPU
void* gpuDataPtr = doSomeCudaComputation();
void* cpuDataPtr = new uint8_t[...];
cudaMemcpy(cpuDataPtr, gpuDataPtr, ..., cudaMemcpyDeviceToHost);
// Later the render thread updates the vertex buffer
void* cpuUnrealPtr = RHILockVertexBuffer(VertexBufferRHI, ..., RLM_WriteOnly);
FMemory::Memcpy(cpuUnrealPtr, cpuDataPtr, ...);
RHIUnlockVertexBuffer(VertexBufferRHI);
However, this performs a GPU-to-CPU-to-GPU round trip, which can be quite costly when there is a lot of data.
Instead I want to do something like this:
// The game thread does some computation and copies the result to (a different location on the) GPU
void* gpuDataPtr = doSomeCudaComputation();
void* gpuDataPtrForRendering = nullptr;
cudaMalloc(&gpuDataPtrForRendering, ...);
cudaMemcpy(gpuDataPtrForRendering, gpuDataPtr, ..., cudaMemcpyDeviceToDevice);
// Later the render thread updates the vertex buffer
void* gpuUnrealPtr = MapVertexBufferIntoCuda(VertexBufferRHI);
cudaMemcpy(gpuUnrealPtr, gpuDataPtrForRendering, ..., cudaMemcpyDeviceToDevice);
UnmapVertexBufferFromCuda(VertexBufferRHI);
This only does GPU-to-GPU copies. I implemented MapVertexBufferIntoCuda() and UnmapVertexBufferFromCuda() successfully using CUDA's graphics interop functionality, and it seems to work properly.
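For readers unfamiliar with graphics interop, here is a minimal sketch of how such map/unmap helpers could be built on the CUDA runtime API. It is an illustration under assumptions, not the implementation used above: the helper names are hypothetical, `resource` is assumed to be a `cudaGraphicsResource_t` that was registered earlier from the native buffer behind the RHI vertex buffer (for example via `cudaGraphicsD3D11RegisterResource` or `cudaGraphicsGLRegisterBuffer`), and `stream` is whatever CUDA stream the copy should run on.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: maps an already-registered graphics resource into
// CUDA's address space and returns a device pointer to it (nullptr on failure).
void* MapRegisteredBufferIntoCuda(cudaGraphicsResource_t resource,
                                  cudaStream_t stream,
                                  size_t* outSize)
{
    if (cudaGraphicsMapResources(1, &resource, stream) != cudaSuccess)
        return nullptr;

    void* devPtr = nullptr;
    size_t size = 0;
    if (cudaGraphicsResourceGetMappedPointer(&devPtr, &size, resource) != cudaSuccess)
    {
        // Do not leave the resource mapped if the pointer query failed.
        cudaGraphicsUnmapResources(1, &resource, stream);
        return nullptr;
    }

    if (outSize != nullptr)
        *outSize = size;
    return devPtr;
}

// Hypothetical helper: hands the resource back to the graphics API.
bool UnmapRegisteredBufferFromCuda(cudaGraphicsResource_t resource,
                                   cudaStream_t stream)
{
    return cudaGraphicsUnmapResources(1, &resource, stream) == cudaSuccess;
}

// Hypothetical helper: map, copy device-to-device on the same stream, unmap.
bool CopyIntoRegisteredBuffer(cudaGraphicsResource_t resource,
                              const void* srcDevicePtr,
                              size_t numBytes,
                              cudaStream_t stream)
{
    size_t mappedSize = 0;
    void* dst = MapRegisteredBufferIntoCuda(resource, stream, &mappedSize);
    if (dst == nullptr)
        return false;

    // Refuse to write past the end of the mapped buffer.
    bool ok = numBytes <= mappedSize &&
              cudaMemcpyAsync(dst, srcDevicePtr, numBytes,
                              cudaMemcpyDeviceToDevice, stream) == cudaSuccess;

    // Always unmap, even if the copy could not be issued.
    ok = UnmapRegisteredBufferFromCuda(resource, stream) && ok;
    return ok;
}
```

According to the CUDA documentation, `cudaGraphicsMapResources` and `cudaGraphicsUnmapResources` order the CUDA work on `stream` against graphics calls that were issued before the map and after the unmap, so the copy is issued on that same stream between the two calls.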
My problem: when using the latter technique, some frames are broken and the object is not rendered correctly, almost as if I were modifying the buffer while it is still in use. I tried to find out whether RHILockVertexBuffer() and RHIUnlockVertexBuffer() do something special to protect buffers while they are being written to, but couldn't figure it out.
Can anyone help me?