I managed to use NVIDIA CUDA and the Thrust library in Unreal Engine 4, so I implemented real-time marching cubes in CUDA and visualized the results in Unreal Engine 4.
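For readers new to the technique, here is a minimal sketch of the first two stages of a common GPU formulation of marching cubes. It is not the code behind this post. Each cell gets an 8-bit case index from its corner values, only "active" cells (index not 0 or 255) produce triangles, and an exclusive scan over the active flags gives each active cell a compacted output slot. On the GPU, each cell would be one thread and the scan would be `thrust::exclusive_scan`. Here the same steps run serially in plain C++17 so they can be checked anywhere. `Volume`, `cubeIndex`, `isActive` and `compactActiveCells` are hypothetical names, and the corner ordering is one common convention that must match whatever triangle lookup table is used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Scalar volume stored x-fastest: value(x, y, z) = v[x + nx * (y + ny * z)].
struct Volume {
    int nx, ny, nz;
    std::vector<float> v;
    float at(int x, int y, int z) const {
        return v[static_cast<std::size_t>(x) +
                 static_cast<std::size_t>(nx) * (y + static_cast<std::size_t>(ny) * z)];
    }
};

// 8-bit case index of one cell: bit i is set when corner i is below the isovalue.
std::uint8_t cubeIndex(const std::array<float, 8>& corners, float iso) {
    std::uint8_t index = 0;
    for (int i = 0; i < 8; ++i)
        if (corners[i] < iso) index |= static_cast<std::uint8_t>(1u << i);
    return index;
}

// Cases 0 (all corners outside) and 255 (all inside) produce no triangles.
bool isActive(std::uint8_t index) { return index != 0 && index != 255; }

// Classifies every cell and compacts the active ones with an exclusive scan.
// Returns the linear ids of the active cells, in order.
std::vector<std::size_t> compactActiveCells(const Volume& vol, float iso) {
    assert(vol.v.size() == static_cast<std::size_t>(vol.nx) * vol.ny * vol.nz);
    const int cx = vol.nx - 1, cy = vol.ny - 1, cz = vol.nz - 1;
    const std::size_t numCells =
        (cx > 0 && cy > 0 && cz > 0) ? static_cast<std::size_t>(cx) * cy * cz : 0;
    std::vector<std::uint32_t> flags(numCells), offsets(numCells);

    // Step 1 (one GPU thread per cell): classify.
    for (int z = 0; z < cz; ++z)
        for (int y = 0; y < cy; ++y)
            for (int x = 0; x < cx; ++x) {
                // Corners 0..3 on face z, corners 4..7 on face z + 1.
                const std::array<float, 8> c = {
                    vol.at(x, y, z),             vol.at(x + 1, y, z),
                    vol.at(x + 1, y + 1, z),     vol.at(x, y + 1, z),
                    vol.at(x, y, z + 1),         vol.at(x + 1, y, z + 1),
                    vol.at(x + 1, y + 1, z + 1), vol.at(x, y + 1, z + 1)};
                const std::size_t cell = static_cast<std::size_t>(x) +
                    static_cast<std::size_t>(cx) * (y + static_cast<std::size_t>(cy) * z);
                flags[cell] = isActive(cubeIndex(c, iso)) ? 1u : 0u;
            }

    // Step 2: offsets[i] = flags[0] + ... + flags[i - 1]
    // (serial analogue of thrust::exclusive_scan on a device_vector).
    std::exclusive_scan(flags.begin(), flags.end(), offsets.begin(), std::uint32_t{0});
    const std::size_t numActive = numCells ? offsets.back() + flags.back() : 0;

    // Step 3: scatter each active cell id to its compacted slot.
    std::vector<std::size_t> active(numActive);
    for (std::size_t i = 0; i < numCells; ++i)
        if (flags[i]) active[offsets[i]] = i;
    return active;
}
```

The compacted list is what lets the triangle-generation pass launch work only for the cells the surface actually crosses, which is usually a small fraction of a 100-million-voxel volume.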
My brain MRA (Magnetic Resonance Angiography): 512 x 512 x 184 = 48,234,496 voxels
My thoracic CT (Computed Tomography): 512 x 512 x 463 = 121,372,672 voxels
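The voxel counts above follow directly from the dimensions. The memory a volume occupies also depends on the voxel type, which the post does not state, so the sketch below takes bytes per voxel as a parameter (2 for 16-bit data and 4 for float are assumptions, not the author's format). `voxelCount` and `volumeMiB` are hypothetical helper names.

```cpp
#include <cstdint>

// Number of voxels in an nx x ny x nz volume (64-bit to avoid overflow).
constexpr std::uint64_t voxelCount(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz) {
    return nx * ny * nz;
}

// Raw volume size in MiB (2^20 bytes) for an assumed voxel size in bytes.
constexpr double volumeMiB(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz,
                           std::uint64_t bytesPerVoxel) {
    return static_cast<double>(voxelCount(nx, ny, nz) * bytesPerVoxel) / (1024.0 * 1024.0);
}

// The two counts quoted above, checked at compile time.
static_assert(voxelCount(512, 512, 184) == 48'234'496);
static_assert(voxelCount(512, 512, 463) == 121'372'672);
```

Because a 512 x 512 slice of 4-byte floats is exactly 1 MiB, the thoracic CT would occupy 463 MiB as float, or 231.5 MiB at 16 bits per voxel, before any mesh data is generated.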
Volumes of 120 million voxels and meshes of more than 8 million triangles can be handled in real time.
The bottleneck is the data transfer between the GPU and the CPU, as described below.
I am facing exactly the same problem.
After the vertices and triangles are calculated on the GPU, the data has to be transferred from the GPU to the CPU so that it can be passed to the Procedural Mesh Component.
And (I think) the Procedural Mesh Component then transfers the data back to the GPU at some point.
Data transfer between the GPU and the CPU is quite slow: a one-way transfer often takes more than 10 ms, which means more than 20 ms for the GPU to CPU to GPU round trip.
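A back-of-the-envelope sketch of why the copies dominate. The payload size depends on the vertex layout, which the post does not specify, so this assumes (hypothetically) non-indexed triangles with a float3 position and a float3 normal per vertex. The effective host-device bandwidth is a parameter, because the real figure depends on the PCIe generation and on whether the host buffer is page-locked. `meshBytes`, `transferMs` and `roundTripMs` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical vertex layout: float3 position + float3 normal (24 bytes with 4-byte floats).
constexpr std::uint64_t kBytesPerVertex = 6 * sizeof(float);

// Bytes for non-indexed triangles: 3 vertices each, no shared vertices, no index buffer.
constexpr std::uint64_t meshBytes(std::uint64_t numTriangles) {
    return numTriangles * 3 * kBytesPerVertex;
}

// One-way transfer time in milliseconds at an effective bandwidth in GB/s (1e9 bytes/s).
constexpr double transferMs(std::uint64_t bytes, double gigabytesPerSecond) {
    return static_cast<double>(bytes) * 1e3 / (gigabytesPerSecond * 1e9);
}

// GPU -> CPU -> GPU costs two transfers of the same payload.
constexpr double roundTripMs(std::uint64_t bytes, double gigabytesPerSecond) {
    return 2.0 * transferMs(bytes, gigabytesPerSecond);
}
```

Under these assumptions, 8 million triangles come to 576,000,000 bytes, so a link sustaining a hypothetical 12 GB/s would need about 48 ms each way. An indexed mesh with shared vertices would shrink the payload, but the round trip still costs two full copies.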
The best solution might be to analyze the source code of the Procedural Mesh Component or the Runtime Mesh Component and build a new mesh component, say a “CUDA Mesh Component”, that does not need TArray for the mesh data but only GPU memory pointers allocated with cudaMalloc.
However, it is probably too tough for me to implement…
At the very least, it would save some time if it were possible to set the raw data pointer of a TArray, something like TArray::GetData() = a (CPU memory) pointer,
but that seems impossible, just as std::vector::data() cannot be changed.
I would like to “move” or “reference” the data in page-locked memory allocated by cudaMallocHost into a TArray instead of copying it.
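In standard C++, the closest thing to this is not swapping a container's data pointer (std::vector offers no way to adopt an external buffer) but giving the container an allocator that hands out memory from a buffer you already own. Below is a minimal sketch with std::vector. The "pinned" buffer is an ordinary heap allocation standing in for memory from cudaMallocHost (an assumption; real code would allocate with cudaMallocHost and release with cudaFreeHost). `Arena` and `ArenaAllocator` are hypothetical names. The sketch does not show whether the same trick can be applied to TArray and still be accepted by the Procedural Mesh Component.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Externally owned buffer (stand-in for cudaMallocHost memory in this sketch).
// Assumes base is suitably aligned, e.g. it comes from operator new.
struct Arena {
    std::byte* base;
    std::size_t capacity;
    std::size_t used = 0;
};

// Bump allocator over an Arena. deallocate() is a no-op: the buffer's owner
// frees everything at once, so nothing is ever copied out of the buffer.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena* a) noexcept : arena(a) { assert(a != nullptr); }
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        // Round the current offset up to T's alignment.
        const std::size_t offset = (arena->used + alignof(T) - 1) / alignof(T) * alignof(T);
        if (offset > arena->capacity || n > (arena->capacity - offset) / sizeof(T))
            throw std::bad_alloc();
        arena->used = offset + n * sizeof(T);
        return reinterpret_cast<T*>(arena->base + offset);
    }
    void deallocate(T*, std::size_t) noexcept {}

    template <class U>
    bool operator==(const ArenaAllocator<U>& o) const noexcept { return arena == o.arena; }
    template <class U>
    bool operator!=(const ArenaAllocator<U>& o) const noexcept { return arena != o.arena; }
};
```

Because deallocate() never returns memory, reserve the final size up front; otherwise each growth step leaves a dead copy in the arena. The sketch also shows the catch: `std::vector<float, ArenaAllocator<float>>` is a different type from `std::vector<float>`, so an API that takes the default-allocator container will not accept it without a copy, and the same would presumably apply to functions taking a default-allocator TArray.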
Is the next step to implement direct volume rendering, like the following?
Anyway, the combination of CUDA’s performance and the power of Unreal Engine 4 has produced amazing beauty!