Real-Time Marching Cubes by CUDA on Unreal Engine 4

I managed to use the NVIDIA CUDA and Thrust libraries in Unreal Engine 4, so I implemented real-time marching cubes in CUDA and visualized the result in Unreal Engine 4.

My brain MRA (Magnetic Resonance Angiography): 512 x 512 x 184 = 48,234,496 voxels


My thoracic CT (Computed Tomography): 512 x 512 x 463 = 121,372,672 voxels


About 120 million voxels and more than 8 million meshes can be handled in real time.

The bottleneck is the data transfer between GPU and CPU, as mentioned below.
I am facing exactly the same problem.

After the vertices and triangles are calculated on the GPU, the data has to be transferred from GPU to CPU so that it can be passed to the Procedural Mesh Component.
And (I think) the Procedural Mesh Component transfers the data back to the GPU at some point.

Data transfer between GPU and CPU is quite slow and often takes more than 10 msec one way, which means more than 20 msec for the GPU to CPU to GPU round trip.
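
As a rough sanity check on transfer times of this size, here is a back-of-the-envelope sketch in C++. The positions-only vertex layout (three floats per vertex, no normals or indices), the example size of 8 million triangles, and the 12 GB/s sustained bandwidth are all assumptions for illustration, not measurements from this project.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a non-indexed triangle list: 3 vertices per triangle,
// 3 floats (x, y, z) per vertex. Positions only (assumption).
constexpr std::size_t TriangleListBytes(std::size_t triangleCount) {
    return triangleCount * 3 * 3 * sizeof(float);
}

// Estimated one-way copy time in milliseconds for a payload of `bytes`,
// given an assumed sustained bandwidth in GB/s (1 GB = 1e9 bytes).
inline double TransferMilliseconds(std::size_t bytes, double gigabytesPerSecond) {
    assert(gigabytesPerSecond > 0.0);
    return static_cast<double>(bytes) / (gigabytesPerSecond * 1e9) * 1e3;
}
```

Under these assumptions, `TriangleListBytes(8000000)` is 288,000,000 bytes, and `TransferMilliseconds` of that at 12 GB/s is about 24 msec one way, which is at least the same order of magnitude as the times quoted here.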

The best solution might be to analyze the source code of Procedural Mesh Component or Runtime Mesh Component and write a new mesh component, something like a “CUDA Mesh Component”, which does not need a TArray for mesh data but only GPU memory pointers allocated by cudaMalloc.
However, that is probably too tough for me to implement…

At least, it would save some time if it were possible to set the raw data pointer of a TArray, i.e. to make TArray::GetData() return a given (CPU memory) pointer,
but that seems impossible, just as the pointer returned by std::vector::data() cannot be changed.
I would like to “move” or “reference” data in page-locked memory allocated by cudaMallocHost into a TArray instead of copying it.
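
A container cannot adopt a foreign pointer, but it can be told where to get its storage. The sketch below shows that idea in standard C++ with a custom allocator for std::vector. `PinnedAlloc` and `PinnedFree` are hypothetical stand-ins for cudaMallocHost and cudaFreeHost (they use the ordinary heap here so the code builds without the CUDA toolkit), and `Vertex` is an illustrative type. TArray also takes an allocator template parameter, but this sketch does not show that the Procedural Mesh Component would accept such an array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins for cudaMallocHost / cudaFreeHost.
// They use the ordinary heap so the sketch compiles without CUDA.
inline void* PinnedAlloc(std::size_t bytes) { return std::malloc(bytes); }
inline void PinnedFree(void* p) { std::free(p); }

// Minimal C++11 allocator that routes all container storage through the
// two functions above.
template <typename T>
struct PinnedAllocator {
    using value_type = T;

    PinnedAllocator() = default;
    template <typename U>
    PinnedAllocator(const PinnedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = PinnedAlloc(n * sizeof(T));
        if (p == nullptr && n != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { PinnedFree(p); }
};

// All instances are interchangeable: memory from one can be freed by another.
template <typename T, typename U>
bool operator==(const PinnedAllocator<T>&, const PinnedAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PinnedAllocator<T>&, const PinnedAllocator<U>&) { return false; }

struct Vertex { float x, y, z; };  // illustrative vertex layout

using PinnedVertexArray = std::vector<Vertex, PinnedAllocator<Vertex>>;

// Builds an array whose storage came from PinnedAlloc. Its data() pointer
// is what a device-to-host copy could write into directly.
inline PinnedVertexArray MakeVertexArray(std::size_t count) {
    PinnedVertexArray vertices(count, Vertex{0.0f, 0.0f, 0.0f});
    assert(count == 0 || vertices.data() != nullptr);
    return vertices;
}
```

With real page-locked memory behind the allocator, the GPU result could be copied straight into `MakeVertexArray(n).data()`, which would remove one host-side copy. The GPU to CPU transfer itself would still remain.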

The next step may be to implement direct volume rendering, like the following?

Anyway, the combination of CUDA’s performance and the strong power of Unreal Engine 4 has resulted in amazing beauty!

…looks great. Will you be posting a guide or tutorial?

Thank you so much for your comment!

I’ve already written an introductory tutorial on how to call CUDA kernel functions from Unreal Engine 4, but currently it is only in Japanese…

I’ve attached several images in English and also pasted some basic code.
I hope it will help you!

Regarding that GPU-CPU-GPU transfer:
It would be nice if it were possible this way:…-from-gpu.html
But apparently that solution writes into the rendering buffers while the other thread is rendering from them, so it flickers (on average maybe once per second).
Does anyone have any idea how to lock the buffer?

Thank you for taking the time to post.

Was it relatively painless to get CUDA integrated?

On Windows, if you are familiar with static link libraries, it’s not so difficult, but if you also want to use the Thrust library, it is rather difficult.
In fact, there were some compile errors when I tried to use Thrust from Unreal Engine, because some macros used in Unreal Engine and Thrust had exactly the same names.
So I had to customize the Thrust library a little.
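
For anyone hitting similar name clashes, one commonly used alternative to editing the library is to hide the engine macro around the third-party code with `#pragma push_macro` / `#pragma pop_macro` (supported by MSVC, GCC and Clang). This is only a sketch and not what was done in this project. Unreal Engine does define a `check` assertion macro, but which macros actually clashed with Thrust is not stated here, so `check`, the `third_party` namespace and both wrapper functions are illustrative assumptions.

```cpp
#include <cassert>

// Stand-in for an engine header that defines a short, generic macro.
#define check(expr) ((expr) ? 1 : 0)

// Save the engine's definition and remove it while the third-party
// code is being compiled.
#pragma push_macro("check")
#undef check

// Stand-in for a third-party header that uses the same identifier as an
// ordinary function name. With the macro still defined, this declaration
// would be mangled by the preprocessor and fail to compile.
namespace third_party {
inline int check(int a, int b) { return a + b; }
}

// Restore the engine's definition for the rest of the translation unit.
#pragma pop_macro("check")

// Wrapper so later code never has to spell the clashing name directly.
// The parentheses around the name suppress function-like macro expansion.
inline int ThirdPartySum(int a, int b) {
    return (third_party::check)(a, b);
}

// The engine macro works again after pop_macro.
inline int EngineCheck(bool ok) { return check(ok); }
```

In a real project the stand-in namespace would be replaced by the actual third-party `#include` lines, placed between the push and the pop.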

That’s not bad at all; thanks. CUDA is so huge that I wasn’t sure if it would be as easy as what I’ve integrated thus far.

This project has been selected as one of the NVIDIA Edge Program winners for November 2017 (NVIDIA Edge Program: November 2017 Winners Announced! - Unreal Engine)

Thank you so much!