Running Neural Network Inference with GPU Input/Output Tensors


I’ve been trying to implement neural network inference as part of the Render Graph, basically applying a neural network to each frame the renderer produces. I have mostly been referencing the microsoft/OnnxRuntime-UnrealEngine sample ("Apply a Style Transfer Neural Network in real time with Unreal Engine 5 leveraging ONNX Runtime"), which covers the basic implementation and registering the view extension that adds a style pass to the render thread.

The process itself is divided into the following steps:

  1. The `Texture2D` is fetched through the RHI: `FRHITexture2D* Texture = SourceTexture->GetRHI()->GetTexture2D();`
  2. The texture is copied from GPU to CPU, resized, and converted to a float array.
  3. That array is moved into an input tensor.
  4. Inference is performed on the GPU; internally, ONNX Runtime copies the input tensor to the GPU and the output tensor back to the CPU.
  5. The output is resized to the input image dimensions and converted to a byte array.
  6. The output is copied from CPU to GPU.

The problem I am having is that this whole process takes about 50 ms per frame, tanking the FPS, while the inference itself takes less than 10 ms.

My question is: is there a way to copy the data directly into a GPU tensor and use it as is, skipping the preprocessing and the expensive CPU <-> GPU round trips? And is there a way to do the same for the output?
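To make the question concrete: the ONNX Runtime C++ API exposes `Ort::IoBinding`, which looks like it could bind tensors that already live in GPU memory, roughly like this. This is an untested sketch of what I imagine, assuming a CUDA execution provider and an existing `Ort::Session`; the input/output names are model-specific, and how to get a usable device pointer out of UE's RHI is exactly the part I don't know:

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <cstddef>

// Hypothetical: device pointers obtained somehow from the RHI textures.
void RunInferenceOnGpu(Ort::Session& Session, void* GpuInputPtr, void* GpuOutputPtr)
{
    const std::array<int64_t, 4> Shape = {1, 3, 720, 1280};
    const size_t ByteCount = 1 * 3 * 720 * 1280 * sizeof(float);

    // Memory info describing CUDA device memory on device 0.
    Ort::MemoryInfo CudaMemInfo("Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault);

    // Wrap the existing GPU buffers as tensors -- no copy, no CPU staging.
    Ort::Value InputTensor = Ort::Value::CreateTensor(
        CudaMemInfo, GpuInputPtr, ByteCount, Shape.data(), Shape.size(),
        ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);
    Ort::Value OutputTensor = Ort::Value::CreateTensor(
        CudaMemInfo, GpuOutputPtr, ByteCount, Shape.data(), Shape.size(),
        ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);

    // Bind both ends to device memory so Run() never touches the CPU.
    Ort::IoBinding Binding(Session);
    Binding.BindInput("input", InputTensor);   // "input"/"output" are placeholders
    Binding.BindOutput("output", OutputTensor);
    Session.Run(Ort::RunOptions{nullptr}, Binding);
}
```

Whether something like this can work with textures allocated by UE's D3D12/Vulkan RHI (rather than raw CUDA buffers), and how to get such a device pointer from an `FRHITexture2D`, is what I can't figure out.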

Thanks in advance!