This is very useful information, thank you. I might have misunderstood when and where to pick a runtime type. I’m running a couple of models that aren’t image/rendering related: they analyze (preprocessed) input data and produce predictions that I postprocess and convert into in-game actions. Inference runs ad hoc when an event is triggered (not on tick or on a regular schedule). My models run fine on the CPU, but I’d rather use the GPU if possible, both to free up the CPU and because it should be faster (TBD).
On my dev PC I have an NVIDIA card, so the models run directly on the GPU, but I figured that users without CUDA support would have to use RDG runtimes instead. Did I misunderstand?
Here is my very basic/crude runtime selection function (for visibility in case others are trying to do the same):
UNeuralNetworkModel* UNeuralNetworkModel::CreateModel(UObject* Parent, UNNEModelData* ModelData)
{
    using namespace UE::NNECore;

    TArray<TWeakInterfacePtr<INNERuntimeGPU>> GPURuntimes;
    TArray<TWeakInterfacePtr<INNERuntimeRDG>> RDGRuntimes;
    TArray<TWeakInterfacePtr<INNERuntimeCPU>> CPURuntimes;

    TArrayView<TWeakInterfacePtr<INNERuntime>> Runtimes = GetAllRuntimes();
    for (int32 i = 0; i < Runtimes.Num(); i++)
    {
        if (Runtimes[i].IsValid())
        {
            if (auto CPURuntime = Cast<INNERuntimeCPU>(Runtimes[i].Get()))
            {
                CPURuntimes.Add(CPURuntime);
                UE_LOG(LogTemp, Warning, TEXT("CPU runtime available: %s"), *Runtimes[i]->GetRuntimeName());
            }
            else if (auto GPURuntime = Cast<INNERuntimeGPU>(Runtimes[i].Get()))
            {
                GPURuntimes.Add(GPURuntime);
                UE_LOG(LogTemp, Warning, TEXT("GPU runtime available: %s"), *Runtimes[i]->GetRuntimeName());
            }
            else if (auto RDGRuntime = Cast<INNERuntimeRDG>(Runtimes[i].Get()))
            {
                RDGRuntimes.Add(RDGRuntime);
                UE_LOG(LogTemp, Warning, TEXT("RDG runtime available: %s"), *Runtimes[i]->GetRuntimeName());
            }
            else
            {
                UE_LOG(LogTemp, Warning, TEXT("Non CPU/GPU/RDG runtime: %s"), *Runtimes[i]->GetRuntimeName());
            }
        }
    }

    // Pick the first available runtime, preferring GPU, then RDG, then CPU.
    if (GPURuntimes.Num() > 0)
    {
        TWeakInterfacePtr<INNERuntimeGPU> Runtime = GPURuntimes[0];
        if (Runtime.IsValid())
        {
            TUniquePtr<IModelGPU> UniqueModel = Runtime->CreateModelGPU(ModelData);
            if (UniqueModel.IsValid())
            {
                if (UNeuralNetworkModel* Result = NewObject<UNeuralNetworkModel>(Parent))
                {
                    Result->GPUModel = TSharedPtr<IModelGPU>(UniqueModel.Release());
                    UE_LOG(LogTemp, Warning, TEXT("GPU Neural Network model created"));
                    return Result;
                }
            }
        }
    }

    if (RDGRuntimes.Num() > 0)
    {
        TWeakInterfacePtr<INNERuntimeRDG> Runtime = RDGRuntimes[0];
        if (Runtime.IsValid())
        {
            TUniquePtr<IModelRDG> UniqueModel = Runtime->CreateModelRDG(ModelData);
            if (UniqueModel.IsValid())
            {
                if (UNeuralNetworkModel* Result = NewObject<UNeuralNetworkModel>(Parent))
                {
                    Result->RDGModel = TSharedPtr<IModelRDG>(UniqueModel.Release());
                    UE_LOG(LogTemp, Warning, TEXT("RDG Neural Network model created"));
                    return Result;
                }
            }
        }
    }

    if (CPURuntimes.Num() > 0)
    {
        TWeakInterfacePtr<INNERuntimeCPU> Runtime = CPURuntimes[0];
        if (Runtime.IsValid())
        {
            TUniquePtr<IModelCPU> UniqueModel = Runtime->CreateModelCPU(ModelData);
            if (UniqueModel.IsValid())
            {
                if (UNeuralNetworkModel* Result = NewObject<UNeuralNetworkModel>(Parent))
                {
                    Result->CPUModel = TSharedPtr<IModelCPU>(UniqueModel.Release());
                    UE_LOG(LogTemp, Warning, TEXT("CPU Neural Network model created"));
                    return Result;
                }
            }
        }
    }

    return nullptr;
}
(note that running on GPU isn’t always the best option, so use with care).
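For completeness, here is roughly how I create and exercise the model on the CPU path today (a minimal sketch, assuming the UE 5.2-era UE::NNECore API where IModelCPU exposes SetInputTensorShapes() and RunSync(); the 1x64 input and 8-float output sizes are just placeholders for my real model):

// Sketch: create the model once (e.g. on BeginPlay), then run it when an event fires.
UNeuralNetworkModel* Model = UNeuralNetworkModel::CreateModel(this, ModelData);
if (Model && Model->CPUModel.IsValid())
{
    using namespace UE::NNECore;

    // Resolve the (possibly symbolic) input shape; 1x64 is a placeholder.
    TArray<uint32> ShapeData = { 1, 64 };
    TArray<FTensorShape> InputShapes = { FTensorShape::Make(ShapeData) };
    Model->CPUModel->SetInputTensorShapes(InputShapes);

    // Preprocessed input floats and a preallocated output buffer (placeholder sizes).
    TArray<float> InputData;
    InputData.SetNumZeroed(64);
    TArray<float> OutputData;
    OutputData.SetNumZeroed(8);

    FTensorBindingCPU InputBinding;
    InputBinding.Data = InputData.GetData();
    InputBinding.SizeInBytes = InputData.Num() * sizeof(float);

    FTensorBindingCPU OutputBinding;
    OutputBinding.Data = OutputData.GetData();
    OutputBinding.SizeInBytes = OutputData.Num() * sizeof(float);

    TArray<FTensorBindingCPU> Inputs = { InputBinding };
    TArray<FTensorBindingCPU> Outputs = { OutputBinding };

    // Synchronous inference on the calling thread; fine for ad hoc, event-driven use.
    Model->CPUModel->RunSync(Inputs, Outputs);
}

The GPU/RDG paths would follow the same create-once, run-on-event pattern; only the binding and dispatch differ, which is what the rest of this post is about.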
So, my question was how to convert my data (floats) into a render buffer and how to get it back. Based on what you shared, the creation of the input bindings for RDG runtimes would be done like this:
FRDGBufferDesc InputBufferDesc = FRDGBufferDesc::CreateBufferDesc(sizeof(float), NeuralNetworkInputSize.X * NeuralNetworkInputSize.Y * 3);
FRDGBufferRef InputBuffer = GraphBuilder.CreateBuffer(InputBufferDesc, TEXT("NeuralPostProcessing::InputBuffer"));
(I’ll have to figure out how to convert the tensor shape into NeuralNetworkInputSize, but it shouldn’t be too bad since we have x, y, RGB.)
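One option I’m considering is deriving the element count directly from the model’s input descriptor instead of hard-coding NeuralNetworkInputSize (a sketch, assuming the NNECore FTensorDesc/FSymbolicTensorShape accessors shown here, and mapping any symbolic/negative dimension to 1):

using namespace UE::NNECore;

// RDGModel is the TSharedPtr<IModelRDG> created by the selection function above.
TConstArrayView<FTensorDesc> InputDescs = RDGModel->GetInputTensorDescs();
const FSymbolicTensorShape& SymbolicShape = InputDescs[0].GetShape();

uint32 NumInputElements = 1;
TArray<uint32> ConcreteDims;
for (int32 DimIdx = 0; DimIdx < SymbolicShape.Rank(); ++DimIdx)
{
    const int32 Dim = SymbolicShape.GetData()[DimIdx];
    const uint32 ConcreteDim = Dim > 0 ? static_cast<uint32>(Dim) : 1u; // e.g. dynamic batch -> 1
    ConcreteDims.Add(ConcreteDim);
    NumInputElements *= ConcreteDim;
}

// NumInputElements replaces NeuralNetworkInputSize.X * NeuralNetworkInputSize.Y * 3,
// and FTensorShape::Make(ConcreteDims) is what SetInputTensorShapes() would receive.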
Then populating the buffer is where I’m getting lost. From what I saw in the docs, I probably need to do something like this:
FRDGBufferRef IndexBuffer = GraphBuilder.CreateBuffer(
    FRDGBufferDesc::CreateUploadDesc(sizeof(uint32), NumIndices),
    TEXT("MyIndexBuffer"));

// Allocates an array of data using the internal RDG allocator for deferral.
FRDGUploadData<int32> Indices(GraphBuilder, NumIndices);

// Assign Data
Indices[0] = // ...;
Indices[1] = // ...;
Indices[NumIndices - 1] = // ...;

// Upload Data
GraphBuilder.QueueBufferUpload(IndexBuffer, Indices, ERDGInitialDataFlags::NoCopy);
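Adapting that to my case (reusing InputBuffer from the snippet above), I imagine the input floats, plus an empty output buffer for the results, would go up like this (a sketch; InputData, NumInputElements and NumOutputElements are placeholders for my preprocessed values and the sizes derived from the tensor shapes):

// Upload the preprocessed floats into the RDG input buffer.
FRDGUploadData<float> InputUpload(GraphBuilder, NumInputElements);
FMemory::Memcpy(InputUpload.GetData(), InputData.GetData(), NumInputElements * sizeof(float));
GraphBuilder.QueueBufferUpload(InputBuffer, InputUpload, ERDGInitialDataFlags::NoCopy);

// The output binding only needs a buffer of the right size; nothing has to be uploaded.
FRDGBufferDesc OutputBufferDesc = FRDGBufferDesc::CreateBufferDesc(sizeof(float), NumOutputElements);
FRDGBufferRef OutputBuffer = GraphBuilder.CreateBuffer(OutputBufferDesc, TEXT("NeuralPostProcessing::OutputBuffer"));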
Since I want to pass the data from the CPU to the GPU (and also provide an empty output binding), I still need to figure out how to hook into the RDG execution, get references to my buffers, run the model, and then get the results back from the GPU to the CPU so I can postprocess the output. (All of that should become clearer once I fully grasp the Render Dependency Graph documentation.)
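Here is how I imagine the whole round trip could fit together (a rough sketch only; I’m assuming IModelRDG exposes SetInputTensorShapes() and an EnqueueRDG(GraphBuilder, Inputs, Outputs) call that records the model’s passes, that FTensorBindingRDG wraps an FRDGBufferRef, and that readback goes through the standard FRHIGPUBufferReadback / AddEnqueueCopyPass path; every one of those assumptions needs verifying against the engine version):

// Rough sketch of the CPU -> GPU -> CPU round trip for an RDG runtime.
// RDGModel is the TSharedPtr<IModelRDG> held by the model object above;
// SetInputTensorShapes() is assumed to have been called on it already.
ENQUEUE_RENDER_COMMAND(RunNeuralNetworkRDG)(
    [RDGModel, InputData = MoveTemp(InputData), NumInputElements, NumOutputElements](FRHICommandListImmediate& RHICmdList)
    {
        FRDGBuilder GraphBuilder(RHICmdList);

        // Create the input buffer and queue the float upload (as in the previous snippet).
        FRDGBufferRef InputBuffer = GraphBuilder.CreateBuffer(
            FRDGBufferDesc::CreateBufferDesc(sizeof(float), NumInputElements),
            TEXT("NeuralNetwork::InputBuffer"));
        FRDGUploadData<float> InputUpload(GraphBuilder, NumInputElements);
        FMemory::Memcpy(InputUpload.GetData(), InputData.GetData(), NumInputElements * sizeof(float));
        GraphBuilder.QueueBufferUpload(InputBuffer, InputUpload, ERDGInitialDataFlags::NoCopy);

        // Empty output buffer for the model to write into.
        FRDGBufferRef OutputBuffer = GraphBuilder.CreateBuffer(
            FRDGBufferDesc::CreateBufferDesc(sizeof(float), NumOutputElements),
            TEXT("NeuralNetwork::OutputBuffer"));

        // Bind the RDG buffers and let the model record its passes into the graph (assumed API).
        UE::NNECore::FTensorBindingRDG InputBinding;
        InputBinding.Buffer = InputBuffer;
        UE::NNECore::FTensorBindingRDG OutputBinding;
        OutputBinding.Buffer = OutputBuffer;

        TArray<UE::NNECore::FTensorBindingRDG> Inputs = { InputBinding };
        TArray<UE::NNECore::FTensorBindingRDG> Outputs = { OutputBinding };
        RDGModel->EnqueueRDG(GraphBuilder, Inputs, Outputs);

        // Copy the result into a CPU-readable staging resource.
        FRHIGPUBufferReadback* Readback = new FRHIGPUBufferReadback(TEXT("NeuralNetwork::Readback"));
        AddEnqueueCopyPass(GraphBuilder, Readback, OutputBuffer, NumOutputElements * sizeof(float));

        GraphBuilder.Execute();

        // Later, once Readback->IsReady(), Lock()/Unlock() it to copy the floats out,
        // delete the readback object, and hand the results back to the game thread for postprocessing.
    });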
Did I get that right?
Thanks for your patience!