How do I read StaticMesh triangle data from GPU memory to CPU memory?

I need to create an on-CPU triangle representation of a StaticMesh that I have in a level. I cannot depend on the StaticMesh having bAllowCPUAccess set, so I want to read the triangle data from the rendering representation.

I have read https://answers.unrealengine.com/questions/233998/view.html and https://github.com/microsoft/AirSim/blob/f423d9f5a4a0d0f4fb9a2e4ddb75251040e88f81/Unreal/Plugins/AirSim/Source/AirBlueprintLib.cpp#L502 which led me to produce the following:

	void CopyTrianglesToCpu(UStaticMesh& StaticMesh)
	{
		UE_LOG(LogTemp, Error, TEXT("Running CopyTrianglesToCpu"));

		const int32 LodIndex {0};
		if (!StaticMesh.HasValidRenderData(true, LodIndex))
		{
			UE_LOG(LogTemp, Error, TEXT("The static mesh does not have valid render data."));
			return;
		}

		const FStaticMeshLODResources& Mesh = StaticMesh.GetLODForExport(LodIndex);

		TArray<FVector> CpuPositions;
		TArray<uint32> CpuIndices;

		ENQUEUE_RENDER_COMMAND(FCopyTriangles)
		([&](FRHICommandListImmediate& RhiCmdList) {
			// Copy vertex positions.
			const FPositionVertexBuffer& GpuPositions = Mesh.VertexBuffers.PositionVertexBuffer;
			const int32 NumPositions = GpuPositions.GetNumVertices();
			const FVertexBufferRHIRef& PositionsRhi = GpuPositions.VertexBufferRHI;
			CpuPositions.Reserve(NumPositions);
			const FVector* const PositionData = static_cast<const FVector*>(
				RHILockVertexBuffer(PositionsRhi, 0, PositionsRhi->GetSize(), RLM_ReadOnly));
			for (int32 I = 0; I < NumPositions; ++I)
			{
				CpuPositions.Add(PositionData[I]);
			}
			RHIUnlockVertexBuffer(PositionsRhi);

			// Copy triangle vertex indices.
			const FRawStaticIndexBuffer& GpuIndices = Mesh.IndexBuffer;
			const int32 NumIndices = GpuIndices.GetNumIndices();
			const FIndexBufferRHIRef& IndicesRhi = GpuIndices.IndexBufferRHI;
			CpuIndices.Reserve(NumIndices);
			switch (IndicesRhi->GetStride())
			{
				case 2:
				{
					const uint16* const IndicesData = static_cast<const uint16*>(
						RHILockIndexBuffer(IndicesRhi, 0, IndicesRhi->GetSize(), RLM_ReadOnly));
					for (int32 I = 0; I < NumIndices; ++I)
					{
						CpuIndices.Add(static_cast<uint32>(IndicesData[I]));
					}
					RHIUnlockIndexBuffer(IndicesRhi);
					break;
				}
				case 4:
				{
					const uint32* const IndicesData = static_cast<const uint32*>(
						RHILockIndexBuffer(IndicesRhi, 0, IndicesRhi->GetSize(), RLM_ReadOnly));
					for (int32 I = 0; I < NumIndices; ++I)
					{
						CpuIndices.Add(IndicesData[I]);
					}
					RHIUnlockIndexBuffer(IndicesRhi);
					break;
				}
				default:
					UE_LOG(
						LogTemp, Error, TEXT("Got unexpected static mesh indices stride %u."),
						IndicesRhi->GetStride());
					return;
			}
		});

		FlushRenderingCommands();

		// Sanity check.
		const int32 NumPositions = CpuPositions.Num();
		const int32 NumIndices = CpuIndices.Num();
		for (int32 I = 0; I < NumIndices; ++I)
		{
			const uint32 Index = CpuIndices[I];
			if (Index >= static_cast<uint32>(NumPositions))
			{
				UE_LOG(LogTemp, Error, TEXT("Got invalid vertex index %u."), Index);
			}
		}
	}

The results I get are incorrect, or at least make no sense to me. The indices are mostly, but not exclusively, monotonically increasing and include multiple values that are higher than the size of the position buffer.

What is the proper way to read StaticMesh triangle data from GPU to CPU?

Hi MartinNilsson

I had to do a similar task recently while investigating multithreading. Take a look at TFutures (promises) and TFunctions (lambdas). They may help with your problem.
Execute a TFunction on the GPU thread and return the data you want to the game thread through the TFuture.
Once you understand them, TFutures and TFunctions are extremely powerful.

Basically a future works the same as a return value, except that it doesn’t need to be populated immediately. It gives the program a “promise” that it will return data eventually.
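
For anyone more used to standard C++, the same idea can be sketched with std::future and std::async. This is only an analogy, not Unreal's TFuture or Async API, and the names SumOnWorker and WaitForResult are made up for illustration.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical task: sum the values on another thread and hand the total back
// through a future instead of a plain return value.
std::future<int> SumOnWorker(std::vector<int> Values)
{
    // std::launch::async asks for the lambda to run on a separate thread.
    return std::async(std::launch::async, [Values = std::move(Values)]() {
        int Total = 0;
        for (int Value : Values)
        {
            Total += Value;
        }
        return Total;
    });
}

// Hypothetical helper: block until the promised value exists, then take it.
int WaitForResult(std::future<int>& Result)
{
    assert(Result.valid()); // a std::future can only be read once
    return Result.get();    // blocks until the worker has finished
}
```

Calling WaitForResult on the future returned by SumOnWorker({1, 2, 3, 4}) gives 10, and the std::future is no longer valid afterwards.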

I will provide a simple example, but note that I only wrote the code and didn’t test it. There are other examples online.

//HEADER
//-----------------------

//Variable to hold the return data from the GPU thread.
//Replace "int" with the desired datatype to return. 
TFuture<int> dataToReturnToGameThread; 


//Function to trigger the logic
void RunGPUThreadTask();

//Function to call once the operation is complete
void OnReturnToCPU();

//Function to execute on the GPU with a lambda used to return back to the Game thread
TFuture<int> FunctionToRunOnGPU(const UObject* outer, TFunction<void()>completionLambda);

//CPP
//-----------------------

void MyClass::RunGPUThreadTask()
{
	dataToReturnToGameThread = FunctionToRunOnGPU(this, [this]()
	{
		//Check to see if the data has been populated
		if(dataToReturnToGameThread.IsValid())
		{
			//Return the data back to the CPU thread
			AsyncTask(ENamedThreads::GameThread, [this]()
			{
				//Log to state that we have returned to the Game Thread
				UE_LOG(LogTemp, Log, TEXT("Successfully returned to Game Thread with %d"), dataToReturnToGameThread.Get());
				
				//Call the function to work with the returned data
				OnReturnToCPU();
			});
		}
	});
}


TFuture<int> MyClass::FunctionToRunOnGPU(const UObject* outer, TFunction<void()>completionLambda)
{
	//Execute this off the game thread; EAsyncExecution::ThreadPool runs it on a thread-pool worker thread
	return Async(EAsyncExecution::ThreadPool, [=]()
	{
		UE_LOG(LogTemp, Log, TEXT("Operating on GPU Thread"));

		//Any logic here that you want to execute on the GPU
		int dataToReturn = 10;

		//Return any data you want back to the Game Thread, which will then be accessible through the future
		return dataToReturn;
	
	}, completionLambda );
}


void MyClass::OnReturnToCPU()
{
	//Do whatever you want with the data
	int myInt = dataToReturnToGameThread.Get(); //Used to access the data
}

Unfortunately, I do not know how to access the triangle data, but perhaps this will be a good start in at least knowing that you are working on GPU / CPU.

Hope this helps.

Good luck!

Alex

I think there’s an error in your GetStride switch in that you don’t change how you are interpreting the IndicesData. I think for case 4 it should be a const uint32* array?
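
As a rough illustration of what I mean by interpreting the data according to the stride, here is a standalone sketch in plain C++ with no engine types. WidenIndices is a hypothetical helper, not an Unreal function, and it assumes the raw buffer is tightly packed with either 2-byte or 4-byte indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: widen a raw index buffer to 32-bit indices.
// Data points at NumIndices tightly packed indices of Stride bytes each.
// Returns false (and leaves Out empty) for any stride other than 2 or 4.
bool WidenIndices(const void* Data, std::size_t NumIndices, std::size_t Stride,
                  std::vector<std::uint32_t>& Out)
{
    assert(Data != nullptr || NumIndices == 0);
    Out.clear();
    if (Stride != 2 && Stride != 4)
    {
        return false;
    }

    Out.reserve(NumIndices);
    const unsigned char* Bytes = static_cast<const unsigned char*>(Data);
    for (std::size_t I = 0; I < NumIndices; ++I)
    {
        if (Stride == 2)
        {
            // Read exactly two bytes per index and widen to 32 bits.
            std::uint16_t Value = 0;
            std::memcpy(&Value, Bytes + I * 2, sizeof(Value));
            Out.push_back(Value);
        }
        else
        {
            // Read exactly four bytes per index.
            std::uint32_t Value = 0;
            std::memcpy(&Value, Bytes + I * 4, sizeof(Value));
            Out.push_back(Value);
        }
    }
    return true;
}
```

The point is that the element type used to read the buffer has to match the stride; reading the same bytes with the wrong width produces indices that look plausible at first but go out of range.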