
Writing Data to RDG Structured Buffer / General RDG Questions

Hey all!

I’m a bit new to Unreal and I’m working on writing some compute shaders. Currently I’m using RHI resources for the shader parameters and calling ENQUEUE_RENDER_COMMAND to actually run the shader, but I was hoping to utilize the new RDG features to keep everything (relatively) compatible with newer versions of Unreal, as well as cutting back on some of the boilerplate.

I’ve been having a bit of trouble finding resources about using RDG. Mainly, I’ve been looking at the RDG 101 Crash Course slides (Box), but there are a few things I haven’t been able to figure out:

  1. Does pass creation and resource allocation need to happen in the render thread?
    FRDGBuilder requires an RHICommandList, so I’m assuming that it should only be called on the render thread. In that case, does it need to be done using ENQUEUE_RENDER_COMMAND, or something else?

  2. How can I pass data into an RDG buffer?
    With RHICreateStructuredBuffer I was able to pass in data at creation time, or read/write directly after calling RHILockStructuredBuffer. Would locking the underlying RHI data for the RDG buffer and writing to it directly be the correct method?

  3. Are there any examples of working RDG compute shaders, or any more documentation?

I’m bound to have more questions in the future, so I’d like to find some working examples to reference. If anyone could point me in the direction of some more resources, that’d be a huge help.

After a bit more reading (I somehow missed that the slides contained a notes section with further explanation…) and looking through the engine source code, I’ve gotten some answers.

  1. It appears that the graph builder should only be invoked from the rendering thread. I imagine that for a single execution, ENQUEUE_RENDER_COMMAND would be effective.

  2. The underlying RHI resources are only valid during the execution of the lambda function added in AddPass. Since RDG resources are allocated as needed during the pass, and it’s possible to use both RHI resources and RDG resources in the same shader, it would be better to keep data that has to persist between frames in RHI resources. Another option would be to use GraphBuilder.QueueTextureExtraction to get the underlying IPooledRenderTarget, which could later be registered again using GraphBuilder.RegisterExternalTexture, but I don’t know the advantages of one approach over the other.

  3. I’m still on the lookout for any more resources and examples, but the engine source code was able to help a little.

Hey!

I’m interested in this as well. In UE 4.24 most of the shaders now use RDG, so I’m taking a look at it.

The problem is that the port to RDG is still in progress, so for now we have a bunch of separate GraphBuilder instances. Ideally, we should have one graph to execute with all the rendering work in it. That’s what I got from the slides.

Anyway, I’d like to keep this updated if I can find something useful!

Cheers

I’m glad other people are interested in this too!
Hopefully this thread will be useful to people attempting to implement shaders with RDG in the future.

As for myself, I’ve hit a bit of a snag: whenever I run a pass with an FRDGBufferUAVRef parameter, I’m able to get the underlying RHI resource for the UAV, but not for the buffer the UAV references. After some debugging, I’ve also found that the compute pass fails completely when I use an RDG buffer, and gives the error “Seams shader [shader]'s parameter structure has changed without recompilation of the shader”.

This is odd, especially as the same shader executes fine when using the RHI types, and the only shader preprocessor definitions are the thread group sizes.

Here’s the simple shader I’m working with (it takes four uniformly distributed random samples in the range of [0, 1] and produces four Gaussian distributed random samples using the Box-Muller method):



// BoxMuller.usf

RWStructuredBuffer<float4> NoiseBuffer;
uint2 NoiseBufferSize;

static const float Pi = 3.14159265358979323846f;

uint CalculateIndex(uint2 id)
{
 return (id.y * NoiseBufferSize.x) + id.x;
}


[numthreads(THREADGROUPSIZE_X, THREADGROUPSIZE_Y, THREADGROUPSIZE_Z)]
void main( uint3 id : SV_DispatchThreadID )
{
 uint Index = CalculateIndex(id.xy);
 float4 InputNoise = NoiseBuffer[Index];

 float a0 = sqrt(-2 * log(InputNoise.x));
 float t0 = 2 * Pi * InputNoise.y;

 float a1 = sqrt(-2 * log(InputNoise.z));
 float t1 = 2 * Pi * InputNoise.w;

 NoiseBuffer[Index].x = a0 * cos(t0);
 NoiseBuffer[Index].y = a0 * sin(t0);
 NoiseBuffer[Index].z = a1 * cos(t1);
 NoiseBuffer[Index].w = a1 * sin(t1);
}
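
For checking the shader's output on the CPU, here is a minimal C++ sketch of the same Box-Muller math. The function name BoxMullerReference and the clamp on the first and third inputs are assumptions added for illustration; the shader above does not clamp, so an input of exactly 0 (which a uniform generator over [0, 1] can produce) would give log(0) and an infinite result.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// CPU mirror of the shader's main(): four uniform samples in, four Gaussian samples out.
// BoxMullerReference is a hypothetical helper name, not part of any engine API.
std::array<float, 4> BoxMullerReference(std::array<float, 4> U)
{
    for (float Value : U)
    {
        assert(Value >= 0.0f && Value <= 1.0f); // inputs are expected to be uniform in [0, 1]
    }

    const float Pi = 3.14159265358979323846f;
    // Assumption for illustration: clamp away from 0 so log() stays finite.
    const float MinU = std::numeric_limits<float>::min();

    const float A0 = std::sqrt(-2.0f * std::log(std::max(U[0], MinU)));
    const float T0 = 2.0f * Pi * U[1];
    const float A1 = std::sqrt(-2.0f * std::log(std::max(U[2], MinU)));
    const float T1 = 2.0f * Pi * U[3];

    return { A0 * std::cos(T0), A0 * std::sin(T0), A1 * std::cos(T1), A1 * std::sin(T1) };
}
```

Feeding the same uniform inputs to this function and to the shader should give matching values up to floating point differences between CPU and GPU.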


Here’s the parameter struct definition:



 DECLARE_GLOBAL_SHADER(FBoxMullerShader)
 SHADER_USE_PARAMETER_STRUCT(FBoxMullerShader, FGlobalShader)

 BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
  SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<float4>, NoiseBuffer)
  SHADER_PARAMETER(FIntPoint, NoiseBufferSize)
 END_SHADER_PARAMETER_STRUCT()


And lastly, the code for executing the compute pass:



unsigned int NumElements = BufferSize.X * BufferSize.Y;

 // Allocate a large array for noise data
 TArray<FVector4> NoiseData;
 NoiseData.Init(FVector4(), NumElements);

 // Generate some random uniform noise
 for (unsigned int i = 0; i < NumElements; i++)
 {
  NoiseData[i] = FVector4(FMath::FRandRange(0, 1), FMath::FRandRange(0, 1), FMath::FRandRange(0, 1), FMath::FRandRange(0, 1));
 }

 FIntVector BoxMullerGroupCount = FComputeShaderUtils::GetGroupCount(BufferSize, FBoxMullerShader::ThreadsPerGroupDimension);
 FIntVector InitialComponentsGroupCount = FComputeShaderUtils::GetGroupCount(BufferSize, FInitialComponentsShader::ThreadsPerGroupDimension);

 ENQUEUE_RENDER_COMMAND(SceneDrawCompletion)
 ([this, NumElements, BoxMullerGroupCount, InitialComponentsGroupCount, NoiseData]
 (FRHICommandListImmediate& CommandListImmediate)
 {
  FRDGBuilder GraphBuilder(CommandListImmediate);

  FRDGBufferDesc NoiseBufferDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(FVector4), NumElements);
  FRDGBufferRef NoiseBuffer = GraphBuilder.CreateBuffer(NoiseBufferDesc, TEXT("NoiseBuffer"));
  FRDGBufferUAVRef NoiseBufferUAV = GraphBuilder.CreateUAV(NoiseBuffer);

  FBoxMullerShader::FParameters* BoxMullerParameters = GraphBuilder.AllocParameters<FBoxMullerShader::FParameters>();
  BoxMullerParameters->NoiseBuffer = NoiseBufferUAV;
  BoxMullerParameters->NoiseBufferSize = BufferSize;

  TShaderMapRef<FBoxMullerShader> BoxMullerShader(GetGlobalShaderMap(GMaxRHIFeatureLevel));

  GraphBuilder.AddPass(RDG_EVENT_NAME("Gaussian Noise Generation"), BoxMullerParameters, ERDGPassFlags::Compute,
  [&](FRHICommandList& CommandList)
  {
   /* FStructuredBufferRHIRef NoiseBufferRHI = static_cast<FRHIStructuredBuffer*>(BoxMullerParameters->NoiseBuffer->GetParent()->GetRHI());

   // Lock and copy sizes are in bytes, so scale the element count by the element size.
   const uint32 NoiseDataSize = NumElements * sizeof(FVector4);
   FVector4* NoiseBufferData = static_cast<FVector4*>(RHILockStructuredBuffer(NoiseBufferRHI, 0, NoiseDataSize, EResourceLockMode::RLM_WriteOnly));
   FMemory::Memcpy(NoiseBufferData, NoiseData.GetData(), NoiseDataSize);
   RHIUnlockStructuredBuffer(NoiseBufferRHI); */

   FComputeShaderUtils::Dispatch(CommandList, *BoxMullerShader, *BoxMullerParameters, BoxMullerGroupCount);
  });

  GraphBuilder.Execute();
 });
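
One thing worth checking in the dispatch above: as far as I can tell, FComputeShaderUtils::GetGroupCount rounds the group count up, so when the buffer size is not a multiple of the thread group size, some threads land outside the buffer. The BoxMuller shader has no bounds check, and because its index is id.y * NoiseBufferSize.x + id.x, a thread with id.x past the row width aliases onto an element of the next row. The sketch below is plain C++ with hypothetical stand-in names (FGroupCount2D, GetGroupCount2D, FlatIndex, CountOutOfRangeThreads), not the engine's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the X/Y part of FIntVector.
struct FGroupCount2D
{
    uint32_t X;
    uint32_t Y;
};

// Ceiling division: the smallest number of groups that covers Value threads.
constexpr uint32_t DivideAndRoundUp(uint32_t Value, uint32_t Divisor)
{
    return (Value + Divisor - 1) / Divisor;
}

FGroupCount2D GetGroupCount2D(uint32_t SizeX, uint32_t SizeY, uint32_t GroupSize)
{
    assert(GroupSize > 0);
    return { DivideAndRoundUp(SizeX, GroupSize), DivideAndRoundUp(SizeY, GroupSize) };
}

// Same flattening as CalculateIndex in the shader.
constexpr uint32_t FlatIndex(uint32_t IdX, uint32_t IdY, uint32_t SizeX)
{
    return IdY * SizeX + IdX;
}

// Number of dispatched threads whose 2D id lies outside the SizeX x SizeY buffer.
uint64_t CountOutOfRangeThreads(uint32_t SizeX, uint32_t SizeY, uint32_t GroupSize)
{
    const FGroupCount2D Groups = GetGroupCount2D(SizeX, SizeY, GroupSize);
    const uint64_t DispatchedX = uint64_t(Groups.X) * GroupSize;
    const uint64_t DispatchedY = uint64_t(Groups.Y) * GroupSize;
    return DispatchedX * DispatchedY - uint64_t(SizeX) * SizeY;
}
```

For a 100 x 60 buffer with 32 x 32 groups this gives 4 x 2 groups, so an early return in the shader when any(id.xy >= NoiseBufferSize) would keep the extra threads from touching the buffer.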


If anyone has any insight as to what might be causing the errors with the RDG buffer while the RHI buffer works fine, I’d appreciate it!

So I simplified my test shader pass (still no idea what’s going wrong with it, but that is an issue for another time). This time I’ve tried creating a shader that reads from an input texture and copies the pixels to an output texture.

Curiously, even though my input texture is declared in the pass parameters and created by the graph builder, I get the following error message:


"Ensure condition failed: Texture->HasBeenProduced() [File:D:/Build/++UE4/Sync/Engine/Source/Runtime/RenderCore/Private/RenderGraphValidation.cpp] [Line: 338]
Pass TestShaderPass has a dependency on the texture TestInputTexture that has never been produced."

Here is the code for running the pass:



FIntVector GroupCount = FComputeShaderUtils::GetGroupCount(BufferSize, FTestShader::ThreadsPerGroupDimension);

 ENQUEUE_RENDER_COMMAND(SceneDrawCompletion)
 ([&]
 (FRHICommandListImmediate& CommandListImmediate)
 {
  FRDGBuilder GraphBuilder(CommandListImmediate);

  FRDGTextureDesc InputTextureDesc = FRDGTextureDesc::Create2DDesc(BufferSize, EPixelFormat::PF_FloatRGBA, FClearValueBinding::BlackMaxAlpha,
   TexCreate_None, TexCreate_ShaderResource, false);
  FRDGTextureRef InputTexture = GraphBuilder.CreateTexture(InputTextureDesc, TEXT("TestInputTexture"));

  FRDGTextureDesc OutputTextureDesc = FRDGTextureDesc::Create2DDesc(BufferSize, EPixelFormat::PF_FloatRGBA, FClearValueBinding::BlackMaxAlpha,
   TexCreate_None, TexCreate_ShaderResource | TexCreate_UAV | TexCreate_RenderTargetable, false);
  FRDGTextureRef OutputTexture = GraphBuilder.CreateTexture(OutputTextureDesc, TEXT("TestOutputTexture"));

  FTestShader::FParameters* PassParameters = GraphBuilder.AllocParameters<FTestShader::FParameters>();
  PassParameters->InputTexture = InputTexture;
  PassParameters->OutputTexture = GraphBuilder.CreateUAV(OutputTexture);
  PassParameters->OutputTextureSize = BufferSize;

  TShaderMapRef<FTestShader> TestShader(GetGlobalShaderMap(GMaxRHIFeatureLevel));

  GraphBuilder.AddPass(RDG_EVENT_NAME("TestShaderPass"), PassParameters, ERDGPassFlags::Compute,
  [&](FRHICommandList& CommandList)
  {
   FTexture2DRHIRef InputTextureRHI = static_cast<FRHITexture2D*>(PassParameters->InputTexture->GetRHI());
   uint32 Stride;
   void* TextureData = RHILockTexture2D(InputTextureRHI, 0, EResourceLockMode::RLM_WriteOnly, Stride, false);

   // Copy some data

   RHIUnlockTexture2D(InputTextureRHI, 0, false);

   FComputeShaderUtils::Dispatch(CommandList, *TestShader, *PassParameters, GroupCount);
  });

  GraphBuilder.Execute();
 });

As well as the shader definitions:


#pragma once

#include "CoreMinimal.h"

#include "Shader.h"
#include "GlobalShader.h"
#include "ShaderParameterUtils.h"
#include "ShaderParameterStruct.h"

class FTestShader: public FGlobalShader
{
public:
 static const int32 ThreadsPerGroupDimension = 32;

 DECLARE_GLOBAL_SHADER(FTestShader)
 SHADER_USE_PARAMETER_STRUCT(FTestShader, FGlobalShader)

 BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
  SHADER_PARAMETER_RDG_TEXTURE(Texture2D<float4>, InputTexture)
  SHADER_PARAMETER_RDG_TEXTURE_UAV(RWTexture2D<float4>, OutputTexture)
  SHADER_PARAMETER(FIntPoint, OutputTextureSize)
 END_SHADER_PARAMETER_STRUCT()


 static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
 {
  return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::SM5);
 }

 static inline void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
 {
  FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
 }
};

IMPLEMENT_GLOBAL_SHADER(FTestShader, "/Shaders/Private/Test.usf", "main", SF_Compute);

And the shader itself:



Texture2D<float4> InputTexture;
RWTexture2D<float4> OutputTexture;
int2 OutputTextureSize;

[numthreads(32, 32, 1)]
void main( uint3 id : SV_DispatchThreadID )
{
    OutputTexture[id.xy] = InputTexture[id.xy];
}

The pass runs correctly when the shader parameters only contain the UAV (I’ve tested this by having the shader use the pixel coordinates for red and green, and the texture shows as you’d expect when running the vis command). With just the one texture, I’m running into the same issue as before where the data the UAV refers to is invalid during the pass and the GetRHI() function throws an exception.

I thought that having an additional parameter directly referencing a texture would force the graph builder to allow access to the RHI resource, but from the error message, it seems like that has the same problems.

My next idea is to try and bypass RDG resources altogether when trying to copy data to the buffer, and only use RDG textures for intermediate results of passes, though this seems like the incorrect method.
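
One way to read the ensure above: the graph seems to track, per resource, whether some earlier pass has written to it, and it complains when a pass reads a resource nothing has written yet. If that reading is right, a lock and write inside the pass lambda would not count, because the check runs on the declared pass parameters rather than on what the lambda does. The toy model below only illustrates that rule in plain C++; FToyPass and FindUnproducedRead are hypothetical names and this is not how the engine implements its validation.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy description of a pass: which resources it declares as inputs and outputs.
struct FToyPass
{
    std::string Name;
    std::vector<std::string> Reads;
    std::vector<std::string> Writes;
};

// Walks the passes in submission order and returns the first resource that is
// read before any earlier pass has written it, or an empty string if none is.
std::string FindUnproducedRead(const std::vector<FToyPass>& Passes)
{
    std::unordered_set<std::string> Produced;
    for (const FToyPass& Pass : Passes)
    {
        for (const std::string& Resource : Pass.Reads)
        {
            if (Produced.count(Resource) == 0)
            {
                return Resource;
            }
        }
        for (const std::string& Resource : Pass.Writes)
        {
            Produced.insert(Resource);
        }
    }
    return {};
}
```

In this model, TestShaderPass on its own reports TestInputTexture, while adding an earlier pass that declares TestInputTexture as an output makes the check pass.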

Sir, you are doing what Epic could not: writing useful documentation. And I thank you for that.