Hi friends,
My team is using a compute shader in UE4.27 to render sprites in our game’s character editor, allowing users to rotate and transform the sprites in real time without continuously uploading textures from the CPU to the GPU.
This method has worked on all DX11-capable Windows hardware that we’ve tested, except for recent Nvidia graphics cards. Users with the following graphics cards have reported a “striping” issue where pixels are missing from the rendered sprite (as if the compute shader didn’t finish running): RTX 3080, RTX 3090, and RTX 4080.
The compute shader is written in HLSL, and we used this unofficial guide as a starting point to get things up and running (there’s very little official documentation about compute shaders in UE4):
We’ve disabled DX12 in the UE4 settings to make sure that all PCs are using DX11 when the compute shader is run.
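For context, forcing DX11 in UE4.27 is normally done through the Default RHI project setting, which ends up in DefaultEngine.ini as something like the lines below (shown for reference; this is the standard setting rather than anything custom on our side):
[/Script/WindowsTargetPlatform.WindowsTargetSettings]
DefaultGraphicsRHI=DefaultGraphicsRHI_DX11
(Launching with the -d3d11 command-line flag is another way to force DX11, and the log confirms which RHI is active.)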
I was wondering whether recent Nvidia graphics cards have discontinued or changed support for specific HLSL functions. These are some example functions we use that might behave differently across hardware (a trimmed-down usage sketch follows the list), but I didn’t find any information about this when I searched online:
- InterlockedAdd
- InterlockedMin
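Here’s a trimmed-down sketch of how we use these atomics (the resource and variable names below are placeholders, not our actual code):
// Placeholder sketch of our atomic usage; RWCounters and the indices are illustrative only
RWStructuredBuffer<uint> RWCounters;

[numthreads(64, 1, 1)]
void ExampleCS(uint3 ThreadId : SV_DispatchThreadID)
{
    uint previousValue;
    // Atomically accumulate a running count
    InterlockedAdd(RWCounters[0], 1, previousValue);
    // Atomically track a minimum value (e.g. a depth/priority)
    InterlockedMin(RWCounters[1], ThreadId.x);
}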
We also use structured buffers containing structs, and we’ve been careful to follow Nvidia’s recommendations when defining those structs (padding them so their size is a multiple of 128 bits, etc., although that’s probably only relevant for performance). Example from our .usf compute shader file:
//Struct containing data used to render sprites
struct Attributes
{
    //Nvidia recommends aiming for structs divisible by 128 bits (the size of a float4)
    // Reference: https://developer.nvidia.com/content/understanding-structured-buffer-performance
    //128 bits
    uint variable1;
    int variable2;
    uint variable3;
    int variable4;
    //128 bits
    int variable5;
    float variable6;
    uint variable7;
    uint variable8;
};
// Input buffer that can be updated each frame while the compute shader is running
StructuredBuffer<Attributes> AttributesBuffer;
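For completeness, the C++ side mirrors this layout with an FAttributes struct; here’s a simplified sketch (not the exact struct, but the member order and 128-bit grouping match the HLSL above):
//Simplified sketch of the C++ mirror of the HLSL Attributes struct
struct FAttributes
{
    uint32 Variable1;
    int32 Variable2;
    uint32 Variable3;
    int32 Variable4;
    int32 Variable5;
    float Variable6;
    uint32 Variable7;
    uint32 Variable8;
};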
On the C++ side, we populate the structured buffer and create an SRV for it like this:
TResourceArray<FAttributes> attributesResourceArray;
//Initialize the resource array (sized to match the number of items in our local cached parameters)
const int32 numEntries = FMath::Max(cachedVariableParams.AttributesBuffer.Num(), 1); //We MUST have at least one entry in the SRV
attributesResourceArray.Init(EmptyAttributes, numEntries);
//Copy data from the local cached parameters into the resource array
for (int32 i = 0; i < cachedVariableParams.AttributesBuffer.Num(); i++)
{
    attributesResourceArray[i] = cachedVariableParams.AttributesBuffer[i];
}
//Create the SRV (used when dispatching the compute shader below)
const FShaderResourceViewRHIRef AttributesBufferSRV = CreateSRV(sizeof(FAttributes), sizeof(FAttributes) * numEntries, &attributesResourceArray);
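CreateSRV is a small helper of ours; conceptually it boils down to the standard UE4.27 RHI calls shown in this sketch (illustrative only, not our exact implementation):
//Illustrative sketch of a CreateSRV-style helper (render-thread code)
//This is not our exact implementation; error handling and usage flags are simplified
FShaderResourceViewRHIRef CreateSRV(uint32 Stride, uint32 Size, FResourceArrayInterface* ResourceArray)
{
    FRHIResourceCreateInfo CreateInfo;
    CreateInfo.ResourceArray = ResourceArray; //Uploads the CPU-side data when the buffer is created
    FStructuredBufferRHIRef Buffer = RHICreateStructuredBuffer(Stride, Size, BUF_ShaderResource, CreateInfo);
    return RHICreateShaderResourceView(Buffer);
}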
And then we dispatch a compute shader pass like this (this is one of multiple passes that the compute shader performs):
//Fill the shader parameters structure with the cached data supplied by the client
FPass1ComputeShaderDeclaration::FParameters Pass1_Params;
Pass1_Params.StaticSpriteSheet = cachedConstantParams.StaticSpriteSheet1->Resource->TextureRHI; //Also setting a texture parameter
Pass1_Params.StaticSpriteAttributesBuffer = AttributesBufferSRV; //Pass our SRV to the compute shader
//Get a reference to our shader type from global shader map
TShaderMapRef<FPass1ComputeShaderDeclaration> Pass1(GetGlobalShaderMap(GMaxRHIFeatureLevel));
//Dispatch the compute shader
//Group count covers one thread per pixel across all static sprites (each static sprite is 256x256 pixels = 65536 threads)
FComputeShaderUtils::Dispatch(RHICmdList, Pass1, Pass1_Params, FIntVector(NumStaticSprites * 65536 / NUM_THREADS_PER_GROUP_DIMENSION_X, 1, 1));
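For reference, the corresponding thread-group declaration in the .usf looks roughly like this (the entry point name below is illustrative; NUM_THREADS_PER_GROUP_DIMENSION_X is a define in our setup):
//Simplified sketch of the pass 1 entry point that the group-count math above assumes
[numthreads(NUM_THREADS_PER_GROUP_DIMENSION_X, 1, 1)]
void Pass1MainCS(uint3 DispatchThreadId : SV_DispatchThreadID)
{
    //DispatchThreadId.x indexes a single pixel in the flattened 256x256 sprite data
}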
Graphics experts: any idea what’s going wrong with our setup on the newer Nvidia cards? Have you seen anything similar, or do you have ideas for troubleshooting the issue?
Much appreciated!