Compute shaders are slow: creating anamorphic DOF with a custom screen pass


I am continuing my crusade to learn graphics in general and RDG. This time I am trying to implement a custom screen pass to get anamorphic bokeh in my DOF (yes, the Diaphragm DOF in the cinematic cameras is cool, but I want anamorphic stuff).

So far so good. I am experimenting with a single pass first and have had very good results: I am able to deform the bokeh in the X and Y directions.
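To make the X/Y deformation concrete, here is a minimal CPU-side sketch of how a gather kernel's sample offsets could be stretched per axis. This is not the shader from this thread: `Float2` and `MakeAnamorphicKernel` are hypothetical names, the parameters only mirror Radius, AnamorphicX, AnamorphicY and Rotation from the pass setup, and the polygon shaping that EdgeCount suggests is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 2D offset type, used only for this illustration.
struct Float2 { float X; float Y; };

// Builds gather offsets from a Size x Size grid clipped to the unit disk,
// rotates them, then scales X and Y independently to squash the bokeh.
std::vector<Float2> MakeAnamorphicKernel(int Size, float Radius,
                                         float AnamorphicX, float AnamorphicY,
                                         float RotationRadians)
{
    assert(Size > 0);
    std::vector<Float2> Offsets;
    const float C = std::cos(RotationRadians);
    const float S = std::sin(RotationRadians);
    for (int Y = 0; Y < Size; ++Y)
    {
        for (int X = 0; X < Size; ++X)
        {
            // Map the grid cell center to [-1, 1] on both axes.
            const float U = ((X + 0.5f) / Size) * 2.0f - 1.0f;
            const float V = ((Y + 0.5f) / Size) * 2.0f - 1.0f;
            if (U * U + V * V > 1.0f)
            {
                continue; // Keep only taps inside the unit disk.
            }
            // Rotate first, then apply the per-axis (anamorphic) scale.
            const float RX = U * C - V * S;
            const float RY = U * S + V * C;
            Offsets.push_back({ RX * Radius * AnamorphicX, RY * Radius * AnamorphicY });
        }
    }
    return Offsets;
}
```

With an 8x8 grid, clipping to the disk leaves 52 taps. Setting AnamorphicX to 0.5 and AnamorphicY to 1.0 gives a bokeh that is half as wide as it is tall.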

Now the problem I am having is performance, and I am wondering what I am doing so badly wrong. Here is a screenshot:

As you can see from ProfileGPU, my pass is taking way too many ms to compute. If I disable all the logic in my compute shader, it still takes 0.11 ms at that resolution. Sampling the texture is very heavy, but why? For comparison, the cinematic DOF takes 0.22 ms.

So my obvious first idea is to sample half-resolution textures, but I am not yet sure what the methodology is. I have been reading the Diaphragm DOF source, so I will try to implement a setup pass first. Anyway, that does not explain why it takes 0.11 ms just to dispatch the empty compute shader. Here is the code I am using to set up my pass:

FRDGTextureRef GatherDOF::AddGatherDOFPasses(
    FRDGBuilder& GraphBuilder,
    const FSceneTextureParameters& SceneTextures,
    const FViewInfo& View,
    const FRDGTextureRef& SceneColor,
    const FRDGTextureRef& ViewFamilyTexture,
    const FRDGTextureRef& SeparateTranslucency,
    const FRDGTextureRef& SeparateModulation,
    const FRDGTextureRef& CustomDepth)
{
    // Output texture: same description as scene color, single-sampled and UAV-writable.
    FRDGTextureDesc Desc = SceneColor->Desc;
    Desc.NumSamples = 1;
    Desc.TargetableFlags |= TexCreate_UAV;
    FRDGTextureRef NewSceneColor = GraphBuilder.CreateTexture(Desc, TEXT("GatherDOFRecombine"));

    // Set up the shader parameters used in all shaders.
    FGatherDOFCommonShaderParameters CommonParameters;
    CommonParameters.ViewUniformBuffer = View.ViewUniformBuffer;

    FIntRect PassViewRect = View.ViewRect;

    FGatherDOF::FParameters* PassParameters = GraphBuilder.AllocParameters<FGatherDOF::FParameters>();
    PassParameters->CommonParameters = CommonParameters;
    PassParameters->Radius = GGatherDOFParameters.Radius;
    PassParameters->AnamorphicX = GGatherDOFParameters.AnamorphicX;
    PassParameters->AnamorphicY = GGatherDOFParameters.AnamorphicY;
    PassParameters->EdgeCount = GGatherDOFParameters.EdgeCount;
    PassParameters->Rotation = GGatherDOFParameters.Rotation;
    PassParameters->FocalDistance = GGatherDOFParameters.FocalDistance;
    PassParameters->FocalRegion = GGatherDOFParameters.FocalRegion;
    PassParameters->FocalLength = GGatherDOFParameters.FocalLength;
    PassParameters->Aperture = GGatherDOFParameters.Apeture;

    // xy = view size in pixels, zw = reciprocal of the view size.
    PassParameters->ViewportSize = FVector4(PassViewRect.Width(), PassViewRect.Height(), 1.0f / PassViewRect.Width(), 1.0f / PassViewRect.Height());
    PassParameters->SceneColorTexture = SceneColor;
    PassParameters->SceneDepthTexture = SceneTextures.SceneDepthBuffer;

    PassParameters->SceneColorOutput = GraphBuilder.CreateUAV(NewSceneColor);

    TShaderMapRef<FGatherDOF> ComputeShader(View.ShaderMap);

    // The start of this call was lost in the post. It is reconstructed here as a
    // standard AddComputePass dispatch; the event name is a placeholder.
    FComputeShaderUtils::AddComputePass(
        GraphBuilder,
        RDG_EVENT_NAME("GatherDOF"),
        ComputeShader,
        PassParameters,
        FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), kDefaultGroupSize));

    return NewSceneColor;
}
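For reference, the group count passed to GetGroupCount in the code above comes down to a rounded-up division of the view size by the thread group size. The standalone sketch below reproduces that arithmetic and shows how much a half-resolution pass would shrink the dispatch. `IntPoint`, `DivideAndRoundUp`, `GetGroupCount2D` and `HalfResolution` are illustrative names, not engine API, and the group size of 8 is an assumption about kDefaultGroupSize.

```cpp
#include <cassert>

// Hypothetical stand-in for an integer 2D size.
struct IntPoint { int X; int Y; };

// Rounded-up integer division: how many groups of GroupSize cover Extent pixels.
int DivideAndRoundUp(int Extent, int GroupSize)
{
    assert(Extent >= 0 && GroupSize > 0);
    return (Extent + GroupSize - 1) / GroupSize;
}

// Number of thread groups needed to cover the whole view in X and Y.
IntPoint GetGroupCount2D(IntPoint ViewSize, int GroupSize)
{
    return { DivideAndRoundUp(ViewSize.X, GroupSize),
             DivideAndRoundUp(ViewSize.Y, GroupSize) };
}

// Half-resolution extent, rounded up so odd sizes keep their last row and column.
IntPoint HalfResolution(IntPoint ViewSize)
{
    return { DivideAndRoundUp(ViewSize.X, 2), DivideAndRoundUp(ViewSize.Y, 2) };
}
```

For a 1920x1080 view with 8x8 groups this gives 240 x 135 = 32400 groups. At half resolution (960x540) it drops to 120 x 68 = 8160 groups, roughly a quarter of the work.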

Even though I have been learning DirectX and UE’s API for a while now, there are a ton of topics I still don’t understand. Here I am assuming my texture UAV is created only once, since the Diaphragm DOF function does the same. I still don’t properly understand what the tile pass is, so if there is an ultra-instinct powerful guy hidden in the shadows who can explain a little about the passes used there, I will be extremely grateful.

For those who are interested, I am using this post as a reference:…-post-effects/ It might be worth mentioning that everything in there is old and deprecated, since it is not RDG.

Also, I sent a pull request to Epic Games with a small change to the post process files that makes it possible to register custom global shaders from plugins as post processes, so nobody will need to modify the engine source to create a code-only post process. Let’s see if they accept it. Here is the pull request for those interested:

Again, if you feel a terrible need to talk about compute shaders, do not hesitate.


Hey ya!

A new day has come and new solutions have been found.

I converted everything to use a vertex and pixel shader this time, and to my enormous surprise it now runs incredibly fast: computing my 8x8 kernel takes 0.08 ms, which is really cool since I am not even using half resolution for it. I learned how to do that from PostProcessBloomSetup.cpp; apparently they use both a compute shader and a pixel shader, the latter for mobile I guess. But my question remains: I was told compute shaders are the fastest because of their direct linkage, but that does not seem right here, since the pixel shader is way faster. It would be good to understand why.

Also, I found some information about what the heck tiling is for in a DOF effect, here for those interested:…%20background. Apparently it is used to process the circle of confusion in tiles over Z.
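One common reading of that tiling idea is a reduction pass that stores the largest circle of confusion found in each screen tile, so that later passes can decide per tile how much gathering is needed. The CPU-side sketch below illustrates that reduction only. It is an assumption about the general technique, not a description of the Diaphragm DOF passes, and `BuildTileMaxCoc` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduces a per-pixel CoC radius buffer (row-major, Width x Height) to one
// maximum absolute CoC value per TileSize x TileSize tile.
std::vector<float> BuildTileMaxCoc(const std::vector<float>& Coc,
                                   int Width, int Height, int TileSize)
{
    assert(Width > 0 && Height > 0 && TileSize > 0);
    assert(Coc.size() == static_cast<std::size_t>(Width) * Height);

    // Round up so partial tiles at the right and bottom edges are included.
    const int TilesX = (Width + TileSize - 1) / TileSize;
    const int TilesY = (Height + TileSize - 1) / TileSize;
    std::vector<float> TileMax(static_cast<std::size_t>(TilesX) * TilesY, 0.0f);

    for (int Y = 0; Y < Height; ++Y)
    {
        for (int X = 0; X < Width; ++X)
        {
            float& Dst = TileMax[(Y / TileSize) * TilesX + (X / TileSize)];
            // Signed CoC (foreground vs. background) is folded into a magnitude.
            Dst = std::max(Dst, std::fabs(Coc[Y * Width + X]));
        }
    }
    return TileMax;
}
```

A gather pass could then skip or cheapen tiles whose maximum CoC is below about one pixel, since everything in them is effectively in focus.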

Well, for now I will try to create the half-resolution setup and see how far I can push the quality.
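As a rough idea of what the simplest half-resolution setup does, here is a plain 2x2 box downsample written on the CPU. It is a sketch under assumptions: a real setup pass would run on the GPU and would probably also compute and weight by the circle of confusion, which is not shown here. `DownsampleHalf` is a hypothetical name, and the image is a single float channel with even dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Downsamples a row-major single-channel image by 2 on each axis by
// averaging every 2x2 block. Width and Height are assumed to be even.
std::vector<float> DownsampleHalf(const std::vector<float>& Src, int Width, int Height)
{
    assert(Width > 0 && Height > 0 && Width % 2 == 0 && Height % 2 == 0);
    assert(Src.size() == static_cast<std::size_t>(Width) * Height);

    const int HalfW = Width / 2;
    const int HalfH = Height / 2;
    std::vector<float> Dst(static_cast<std::size_t>(HalfW) * HalfH);

    for (int Y = 0; Y < HalfH; ++Y)
    {
        for (int X = 0; X < HalfW; ++X)
        {
            // Top-left corner of the 2x2 source block.
            const int SX = X * 2;
            const int SY = Y * 2;
            const float Sum = Src[SY * Width + SX] + Src[SY * Width + SX + 1]
                            + Src[(SY + 1) * Width + SX] + Src[(SY + 1) * Width + SX + 1];
            Dst[Y * HalfW + X] = Sum * 0.25f;
        }
    }
    return Dst;
}
```

The gather kernel then runs on a quarter of the pixels, at the cost of softer in-focus detail unless the full-resolution image is recombined afterwards.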


Hey, great work on that anamorphic DOF.

I’m looking for a solution exactly like that for a project, and I have some questions for you. I was playing around with modifying the DOFBokehLUT.usf shader to get a squished bokeh, and that worked sometimes. But I think DiaphragmDOF.cpp sometimes just reverts to a circle for the bokeh kernel, so I will need a different solution.
Were you able to implement your anamorphic bokeh in the Diaphragm DOF?

Is there a way to get the old GatherDOF to work in 4.25, or would you be willing to share your setup?

Hey, you should tag me so that I get the Epic messages; otherwise I am not aware of replies.

I first tried to do that with the Diaphragm DOF, but I could not: some of the parameters are hard-coded, I guess for optimization, so it was easier to implement my own.