Dynamic shadow artifacts

Aye, definitely a large improvement. It makes me want to look at the loops in other shaders and re-check them.

On a side, unrelated note: why are coarse derivatives used? Maybe it is worth looking into fine ones when available; they should give better biasing.
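A minimal CPU-side C++ sketch of the difference being asked about, modelling a single 2x2 pixel quad. It assumes coarse ddx reuses one pixel pair (the top row here) for the whole quad, while fine ddx uses each pixel's own row; which pair real hardware picks for coarse derivatives is implementation-defined, and `Quad`, `ddxCoarse`, `ddxFine` and `makeProductQuad` are hypothetical names, not HLSL intrinsics.

```cpp
#include <array>
#include <cassert>

// Hypothetical CPU model of one 2x2 pixel-shader quad: v[row][col],
// col 0 = left pixel, row 0 = top pixel.
using Quad = std::array<std::array<float, 2>, 2>;

// Coarse ddx (sketch): one horizontal difference shared by the whole quad.
// Assumption: the top row is used; real hardware may pick either row.
float ddxCoarse(const Quad& v, int row) {
    assert(row == 0 || row == 1);
    return v[0][1] - v[0][0];
}

// Fine ddx (sketch): each row uses its own pair of horizontally adjacent pixels.
float ddxFine(const Quad& v, int row) {
    assert(row == 0 || row == 1);
    return v[row][1] - v[row][0];
}

// Fill the quad with f(x, y) = x * y, a function whose x-slope changes with y.
Quad makeProductQuad() {
    Quad q{};
    for (int y = 0; y < 2; ++y)
        for (int x = 0; x < 2; ++x)
            q[y][x] = static_cast<float>(x * y);
    return q;
}
```

With f(x, y) = x * y, the bottom row's fine derivative is 1 while the coarse one is 0; an error of that kind carries straight into any bias derived from screen-space derivatives.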

I have to check whether there is any visible difference with ddx_fine. Is there a performance difference, and how large is it?

No idea, to be fair. I’d expect it to cost roughly 4x as much as coarse, but I doubt that would be large enough to show up when profiling.

Overall, there are a few choices regarding PCSS in UE4 that I don’t understand. The first is: why is the PCF bias only positive? The technique itself kinda implies that it can be both positive and negative. I agree that a positive-only bias safeguards you from some acne, but it also eats away shadows where they should be. I totally agree that clamping the bias at some point is a must, but it should not be positive-only.
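One common way to turn screen-space derivatives into a per-tap depth bias is receiver plane depth bias: solve for how receiver depth changes per unit of shadow-map UV, then dot that gradient with each tap's UV offset. The C++ sketch below assumes that is roughly what `DepthBiasDotFactors` represents (not confirmed in this thread) and contrasts the one-sided `max(..., 0)` used in the loops here with a signed, magnitude-clamped variant. `maxBias` and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// Receiver plane depth gradient (sketch of a common technique, not UE4's code).
// dUVdx = (du/dx, dv/dx), dUVdy = (du/dy, dv/dy). Solves
//   [dz/dx; dz/dy] = [du/dx dv/dx; du/dy dv/dy] * [dz/du; dz/dv]
// for (dz/du, dz/dv) with the closed-form 2x2 inverse.
Float2 receiverPlaneDepthGradient(Float2 dUVdx, Float2 dUVdy, float dzdx, float dzdy) {
    float det = dUVdx.x * dUVdy.y - dUVdx.y * dUVdy.x;
    if (std::fabs(det) < 1e-12f) return {0.0f, 0.0f};  // degenerate: no slope bias
    float dzdu = (dUVdy.y * dzdx - dUVdx.y * dzdy) / det;
    float dzdv = (-dUVdy.x * dzdx + dUVdx.x * dzdy) / det;
    return {dzdu, dzdv};
}

// Depth change predicted by the receiver plane for a PCF tap at UV offset o.
float planeBias(Float2 gradient, Float2 o) {
    return gradient.x * o.x + gradient.y * o.y;
}

// One-sided clamp, like max(dot(DepthBiasDotFactors, SampleUVOffset), 0).
float oneSidedBias(float b) { return std::max(b, 0.0f); }

// Signed clamp: keep the sign, only limit the magnitude to maxBias (hypothetical).
float symmetricBias(float b, float maxBias) {
    return std::min(std::max(b, -maxBias), maxBias);
}
```

For taps on the "downhill" side of the plane the one-sided version discards the negative bias entirely, while the signed version keeps it up to the clamp.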

Second, why was adaptive bias used at all? Following the logic of the conventional PCF implementation in UE4, it might be more consistent (not better, just more consistent) to follow tradition and just use the transition scale and a flat bias.
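For reference, the per-tap comparison in the PCF loops posted in this thread is a soft step, saturate((SampleDepth + bias - SceneDepth) * TransitionScale + 1), rearranged so the per-pixel part is computed once. This C++ sketch reproduces that arithmetic; with a flat bias the same constant would be passed for every tap, while the adaptive version passes a per-tap value. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// HLSL-style saturate: clamp to [0, 1].
float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Soft shadow comparison for one tap, in the same arrangement as the shader:
// ScaledAndBiasedDepth = -SceneDepth * TransitionScale + 1 is per-pixel, so each tap
// is a few multiply-adds plus saturate. Algebraically equal to
// saturate((sampleDepth + bias - sceneDepth) * transitionScale + 1).
float tapVisibility(float sampleDepth, float sceneDepth, float transitionScale, float bias) {
    float scaledAndBiasedDepth = -sceneDepth * transitionScale + 1.0f;
    return saturate(sampleDepth * transitionScale + (transitionScale * bias + scaledAndBiasedDepth));
}
```

A larger bias shifts the step toward "lit"; the transition scale sets how wide the 0..1 ramp is in depth units (1 / TransitionScale).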

Third and last, why Sobol random? I might be biased here, but I was never able to get decent randomness out of it.

And as a general thought, what about using the blocker search result to have a reduced PCF sampling rate in ?

I think the answer to most of these is that the implementation is still WIP. I will try replacing Sobol with something else.
As for a reduced sampling rate, I am not sure it’s worth it. There are already two early-outs based on the blocker search: if there are no blockers, it skips all the samples and there is no shadow; if all samples are blocked, it also skips all the samples. A reduced sampling rate could be a win without unrolling, using a variable loop counter.
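The two early-outs described above amount to a three-way branch on the blocker search result. A C++ sketch, assuming the result is summarized as a blocker count out of the number of search taps, and assuming the all-blocked case returns full shadow (the post only says the samples are skipped). `BlockerSearchResult`, `ShadowPath`, `choosePath` and `visibilityWithEarlyOuts` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical summary of a blocker search.
struct BlockerSearchResult {
    int blockerCount;   // search taps that found an occluder
    int searchSamples;  // total search taps
};

enum class ShadowPath { FullyLit, FullyShadowed, Pcf };

// No blockers -> skip PCF, fully lit.
// Every search tap blocked -> skip PCF (assumed fully shadowed).
// Otherwise -> run the full PCF loop.
ShadowPath choosePath(const BlockerSearchResult& r) {
    assert(r.searchSamples > 0 && r.blockerCount >= 0 && r.blockerCount <= r.searchSamples);
    if (r.blockerCount == 0) return ShadowPath::FullyLit;
    if (r.blockerCount == r.searchSamples) return ShadowPath::FullyShadowed;
    return ShadowPath::Pcf;
}

// Only the penumbra case pays for the PCF taps (pcf stands in for the sampling loop).
float visibilityWithEarlyOuts(const BlockerSearchResult& r, float (*pcf)()) {
    switch (choosePath(r)) {
        case ShadowPath::FullyLit:      return 1.0f;
        case ShadowPath::FullyShadowed: return 0.0f;
        default:                        return pcf();
    }
}
```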

Just found another optimization: the filter radius can be premultiplied into PCFUVMatrix. That saves 2 ALU per sample.


    // Scale by the filter radius once per pixel instead of once per sample.
    PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);
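Why premultiplying is safe: a uniform scale commutes with the matrix, so (r·M)·s = r·(M·s). The C++ sketch below uses minimal stand-ins for HLSL `float2`/`float2x2` (row-major, `mul(M, v)` as matrix times column vector) and checks that both orderings give the same offset. The per-sample version pays two extra multiplies per tap; the premultiplied one pays a 2x2 matrix product once per pixel. Names other than the HLSL types are hypothetical.

```cpp
#include <cassert>

// Minimal stand-ins for HLSL float2 / float2x2 (row-major: m00 m01 / m10 m11).
struct Float2 { float x, y; };
struct Float2x2 { float m00, m01, m10, m11; };

// mul(M, v): matrix times column vector.
Float2 mul(const Float2x2& m, Float2 v) {
    return {m.m00 * v.x + m.m01 * v.y, m.m10 * v.x + m.m11 * v.y};
}

// mul(A, B): ordinary matrix product.
Float2x2 mul(const Float2x2& a, const Float2x2& b) {
    return {a.m00 * b.m00 + a.m01 * b.m10, a.m00 * b.m01 + a.m01 * b.m11,
            a.m10 * b.m00 + a.m11 * b.m10, a.m10 * b.m01 + a.m11 * b.m11};
}

// Per-sample scaling: matrix-vector product, then 2 extra multiplies for every tap.
Float2 offsetScaledPerSample(const Float2x2& pcfUVMatrix, float radius, Float2 s) {
    Float2 o = mul(pcfUVMatrix, s);
    return {o.x * radius, o.y * radius};
}

// Premultiplied: fold diag(radius, radius) into the matrix once per pixel.
Float2x2 premultiplyRadius(const Float2x2& pcfUVMatrix, float radius) {
    return mul(Float2x2{radius, 0.0f, 0.0f, radius}, pcfUVMatrix);
}
```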

I haven’t even started on the blocker search loop yet.

Just tested ddx_fine. It made no visible difference in my test scene.

I replaced Sobol with simple spiral sampling, with a per-pixel rotation matrix premultiplied into PCFUVMatrix. It doesn’t look as good yet, but it saves 300 lines of assembly. This makes me wonder whether Sobol random is worth it. With an unrolled loop it’s easy to precalculate the sampling points. The same optimization can be applied to the blocker search too. Needs more testing.



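    // Fold the filter radius into the UV matrix once instead of scaling every sample offset.
    PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);
    // Pseudo-random per-pixel rotation angle, varied with the frame index.
    float RandAngle = 2.0f * PI * frac(7.1721f * Settings.SvPosition.x + 11.131f * Settings.SvPosition.y + View.StateFrameIndexMod8 * 0.125f);
    // Fold the rotation into the same matrix as well.
    PCFUVMatrix = mul(float2x2(cos(RandAngle), -sin(RandAngle), sin(RandAngle), cos(RandAngle)), PCFUVMatrix);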
    UNROLL
    for (int j = 0; j < PCSS_SAMPLES; j++)
    {
        // Spiral tap: fixed angle step per sample; sqrt() spreads the radii evenly over the disk area.
        float angle = j * PI * 4.71f;
        float2 PCFSample = float2(sin(angle), cos(angle)) * sqrt(float(j+1) * (1.0f / float(PCSS_SAMPLES)));
        //float2 PCFSample = RandToCircle(SobolIndex(SobolRandom, j << 3, PCSS_SAMPLE_BITS + 3));
        float2 SampleUVOffset = mul(PCFUVMatrix, PCFSample);
        float2 SampleUV = ShadowPosition + SampleUVOffset * Settings.ShadowTileOffsetAndSize.zw;

        // One-sided slope bias: negative values are clamped to zero.
        float SampleDepthBias = max(dot(DepthBiasDotFactors, SampleUVOffset), 0);

        #if FEATURE_GATHER4
            float4 SampleDepth = Settings.ShadowDepthTexture.Gather(Settings.ShadowDepthTextureSampler, SampleUV);
            VisibleLightAccumulation += dot(0.25, saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth)));
        #else
            float SampleDepth = Texture2DSampleLevel(Settings.ShadowDepthTexture, Settings.ShadowDepthTextureSampler, SampleUV, 0).r;
            VisibleLightAccumulation += saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth));
        #endif
    }
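The spiral in the loop above depends only on the loop index, which is why unrolling makes it easy to precompute. A C++ sketch that builds the same tap table (constants copied from the snippet: angle step `PI * 4.71`, radius `sqrt((j + 1) / N)`) plus the per-pixel rotation angle; `spiralTaps`, `pixelRotationAngle` and `kPi` are hypothetical helper names, and HLSL `frac` is written as `t - floor(t)`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Float2 { float x, y; };

constexpr float kPi = 3.14159265f;

// Spiral taps as in the loop above: angle = j * PI * 4.71, radius = sqrt((j + 1) / N).
// They depend only on j, so they can be computed once (or folded to constants).
std::vector<Float2> spiralTaps(int sampleCount) {
    assert(sampleCount > 0);
    std::vector<Float2> taps;
    taps.reserve(sampleCount);
    for (int j = 0; j < sampleCount; ++j) {
        float angle = j * kPi * 4.71f;
        float radius = std::sqrt(float(j + 1) / float(sampleCount));
        taps.push_back({std::sin(angle) * radius, std::cos(angle) * radius});
    }
    return taps;
}

// Per-pixel rotation angle from the same snippet (frame index expected in 0..7).
float pixelRotationAngle(float svPosX, float svPosY, int frameIndexMod8) {
    float t = 7.1721f * svPosX + 11.131f * svPosY + frameIndexMod8 * 0.125f;
    return 2.0f * kPi * (t - std::floor(t));  // frac(t) in HLSL
}
```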


Ditching Sobol reduced ShadowProjection time from 4.04ms to 2.60ms.

When will this be in an official build? :smiley:

Are the visual improvements of Sobol worth the ~1.44ms?

Very unlikely. In some cases the spiral gives slightly nicer self-shadowing. Both are quite good sampling methods; Sobol just isn’t free.



    // PCF loop.
    float VisibleLightAccumulation = 0;
    float ScaledAndBiasedDepth = -Settings.SceneDepth * Settings.TransitionScale + 1.f;

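    // Pseudo-random per-pixel rotation angle, offset by the frame index (0..7).
    float RandAngle = 2.0f * PI * frac(0.625 * Settings.SvPosition.x + 0.625 * Settings.SvPosition.y + View.StateFrameIndexMod8 * 0.625f);
    // Flip the sign of the radius every other frame, which mirrors all taps through the center.
    float radius = View.StateFrameIndexMod8 % 2 == 0 ? FilterRadius : -FilterRadius;
    // Fold scale, mirroring and rotation into PCFUVMatrix once per pixel.
    PCFUVMatrix = mul(mul(float2x2(radius, 0, 0, radius), float2x2(cos(RandAngle), -sin(RandAngle), sin(RandAngle), cos(RandAngle))), PCFUVMatrix);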
    UNROLL
    for (int j = 0; j < PCSS_SAMPLES; j++)
    {
        // Spiral offsets depend only on j, so with the unrolled loop they can be precomputed.
        float angle = j * PI * 2.0f * 0.4171;
        float2 PCFSample = float2(sin(angle), cos(angle)) * sqrt(float(j) * (1.0f / float(PCSS_SAMPLES)));
        //float2 PCFSample = RandToCircle(SobolIndex(SobolRandom, j << 3, PCSS_SAMPLE_BITS + 3));
        float2 SampleUVOffset = mul(PCFUVMatrix, PCFSample);
        float2 SampleUV = ShadowPosition + SampleUVOffset * Settings.ShadowTileOffsetAndSize.zw;

        // One-sided slope bias: negative values are clamped to zero.
        float SampleDepthBias = max(dot(DepthBiasDotFactors, SampleUVOffset), 0);

        #if FEATURE_GATHER4
            float4 SampleDepth = Settings.ShadowDepthTexture.Gather(Settings.ShadowDepthTextureSampler, SampleUV);
            VisibleLightAccumulation += dot(0.25, saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth)));
        #else
            float SampleDepth = Texture2DSampleLevel(Settings.ShadowDepthTexture, Settings.ShadowDepthTextureSampler, SampleUV, 0).r;
            VisibleLightAccumulation += saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth));
        #endif
    }

    float Visibility = VisibleLightAccumulation * (1.0f / float(PCSS_SAMPLES));


This is the final form of the spiral tapping. Every pixel is rotated pseudo-randomly, and the samples are mirrored every other frame (by negating the radius). The spiral offsets are precalculated and then rotated, scaled and mirrored with a single 2x2 matrix, so this method has no per-sample overhead at all.
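The per-pixel setup of the final loop can be checked on the CPU: scale by ±FilterRadius, rotate by the per-pixel angle, then apply PCFUVMatrix, all folded into one 2x2 matrix. A C++ sketch with minimal stand-ins for `float2`/`float2x2`; `buildTapMatrix` is a hypothetical name. Negating the radius negates every offset, so the "mirror" is a reflection through the center (a 180° rotation) rather than across an axis.

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-ins for HLSL float2 / float2x2 (row-major: m00 m01 / m10 m11).
struct Float2 { float x, y; };
struct Float2x2 { float m00, m01, m10, m11; };

Float2 mul(const Float2x2& m, Float2 v) {
    return {m.m00 * v.x + m.m01 * v.y, m.m10 * v.x + m.m11 * v.y};
}

Float2x2 mul(const Float2x2& a, const Float2x2& b) {
    return {a.m00 * b.m00 + a.m01 * b.m10, a.m00 * b.m01 + a.m01 * b.m11,
            a.m10 * b.m00 + a.m11 * b.m10, a.m10 * b.m01 + a.m11 * b.m11};
}

// Same composition as the shader: mul(mul(scale, rotation), PCFUVMatrix).
// The radius sign flips on odd frames; each tap then needs one mul(matrix, offset).
Float2x2 buildTapMatrix(const Float2x2& pcfUVMatrix, float filterRadius,
                        float randAngle, int frameIndexMod8) {
    float radius = (frameIndexMod8 % 2 == 0) ? filterRadius : -filterRadius;
    Float2x2 scale{radius, 0.0f, 0.0f, radius};
    float c = std::cos(randAngle), s = std::sin(randAngle);
    Float2x2 rot{c, -s, s, c};
    return mul(mul(scale, rot), pcfUVMatrix);
}
```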

Would love to see some comparisons.

Here you go.

Sobol: [screenshot]

Spiral: [screenshot]

Looks nearly identical! No player will know how it was before…

Indeed, if you didn’t have the images side by side, you wouldn’t be able to tell the difference. Is the difference perhaps more obvious in other scenarios?

Finally back at the office. I profiled the scene before and after the soft shadow optimizations on a GTX 1080 Ti. With a single directional light and two cascades, ShadowProjection time went from 1.51ms to 0.47ms. Non-soft shadows would cost 0.18ms.
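Taking those numbers at face value, and treating the 0.18ms non-soft cost as a baseline (an assumption; the pass may not split that cleanly), the soft-shadow overhead shrank by more than the headline ratio:

```latex
\frac{1.51\ \text{ms}}{0.47\ \text{ms}} \approx 3.2\times
\qquad
\frac{(1.51 - 0.18)\ \text{ms}}{(0.47 - 0.18)\ \text{ms}} = \frac{1.33}{0.29} \approx 4.6\times
```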

I just hope they merge it when you PR it…
Sometimes the long wait is annoying, when someone PRs renderer changes only to be told “no”.

Yes, this one as an example: https://forums.unrealengine.com/development-discussion/rendering/125798-temporal-aa-sharpening

Well I think this is a bigger and more advantageous change than TAA sharpening.

The PR for TAA sharpening kinda conflicts with Epic’s perspective and intentions for TAA. Plus, Epic was working on Temporal AA upsampling and dynamic resolution.