Dynamic shadows artifacts

Deathrey · February 23, 2018, 12:21pm

Aye, definitely a large improvement. Makes me want to take a look at loops in other shaders and re-check them.

On a side, unrelated note, why are coarse derivatives used? Maybe it is worth looking into fine ones, when available? They should give better biasing.

anonymous_user_fbe2d247 · February 23, 2018, 12:37pm

I have to check whether there is any visible difference with ddx_fine. Is there a performance difference, and how much?

Deathrey · February 23, 2018, 1:27pm

No idea to be fair. I’d expect it to be roughly 4x the cost of coarse. I doubt that would be large enough to be profilable.

Overall, there are a few moves that I don’t understand regarding PCSS in UE4. The first is why the PCF bias used is only positive. The technique itself kind of implies it being both positive and negative. I agree that a positive-only bias safeguards you from some acne, but it also eats up the shadows where they should be. I totally agree that clamping the bias at some point is a must, but it should not be only positive.

Second, why was adaptive bias used at all? Following the logic of the conventional PCF implementation in UE4, it might be more consistent (not better, just more consistent) to follow the tradition and just use transition scale and flat bias.

Thirdly and lastly, why Sobol random? I might be biased here, but I was never able to pull a decent random out of it.

And as a general thought, what about using the blocker search result to reduce the PCF sampling rate?
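To make the first point concrete, here is a minimal Python sketch of the per-tap bias, modelled on the `max(dot(DepthBiasDotFactors, SampleUVOffset), 0)` line in the PCF loop posted later in this thread. The slope values and offsets are made up for illustration, and `sample_depth_bias` is a hypothetical helper, not an engine function.

```python
def sample_depth_bias(bias_factors, uv_offset, one_sided=True):
    """Slope-scaled bias for one PCF tap: dot(bias_factors, uv_offset)."""
    bias = bias_factors[0] * uv_offset[0] + bias_factors[1] * uv_offset[1]
    # one_sided=True mirrors the max(..., 0) clamp; False keeps the sign.
    return max(bias, 0.0) if one_sided else bias

# Hypothetical receiver slope: depth changes along +u only.
bias_factors = (0.5, 0.0)
# Four taps placed symmetrically around the shaded point.
offsets = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]

clamped = [sample_depth_bias(bias_factors, o) for o in offsets]
signed = [sample_depth_bias(bias_factors, o, one_sided=False) for o in offsets]

print("clamped:", clamped)
print("signed: ", signed)
```

With the clamp, the tap on the opposite side of the slope gets zero bias instead of a negative one, so the bias no longer follows the receiver plane on that side. Whether that shows up as lost shadow depends on the scene.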

anonymous_user_fbe2d247 · February 23, 2018, 2:01pm

I think the answer to most of these is that the implementation is still WIP. I will try replacing Sobol with something else.
For a reduced sampling rate, I am not sure it’s worth it. There are already two early outs based on the blocker search. If there are no blockers, it skips all samples and there is no shadow. If all samples are blocked, it also skips all the samples. A reduced sampling rate could be a win with a non-unrolled loop and a variable loop counter.
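The two early outs can be sketched roughly as below. This is an illustrative Python model, not the shader code: `pcss_visibility`, its parameters, and the choice of 0.0 for the fully blocked case are assumptions.

```python
def pcss_visibility(blocker_hits, blocker_taps, run_pcf):
    """Return light visibility in [0, 1], skipping the PCF loop when possible."""
    if blocker_hits == 0:
        # No blockers found: fully lit, no PCF taps needed.
        return 1.0
    if blocker_hits == blocker_taps:
        # Every blocker-search tap was occluded: assumed fully shadowed.
        return 0.0
    # Penumbra: only here is the (expensive) PCF loop actually run.
    return run_pcf()

calls = []

def fake_pcf():
    """Stand-in for the PCF loop; records that it was invoked."""
    calls.append(1)
    return 0.5

results = [
    pcss_visibility(0, 8, fake_pcf),  # no blockers
    pcss_visibility(8, 8, fake_pcf),  # all blocked
    pcss_visibility(3, 8, fake_pcf),  # partial occlusion
]
print(results, "PCF loop ran", len(calls), "time(s)")
```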

Just found another optimization. The filter radius can be premultiplied into PCFUVMatrix, saving 2 ALU per sample.


    PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);

I haven’t even started with blocker search loop.
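The premultiplication works because a uniform scale commutes with the matrix product. A small Python check, assuming the original loop scaled each sample by the filter radius before the matrix multiply (the thread does not show that code, and the matrix and sample values here are made up):

```python
import math

def mat_mul(a, b):
    """2x2 matrix product, rows stored as tuples (same order as HLSL mul(a, b))."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def mat_vec(m, v):
    """2x2 matrix times column vector (HLSL mul(m, v))."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

filter_radius = 3.5                       # hypothetical value
pcf_uv_matrix = ((0.8, -0.2), (0.1, 1.3))  # hypothetical value
samples = [(0.25, -0.5), (-0.75, 0.1), (0.0, 1.0)]

# Assumed "before": two extra multiplies on every tap.
per_sample = [
    mat_vec(pcf_uv_matrix, (s[0] * filter_radius, s[1] * filter_radius))
    for s in samples
]

# "After": the scale is folded into the matrix once, outside the loop.
premultiplied = mat_mul(((filter_radius, 0.0), (0.0, filter_radius)), pcf_uv_matrix)
hoisted = [mat_vec(premultiplied, s) for s in samples]

matches = all(
    math.isclose(a, b, abs_tol=1e-12)
    for p, h in zip(per_sample, hoisted)
    for a, b in zip(p, h)
)
print("identical offsets:", matches)
```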

anonymous_user_fbe2d247 · February 23, 2018, 2:44pm

Just tested ddx_fine. It made no visible difference in my test scene.

anonymous_user_fbe2d247 · February 23, 2018, 3:17pm

I replaced Sobol with simple spiral sampling, with a per-pixel rotation matrix premultiplied into PCFUVMatrix. It’s not as good looking yet, but it saves 300 assembly lines. This makes me wonder whether Sobol random is worth it. With an unrolled loop it’s easy to precalculate the sampling points. The same optimization can be done for the blocker search too. Needs more testing.



    PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);
    float RandAngle = 2.0f * PI * frac(7.1721f * Settings.SvPosition.x + 11.131f * Settings.SvPosition.y + View.StateFrameIndexMod8 * 0.125f);
    PCFUVMatrix = mul(float2x2(cos(RandAngle), -sin(RandAngle), sin(RandAngle), cos(RandAngle)), PCFUVMatrix);
    UNROLL
    for (int j = 0; j < PCSS_SAMPLES; j++)
    {
        float angle = j * PI * 4.71f;
        float2 PCFSample = float2(sin(angle), cos(angle)) * sqrt(float(j+1) * (1.0f / float(PCSS_SAMPLES)));
        //float2 PCFSample = RandToCircle(SobolIndex(SobolRandom, j << 3, PCSS_SAMPLE_BITS + 3));
        float2 SampleUVOffset = mul(PCFUVMatrix, PCFSample);
        float2 SampleUV = ShadowPosition + SampleUVOffset * Settings.ShadowTileOffsetAndSize.zw;

        float SampleDepthBias = max(dot(DepthBiasDotFactors, SampleUVOffset), 0);        

        #if FEATURE_GATHER4
            float4 SampleDepth = Settings.ShadowDepthTexture.Gather(Settings.ShadowDepthTextureSampler, SampleUV);
            VisibleLightAccumulation += dot(0.25, saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth)));
        #else
            float SampleDepth = Texture2DSampleLevel(Settings.ShadowDepthTexture, Settings.ShadowDepthTextureSampler, SampleUV, 0).r;
            VisibleLightAccumulation += saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth));
        #endif
    }
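Because `j` and `PCSS_SAMPLES` are compile-time constants once the loop is unrolled, the spiral offsets can be evaluated ahead of time. The Python sketch below reproduces the same formula offline to show what the constants look like. The sample count of 16 is an assumption, as the thread does not state the value of `PCSS_SAMPLES`.

```python
import math

PCSS_SAMPLES = 16  # assumed; the thread does not give the actual value

def spiral_offsets(sample_count):
    """Offsets matching: float2(sin(a), cos(a)) * sqrt((j + 1) / N), a = j * PI * 4.71."""
    offsets = []
    for j in range(sample_count):
        angle = j * math.pi * 4.71
        radius = math.sqrt((j + 1) / sample_count)
        offsets.append((math.sin(angle) * radius, math.cos(angle) * radius))
    return offsets

SPIRAL = spiral_offsets(PCSS_SAMPLES)
radii = [math.hypot(x, y) for x, y in SPIRAL]

for j, (x, y) in enumerate(SPIRAL):
    print(f"{j:2d}: ({x:+.4f}, {y:+.4f})  r={radii[j]:.4f}")
```

The square root on the radius spaces the taps so that each ring of the unit disc receives roughly equal area coverage, and the last tap lands on the rim.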

anonymous_user_fbe2d247 · February 26, 2018, 10:07am

Ditching Sobol reduced ShadowProjection time from 4.04ms to 2.60ms.

Raildex · February 26, 2018, 10:54am

when will this be in an official build?

DamirH · February 26, 2018, 11:31am

Are the visual improvements of Sobol worth the ~1.44ms?

anonymous_user_fbe2d247 · February 26, 2018, 1:06pm

Very unlikely. Spiral in some cases gives a slightly nicer self-shadow. Both are quite good sampling methods. Sobol just isn’t free.

anonymous_user_fbe2d247 · February 26, 2018, 1:22pm



    // PCF loop.
    float VisibleLightAccumulation = 0;
    float ScaledAndBiasedDepth = -Settings.SceneDepth * Settings.TransitionScale + 1.f;

    float RandAngle = 2.0f * PI * frac(0.625 * Settings.SvPosition.x + 0.625 * Settings.SvPosition.y + View.StateFrameIndexMod8 * 0.625f);
    float radius = View.StateFrameIndexMod8 % 2 == 0 ? FilterRadius : -FilterRadius;
    PCFUVMatrix = mul(mul(float2x2(radius, 0, 0, radius), float2x2(cos(RandAngle), -sin(RandAngle), sin(RandAngle), cos(RandAngle))), PCFUVMatrix);
    UNROLL
    for (int j = 0; j < PCSS_SAMPLES; j++)
    {
        float angle = j * PI * 2.0f * 0.4171;
        float2 PCFSample = float2(sin(angle), cos(angle)) * sqrt(float(j) * (1.0f / float(PCSS_SAMPLES)));
        //float2 PCFSample = RandToCircle(SobolIndex(SobolRandom, j << 3, PCSS_SAMPLE_BITS + 3));
        float2 SampleUVOffset = mul(PCFUVMatrix, PCFSample);
        float2 SampleUV = ShadowPosition + SampleUVOffset * Settings.ShadowTileOffsetAndSize.zw;

        float SampleDepthBias = max(dot(DepthBiasDotFactors, SampleUVOffset), 0);        

        #if FEATURE_GATHER4
            float4 SampleDepth = Settings.ShadowDepthTexture.Gather(Settings.ShadowDepthTextureSampler, SampleUV);
            VisibleLightAccumulation += dot(0.25, saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth)));
        #else
            float SampleDepth = Texture2DSampleLevel(Settings.ShadowDepthTexture, Settings.ShadowDepthTextureSampler, SampleUV, 0).r;
            VisibleLightAccumulation += saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth));
        #endif
    }

    float Visibility = VisibleLightAccumulation * (1.0f / float(PCSS_SAMPLES));

This is the final form of the spiral tapping. Every pixel is rotated pseudo-randomly and the samples are mirrored every other frame. The spiral offsets are precalculated and then rotated, scaled and mirrored with a 2x2 matrix, so this method has no per-sample overhead at all.
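A rough Python model of that combined matrix is sketched below. The helper names and input values are hypothetical, and the rotation angle is held fixed between the two frames purely to isolate the effect of the sign flip (in the shader the angle also changes with the frame index).

```python
import math

def mat_mul(a, b):
    """2x2 matrix product, rows stored as tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def mat_vec(m, v):
    """2x2 matrix times column vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def kernel_matrix(rand_angle, radius, pcf_uv):
    """mul(mul(scale, rotation), pcf_uv), built once per pixel."""
    c, s = math.cos(rand_angle), math.sin(rand_angle)
    scale = ((radius, 0.0), (0.0, radius))
    rotation = ((c, -s), (s, c))
    return mat_mul(mat_mul(scale, rotation), pcf_uv)

def spiral_tap(j, sample_count):
    """Final-form tap: radius sqrt(j / N), so tap 0 sits at the kernel centre."""
    angle = j * math.pi * 2.0 * 0.4171
    r = math.sqrt(j / sample_count)
    return (math.sin(angle) * r, math.cos(angle) * r)

pcf_uv = ((0.8, -0.2), (0.1, 1.3))  # hypothetical PCFUVMatrix
angle, filter_radius, n = 1.234, 3.5, 16

even = kernel_matrix(angle, filter_radius, pcf_uv)   # even frame: +FilterRadius
odd = kernel_matrix(angle, -filter_radius, pcf_uv)   # odd frame:  -FilterRadius

taps_even = [mat_vec(even, spiral_tap(j, n)) for j in range(n)]
taps_odd = [mat_vec(odd, spiral_tap(j, n)) for j in range(n)]

flipped = all(
    math.isclose(e[0], -o[0], abs_tol=1e-12) and math.isclose(e[1], -o[1], abs_tol=1e-12)
    for e, o in zip(taps_even, taps_odd)
)
print("odd-frame taps are the even-frame taps negated:", flipped)
```

Negating the radius on both axes reflects every tap through the kernel centre, which is the same as an extra 180 degree rotation. The per-tap work stays at a single matrix-vector multiply either way.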

ZacD · February 26, 2018, 1:34pm

Would love to see some comparisons.

anonymous_user_fbe2d247 · February 26, 2018, 2:06pm

Here you go.

Sobol:

Spiral:

Zeblote · February 26, 2018, 4:39pm

Looks nearly identical! No player will know how it was before…

DamirH · February 26, 2018, 10:05pm

Indeed, if you didn’t have the images side by side you wouldn’t be able to tell the difference. Is the difference more obvious in other scenarios maybe?

anonymous_user_c94f194d · February 27, 2018, 9:37am

Finally back at the office. I profiled the scene before and after the soft shadow optimizations on a GTX 1080 Ti. With a single directional light and two cascades, ShadowProjection time went from 1.51ms to 0.47ms. Without soft shadows the cost would be 0.18ms.

BrUnO_XaVIeR · February 27, 2018, 11:16am

I just hope they implement it when you PR…
Sometimes the long wait is annoying when someone PRs changes to the renderer just to be told “no”.

NilsonLima · February 27, 2018, 11:24am

Yes, this one as an example: https://forums.unrealengine.com/development-discussion/rendering/125798-temporal-aa-sharpening

DamirH · February 27, 2018, 11:30am

Well I think this is a bigger and more advantageous change than TAA sharpening.

ZacD · February 27, 2018, 11:35am

The PR for TAA sharpening kinda conflicts with Epic’s perspective and intentions with TAA. Plus, Epic was working on Temporal AA upsampling and dynamic resolution.