Dynamic shadows artifacts


  • replied
    Originally posted by DamirH View Post

    Are the visual improvements of Sobol worth the ~1.44ms?
    Very unlikely. Spiral in some cases gives a slightly nicer self shadow. Both are quite good sampling methods; Sobol just isn't free.



  • replied
    Originally posted by Kalle_H View Post
    Ditching Sobol reduced ShadowProjection time from 4.04ms to 2.60ms.
    Are the visual improvements of Sobol worth the ~1.44ms?



  • replied
    When will this be in an official build?



  • replied
    Ditching Sobol reduced ShadowProjection time from 4.04ms to 2.60ms.



  • replied
    I replaced Sobol with simple spiral sampling, with a per-pixel rotation matrix premultiplied into PCFUVMATRIX. It's not as good looking yet, but it saves 300 assembly lines. This makes me wonder whether Sobol random is worth it. With an unrolled loop it's easy to precalculate the sampling points. The same optimization can be done to the blocker search too. Needs more testing.
    Code:
        // Premultiply the filter radius into the UV matrix: -2 ALU per sample
        PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);
        // Per-pixel, per-frame random rotation angle to break up the spiral pattern
        float RandAngle = 2.0f * PI * frac(7.1721f * Settings.SvPosition.x + 11.131f * Settings.SvPosition.y + View.StateFrameIndexMod8 * 0.125f);
        // Fold the rotation into the UV matrix as well
        PCFUVMatrix = mul(float2x2(cos(RandAngle), -sin(RandAngle), sin(RandAngle), cos(RandAngle)), PCFUVMatrix);
        UNROLL
        for (int j = 0; j < PCSS_SAMPLES; j++)
        {
            // Spiral sample: fixed angular step, sqrt-of-index radius for uniform disk density
            float angle = j * PI * 4.71f;
            float2 PCFSample = float2(sin(angle), cos(angle)) * sqrt(float(j+1) * (1.0f / float(PCSS_SAMPLES)));
            //float2 PCFSample = RandToCircle(SobolIndex(SobolRandom, j << 3, PCSS_SAMPLE_BITS + 3));
            float2 SampleUVOffset = mul(PCFUVMatrix, PCFSample);
            float2 SampleUV = ShadowPosition + SampleUVOffset * Settings.ShadowTileOffsetAndSize.zw;

            // Receiver-plane depth bias, clamped to be non-negative
            float SampleDepthBias = max(dot(DepthBiasDotFactors, SampleUVOffset), 0);

            #if FEATURE_GATHER4
                // Fetch four depths at once and average the four shadow comparisons
                float4 SampleDepth = Settings.ShadowDepthTexture.Gather(Settings.ShadowDepthTextureSampler, SampleUV);
                VisibleLightAccumulation += dot(0.25, saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth)));
            #else
                float SampleDepth = Texture2DSampleLevel(Settings.ShadowDepthTexture, Settings.ShadowDepthTextureSampler, SampleUV, 0).r;
                VisibleLightAccumulation += saturate(SampleDepth * Settings.TransitionScale + (Settings.TransitionScale * SampleDepthBias + ScaledAndBiasedDepth));
            #endif
        }
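
The spiral above is a Vogel-disk-style pattern: a fixed angular step per sample, with the square root of the sample index as the radius so the points cover the disk with roughly uniform density. A standalone sketch of the same point generation (Python stand-in for the HLSL above; PCSS_SAMPLES = 32 is assumed from the thread, not taken from the shader headers):

```python
import math

PCSS_SAMPLES = 32  # assumed sample count; UE4 quality presets pick the real one

def spiral_samples(n=PCSS_SAMPLES):
    """Mirror the HLSL loop: fixed angular step, sqrt-index radius."""
    points = []
    for j in range(n):
        angle = j * math.pi * 4.71          # same constant as the shader
        radius = math.sqrt((j + 1) / n)     # sqrt keeps disk density uniform
        points.append((math.sin(angle) * radius, math.cos(angle) * radius))
    return points

# All points land inside the unit disk, so the filter radius premultiplied
# into PCFUVMatrix alone controls the kernel size.
pts = spiral_samples()
assert all(x * x + y * y <= 1.0 + 1e-6 for x, y in pts)
```

Because the pattern depends only on the loop index, an unrolled loop lets the compiler fold these points into constants, which is part of what makes dropping Sobol pay off.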



  • replied
    Just tested ddx_fine. It didn't make any visual difference in my test scene.
    Last edited by Kalle_H; 02-23-2018, 10:46 AM.



  • replied
    Originally posted by Deathrey View Post

    No idea to be fair. I'd expect it to be roughly 4x the cost of coarse. I doubt that would be large enough to be profilable.

    Overall, there are a few moves that I don't understand regarding PCSS in UE4. First: why is the PCF bias only positive? The technique itself kind of implies it being both positive and negative. I agree that an only-positive bias safeguards you from some acne, but it also eats up the shadows where they should be. I totally agree that clamping the bias at some point is a must, but it should not be only positive.

    Second: why was an adaptive bias used at all? Following the logic of the conventional PCF implementation in UE4, it might be more consistent (not better, just more consistent) to follow tradition and just use transition scale and a flat bias.

    Thirdly and lastly: why Sobol random? I might be biased here, but I was never able to pull a decent random out of it.

    And as a general thought, what about using the blocker search result to reduce the PCF sampling rate in umbra?
    I think the answer to most of these is that the implementation is still WIP. I will try replacing Sobol with something else.
    For the reduced sampling rate, I am not sure if it's worth it. There are already two early outs based on the blocker search: if no search sample finds a blocker, all PCF samples are skipped and there is no shadow; if every search sample is blocked, the PCF samples are skipped as well. A reduced sampling rate could be a win without unrolling and with a variable loop counter.
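
The two early outs just described can be sketched as control flow (Python pseudologic; the function shape and names are illustrative assumptions, not the actual UE4 shader code):

```python
def pcss_shadow(blocker_depths, num_search_samples, pcf_filter):
    """Illustrative PCSS structure with the two blocker-search early outs."""
    num_blockers = len(blocker_depths)
    if num_blockers == 0:
        return 1.0  # no blockers found: fully lit, skip the whole PCF loop
    if num_blockers == num_search_samples:
        return 0.0  # every search sample blocked: full umbra, skip PCF too
    # Only the penumbra pays for the expensive filtering pass; the average
    # blocker depth drives the penumbra (filter) size.
    return pcf_filter(sum(blocker_depths) / num_blockers)

# Both degenerate cases return without ever calling the PCF filter.
assert pcss_shadow([], 32, None) == 1.0
assert pcss_shadow([0.5] * 32, 32, None) == 0.0
```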

    Just found another optimization. The filter radius can be premultiplied into PCFUVMatrix. -2 ALU per sample.
    Code:
    PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);
    I haven't even started on the blocker search loop.
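
The premultiplication is safe because a uniform scale commutes with the rest of the 2x2 transform: scaling every PCFSample by FilterRadius gives the same offsets as folding the scale into the matrix once, outside the loop. A quick numerical check of that equivalence (Python; the matrix and sample values are made up for illustration):

```python
def mat2_mul(a, b):
    """2x2 matrix product, row-major ((m00, m01), (m10, m11))."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0],
             a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0],
             a[1][0] * b[0][1] + a[1][1] * b[1][1]))

def mat2_vec(m, v):
    """Multiply a 2x2 matrix by a column vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

filter_radius = 3.0                        # hypothetical kernel radius
pcf_uv_matrix = ((0.5, -0.1), (0.2, 0.7))  # hypothetical shadow-space matrix
sample = (0.3, -0.8)                       # one sample point

# Per-sample scaling: two extra multiplies inside the loop.
per_sample = mat2_vec(pcf_uv_matrix,
                      (sample[0] * filter_radius, sample[1] * filter_radius))

# Premultiplied, as in the shader: fold the scale into the matrix once.
scale = ((filter_radius, 0.0), (0.0, filter_radius))
premultiplied = mat2_vec(mat2_mul(scale, pcf_uv_matrix), sample)

assert all(abs(a - b) < 1e-9 for a, b in zip(per_sample, premultiplied))
```

The same reasoning is why the per-pixel random rotation can be folded in too: rotation, uniform scale, and the shadow-space transform all compose into one 2x2 matrix applied per sample.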



  • replied
    Originally posted by Kalle_H View Post

    I have to check whether there is any visible difference with ddx_fine. Is there a performance difference, and how much?
    No idea to be fair. I'd expect it to be roughly 4x the cost of coarse. I doubt that would be large enough to be profilable.

    Overall, there are a few moves that I don't understand regarding PCSS in UE4. First: why is the PCF bias only positive? The technique itself kind of implies it being both positive and negative. I agree that an only-positive bias safeguards you from some acne, but it also eats up the shadows where they should be. I totally agree that clamping the bias at some point is a must, but it should not be only positive.

    Second: why was an adaptive bias used at all? Following the logic of the conventional PCF implementation in UE4, it might be more consistent (not better, just more consistent) to follow tradition and just use transition scale and a flat bias.

    Thirdly and lastly: why Sobol random? I might be biased here, but I was never able to pull a decent random out of it.

    And as a general thought, what about using the blocker search result to reduce the PCF sampling rate in umbra?



  • replied
    Originally posted by Deathrey View Post
    Aye, definitely a large improvement. Makes me want to take a look at the loops in other shaders and re-check them.

    On a side, unrelated note: why are coarse derivatives used? Maybe it is worth looking into fine ones, when available? They should give better biasing.
    I have to check whether there is any visible difference with ddx_fine. Is there a performance difference, and how much?



  • replied
    Aye, definitely a large improvement. Makes me want to take a look at the loops in other shaders and re-check them.

    On a side, unrelated note: why are coarse derivatives used? Maybe it is worth looking into fine ones, when available? They should give better biasing.



  • replied
    There is a diff between unrolled and not:
    https://www.diffchecker.com/43fcZ35Z
    I use 32 samples for both the search and PCF loops. The shader is quite big, 1662 assembly lines, but performance is quite good. It's just 2.2ms slower than non-soft shadows.
    Last edited by Kalle_H; 02-23-2018, 08:08 AM.



  • replied
    Originally posted by Kalle_H View Post

    GeForce GTX 960M. It's no surprise that UNROLL is faster on that kind of loop. I have never encountered a simple, non-nested loop that would be slower with unrolling. Sometimes the benefits are not worth the additional code size, but in this case it's quite a clear win. I have to test this with my desktop GPU as well when I get to the office.

    My directional light has the default angle (1 degree) and I have tuned r.Shadow.MaxSoftKernelSize=18. When the kernel size gets too big, cache misses start to dominate the performance cost.
    It is not the fact that unrolled loops are faster that surprises me; it is the halved render time that is uncommon. I don't recall seeing a measurable gain beyond 20% in recent years, even with fetch-heavy loops. Not sure why the loop was not unrolled by the compiler either, to be fair.

    But yeah, considering that UE4 uses quality presets that define the loop iteration count at compile time, there is absolutely no reason not to unroll. As for code size, I think the inflation of shader size and compile time is incomparable to the speed gains in shadow filtering, so it can't even be regarded as a downside.



  • replied
    Originally posted by Deathrey View Post

    That is a surprise. What hardware did you test on?
    GeForce GTX 960M. It's no surprise that UNROLL is faster on that kind of loop. I have never encountered a simple, non-nested loop that would be slower with unrolling. Sometimes the benefits are not worth the additional code size, but in this case it's quite a clear win. I have to test this with my desktop GPU as well when I get to the office.

    My directional light has the default angle (1 degree) and I have tuned r.Shadow.MaxSoftKernelSize=18. When the kernel size gets too big, cache misses start to dominate the performance cost.



  • replied
    Originally posted by Kalle_H View Post
    Doubled the performance of PCF soft shadows simply by adding the UNROLL attribute to the loops. Reordered the inner loop math: -7 ALU per sample. The default uses 32 samples, so 224 ALU total. https://github.com/EpicGames/UnrealE...ull/4508/files
    That is a surprise. What hardware did you test on?



  • replied
    Doubled the performance of PCF soft shadows simply by adding the UNROLL attribute to the loops. Reordered the inner loop math: -7 ALU per sample. The default uses 32 samples, so 224 ALU total. https://github.com/EpicGames/UnrealE...ull/4508/files

