Dynamic shadows artifacts


    Originally posted by spacegojira View Post
    Epic should have hired Kalle-H instead.
    Dude.... come on.



      Originally posted by spacegojira View Post
      Epic should have hired Kalle-H instead.
      Why Epic doesn't respond anymore, summarized in one post.



        Yeah, let's stay professional here. I'm very happy with Krzysztof's feedback on this issue, and I'm confident we'll get an update on this. We really don't need personal attacks.
        Helium Rain, a realistic space opera



          Oh come on, I just praised Kalle-H, I didn't say Krzysztof is a bad person, don't be so sensitive. Didn't you read my post where I welcomed him to the community?

          But, if that was hurtful towards him, I apologize.
          Last edited by spacegojira; 02-21-2018, 05:19 AM.



            Code:
                #if (!MODULATED_SHADOWS) || (FEATURE_LEVEL >= FEATURE_LEVEL_SM4 && !FORWARD_SHADING && !APPLY_TRANSLUCENCY_SHADOWS)
                    FGBufferData GBufferData = GetGBufferData(ScreenUV);
                #endif
                #if !MODULATED_SHADOWS
                    #if USE_PCSS
                        #if SPOT_LIGHT_PCSS
                            float Attenuation = GetLightInfluenceMask(WorldPosition) * saturate(dot(GBufferData.WorldNormal, DeferredLightUniforms.LightPosition - WorldPosition));
                        #else
                            float Attenuation = saturate(dot(GBufferData.WorldNormal, DeferredLightUniforms.NormalizedLightDirection));
                        #endif
                    #else
                        // Both spot and directional lights use the same shadowing code. Select the proper direction; no need to normalize.
                        half3 Dir = DeferredLightUniforms.LightInvRadius > 0 ? (DeferredLightUniforms.LightPosition - WorldPosition) : DeferredLightUniforms.NormalizedLightDirection;
                        float Attenuation = GetLightInfluenceMask(WorldPosition) * saturate(dot(GBufferData.WorldNormal, Dir));
                    #endif
                    BRANCH
                    if (Attenuation > 0)
                    {
                        // Shadow sampling code.
                    }
                #endif // !MODULATED_SHADOWS
            Just tested some early outs with shadow sampling. With 4 stationary spot lights these early outs saved about 5.5ms on my laptop. I can't really make a pull request because it's not compatible with PCSS_SHARE_PER_PIXEL_QUAD used in ShadowPercentageCloserFiltering: you can't have divergent branching when using explicit derivatives inside.

            The normal-pointing-towards-light test and the pixel-inside-spotlight-volume test are equally beneficial in my test scene.

            The normal-pointing-towards-light test also combines the ShadingModel unlit test with this PR: https://github.com/EpicGames/UnrealEngine/pull/4441 (unlit pixels do not need a normal, and it's defined as (0,0,0)).
            But without that optimization it might be beneficial to explicitly test whether the pixel's shading model is Unlit.


            Also, for some reason subsurface shadows are calculated for all subsurface shading models, but not all of them actually use subsurface shadows. I am not sure about the others, but I am sure that MATERIAL_SHADINGMODEL_SUBSURFACE_PROFILE is not using them. For cinematics these pixels might cover a large screen area.
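
            A minimal sketch of the explicit Unlit test suggested above (SHADINGMODELID_UNLIT and GBufferData are the stock UE4 deferred shading names; the placement and the output value are illustrative, not the actual engine code):
            Code:
                BRANCH
                if (GBufferData.ShadingModelID == SHADINGMODELID_UNLIT)
                {
                    // Unlit pixels receive no shadowing; skip the filtering work entirely.
                    return 1;
                }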



              Doubled the performance of PCF soft shadows by simply adding the UNROLL attribute to the loops, and reordered the inner loop math: -7 ALU per sample. The default uses 32 samples, so 224 ALU total. https://github.com/EpicGames/UnrealE...ull/4508/files
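
              In HLSL the change amounts to forcing compile-time unrolling of the fixed-count filtering loop. A schematic sketch (the loop body and names here are illustrative, not the actual engine code):
              Code:
                  // UNROLL expands the loop at compile time, removing loop overhead
                  // and letting the compiler reschedule the per-sample math.
                  UNROLL
                  for (int i = 0; i < 32; i++)
                  {
                      Shadow += SampleShadow(ShadowUV + SampleOffsets[i]);
                  }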



                Originally posted by Kalle_H View Post
                Doubled the performance of PCF soft shadows by simply adding the UNROLL attribute to the loops, and reordered the inner loop math: -7 ALU per sample. The default uses 32 samples, so 224 ALU total. https://github.com/EpicGames/UnrealE...ull/4508/files
                That is a surprise. What hardware did you test on?



                  Originally posted by Deathrey View Post

                  That is a surprise. What hardware did you test on?
                  GeForce GTX 960M. It's no surprise that UNROLL is faster on that kind of loop; I have never encountered a simple, non-nested loop that would be slower with unrolling. Sometimes the benefits are not worth the additional code size, but in this case it's quite a clear win. I'll have to test this with my desktop GPU as well when I get to the office.

                  My directional light has the default angle (1 degree) and I have tuned r.Shadow.MaxSoftKernelSize=18. When the kernel size gets too big, cache misses start to dominate the performance cost.



                    Originally posted by Kalle_H View Post

                     GeForce GTX 960M. It's no surprise that UNROLL is faster on that kind of loop; I have never encountered a simple, non-nested loop that would be slower with unrolling. Sometimes the benefits are not worth the additional code size, but in this case it's quite a clear win. I'll have to test this with my desktop GPU as well when I get to the office.

                     My directional light has the default angle (1 degree) and I have tuned r.Shadow.MaxSoftKernelSize=18. When the kernel size gets too big, cache misses start to dominate the performance cost.
                     It is not the fact that unrolled loops are faster that surprises me; it is the halved render time that is uncommon. I don't recall seeing a measurable gain beyond 20% in recent years, even with fetch-heavy loops. Not sure why the loop was not unrolled by the compiler either, to be fair.

                     But yeah, considering that UE4 uses quality presets that define the loop iteration count at compile time, there is absolutely no reason not to unroll. As for code size, the inflation of shader size and compile time is negligible compared to the speed gains in shadow filtering, so it can't even be regarded as a downside.



                      Here is the diff between unrolled and not:
                      https://www.diffchecker.com/43fcZ35Z
                      I use 32 samples for both the search and PCF loops. The shader is quite big (1,662 assembly lines) but performance is quite good: it's just 2.2ms slower than non-soft shadows.
                      Last edited by Kalle_H; 02-23-2018, 08:08 AM.



                        Aye, definitely a large improvement. Makes me want to take a look at the loops in other shaders and re-check them.

                        On a side, unrelated note: why are coarse derivatives used? Maybe it is worth looking into fine ones, when available? They should give better biasing.
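
                        For context, coarse and fine derivatives differ in granularity (ddx_fine/ddy_fine require Shader Model 5; the variable name is illustrative):
                        Code:
                            // ddx/ddy (coarse) return one derivative per 2x2 pixel quad;
                            // ddx_fine/ddy_fine vary per pixel, which can give tighter
                            // receiver-plane depth bias estimates at quad boundaries.
                            float2 GradCoarse = float2(ddx(ShadowDepth), ddy(ShadowDepth));
                            float2 GradFine   = float2(ddx_fine(ShadowDepth), ddy_fine(ShadowDepth));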



                          Originally posted by Deathrey View Post
                          Aye, definitely a large improvement. Makes me want to take a look at the loops in other shaders and re-check them.

                          On a side, unrelated note: why are coarse derivatives used? Maybe it is worth looking into fine ones, when available? They should give better biasing.
                          I have to check whether there is any visible difference with ddx_fine. Is there a performance difference, and how much?



                            Originally posted by Kalle_H View Post

                             I have to check whether there is any visible difference with ddx_fine. Is there a performance difference, and how much?
                             No idea, to be fair. I'd expect it to be roughly 4x the cost of coarse. I doubt that would be large enough to be profilable.

                             Overall, there are a few choices I don't understand regarding PCSS in UE4. First: why is the PCF bias only positive? The technique itself implies it should be both positive and negative. I agree that a positive-only bias safeguards you from some acne, but it also eats away the shadows where they should be. I totally agree that clamping the bias at some point is a must, but it should not be positive-only.

                             Second: why was an adaptive bias used at all? Following the logic of the conventional PCF implementation in UE4, it might be more consistent (not better, just more consistent) to follow tradition and just use a transition scale and a flat bias.

                             Thirdly and lastly: why Sobol random? I might be biased here, but I was never able to pull decent random numbers out of it.

                             And as a general thought, what about using the blocker search result to reduce the PCF sampling rate in the umbra?



                              Originally posted by Deathrey View Post

                              No idea, to be fair. I'd expect it to be roughly 4x the cost of coarse. I doubt that would be large enough to be profilable.

                              Overall, there are a few choices I don't understand regarding PCSS in UE4. First: why is the PCF bias only positive? The technique itself implies it should be both positive and negative. I agree that a positive-only bias safeguards you from some acne, but it also eats away the shadows where they should be. I totally agree that clamping the bias at some point is a must, but it should not be positive-only.

                              Second: why was an adaptive bias used at all? Following the logic of the conventional PCF implementation in UE4, it might be more consistent (not better, just more consistent) to follow tradition and just use a transition scale and a flat bias.

                              Thirdly and lastly: why Sobol random? I might be biased here, but I was never able to pull decent random numbers out of it.

                              And as a general thought, what about using the blocker search result to reduce the PCF sampling rate in the umbra?
                              I think most of the answers are that the implementation is still WIP. I will try replacing Sobol with something else.
                              As for a reduced sampling rate, I am not sure it's worth it. There are already two early outs based on the blocker search: if there are no blockers, it skips all the samples and there is no shadow; if all search samples are blocked, it also skips all the samples. A reduced sampling rate could be a win without unrolling, using a variable loop counter.

                              Just found another optimization: the filter radius can be premultiplied into PCFUVMatrix. -2 ALU per sample.
                              Code:
                              PCFUVMatrix = mul(float2x2(FilterRadius, 0, 0, FilterRadius), PCFUVMatrix);
                              I haven't even started on the blocker search loop.
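
                              The two blocker-search early outs mentioned above can be sketched roughly like this (schematic only; the names and the return convention are illustrative, not the actual PCSS source):
                              Code:
                                  // After the blocker search loop has counted occluders:
                                  if (NumBlockers == 0)
                                  {
                                      return 1.0; // no occluders: fully lit, skip the PCF loop
                                  }
                                  if (NumBlockers == NumSearchSamples)
                                  {
                                      return 0.0; // all search samples occluded: fully shadowed, skip PCF
                                  }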



                                Just tested ddx_fine. It didn't make any visible difference in my test scene.
                                Last edited by Kalle_H; 02-23-2018, 10:46 AM.
