Reduce bloom quality to a single gaussian blur?

    The doc about bloom says:

    Bloom can be implemented with a single Gaussian blur. For better quality, we combine multiple Gaussian blurs with different radii. For better performance, we do the very wide blurs at much lower resolution. In UE3, we had 3 Gaussian blurs at resolutions 1/4, 1/8, and 1/16. We now have multiple blurs named Blur1 to Blur5 at resolutions from 1/2 (Blur1) to 1/32 (Blur5). We might add Blur0 for a full-resolution blur if ever needed.
    https://docs.unrealengine.com/latest...Effects/Bloom/
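The multi-blur idea in that quote can be sketched in a few lines. This is a minimal one-dimensional C++ illustration that assumes full resolution (no downsampling) and clamp-to-edge borders. The names GaussianKernel, Blur1D, BloomStage and MultiStageBloom are hypothetical and are not UE4 API. Each stage blurs the bright input with a different width, and the tinted results are summed:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized 1D Gaussian weights for taps -radius..+radius (hypothetical helper).
std::vector<float> GaussianKernel(float sigma, int radius) {
    assert(sigma > 0.0f && radius >= 0);
    std::vector<float> w(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        w[i + radius] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += w[i + radius];
    }
    for (float& x : w) x /= sum;  // weights sum to 1, so overall brightness is preserved
    return w;
}

// Convolve a 1D signal with an odd-length kernel, clamping reads at the borders.
std::vector<float> Blur1D(const std::vector<float>& src, const std::vector<float>& k) {
    assert(k.size() % 2 == 1);
    const int r = static_cast<int>(k.size() / 2);
    const int n = static_cast<int>(src.size());
    std::vector<float> dst(src.size(), 0.0f);
    for (int x = 0; x < n; ++x)
        for (int i = -r; i <= r; ++i)
            dst[x] += k[i + r] * src[std::min(std::max(x + i, 0), n - 1)];
    return dst;
}

// One bloom stage: blur width plus a scalar tint (a real tint would be a color).
struct BloomStage {
    float sigma;
    int radius;
    float tint;
};

// Bloom = sum over stages of tint * blur(bright input).
std::vector<float> MultiStageBloom(const std::vector<float>& bright,
                                   const std::vector<BloomStage>& stages) {
    std::vector<float> out(bright.size(), 0.0f);
    for (const BloomStage& s : stages) {
        const std::vector<float> blurred = Blur1D(bright, GaussianKernel(s.sigma, s.radius));
        for (std::size_t i = 0; i < out.size(); ++i) out[i] += s.tint * blurred[i];
    }
    return out;
}
```

Passing a single stage corresponds to the "single Gaussian blur" case asked about here. In the engine, the wide stages are also computed at lower resolution, and this sketch leaves that part out.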

    I would like to reduce the number of Gaussian blurs to 1 to save performance. How can I do that? I did not find any settings that affect the performance of the bloom. All settings in the PP volume seem to be purely cosmetic, with no impact on performance (apart from setting Intensity to 0). Then there is r.bloomquality, where 0 disables bloom and 1 is the default. There is also r.bloom.cross: the default is 0, and setting it to 1 makes the bloom ugly without affecting performance.

    So I think there should be some way to reduce the number of blurs used to render the bloom, in order to improve performance, but how?

    The reason I'm asking is that bloom is definitely very expensive in VR, and having no bloom at all is not really a solution either.

    Profiling with bloom enabled:



    Profiling with bloom disabled:



    So bloom currently takes 0.62 ms, which is too much for VR. I'm testing with 4.14 now, and unfortunately bloom performance has not improved in recent engine versions.
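Here is some context for the 0.62 ms figure. Assume a 90 Hz headset such as the Rift or Vive; this is an assumption, since the thread does not name the device. At 90 Hz the whole frame budget is about 11.1 ms, so bloom alone takes roughly 5.6% of it. Below is a tiny C++ helper for that arithmetic. The function names are hypothetical.

```cpp
#include <cassert>

// Milliseconds available per frame at a given refresh rate.
double FrameBudgetMs(double refreshHz) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Fraction of the frame budget consumed by a pass costing passMs milliseconds.
double ShareOfBudget(double passMs, double refreshHz) {
    return passMs / FrameBudgetMs(refreshHz);
}
```

For example, ShareOfBudget(0.62, 90.0) is about 0.056.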
    Last edited by John Alcatraz; 10-21-2016, 11:38 AM.
    Easy to use UMG Mini Map on the UE4 Marketplace.
    Forum thread: https://forums.unrealengine.com/show...-Plug-and-Play

    #2
    +1, I'm quite interested in this too
    Follow me on Twitter!
    Developer of Elium - Prison Escape
    Local Image-Based Lighting for UE4



      #3
      Originally posted by John Alcatraz
      there is r.bloomquality, 0 disables bloom and 1 is default
      According to the source, quality 1 should result in only 3 stages, and at quality 5 you get all 6 that you are showing in the profiler screenshots.

      Originally posted by John Alcatraz
      bloom is taking 0.62 ms currently, that's too much in VR
      These values are unfortunately hardcoded, so better optimization in this area requires modifying the engine code, but the improvement sounds reasonable.

      There is, however, a possible issue: a single pass may not give you the quality you expect. A very clever reimplementation of a similar bloom idea also ended up using multiple passes (5, if I'm counting right) to get a nice bloom effect, but this obviously should not keep you from experimenting with it.
      * Sharp and responsive Temporal Anti-Aliasing tips and tricks
      * Pitch-shift source effect (DSP) over the network (VOIP)
      * My Portfolio and Developer Blog



        #4
        Originally posted by Konflict
        According to the source, quality 1 should result in only 3 stages, and at quality 5 you get all 6 that you are showing in the profiler screenshots.

        These values are unfortunately hardcoded, so better optimization in this area requires modifying the engine code, but the improvement sounds reasonable.
        Ah, thanks very much for linking that source file! That number can easily be modified in the source, and I have no problem with modifying the source.



        So I did that and compiled. I definitely see the quality difference between r.BloomQuality 1 and r.BloomQuality 2 now, and the quality with 1 stage is still relatively acceptable. It's blocky, but it's still way better than no bloom at all, so that's what I wanted.

        Unfortunately, it didn't help performance at all. Changing the number of stages to 1 only affects the "PostProcessWeightedSampleSum" entries in the profiler. Those are very cheap, so I saw no relevant performance difference between 1 stage and 3 stages. That's probably why Epic didn't make it possible to set it to 1 from the editor: it makes no sense, since 1 stage is as expensive as 3. (@Konflict: for every blur stage there are 2 "PostProcessWeightedSampleSum" entries in the profiler, so in the screenshots in my first post the number of stages was 3, not 6.)
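Two entries per stage is consistent with a separable Gaussian blur, where each stage runs one horizontal and one vertical 1D pass. That is an assumption about what the two PostProcessWeightedSampleSum draws do, not something verified in the engine source. Below is a minimal C++ sketch using hypothetical names (Image, BlurPass, SeparableBlur), clamp-to-edge addressing and a single-channel image.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Single-channel image, row-major, with clamp-to-edge reads.
struct Image {
    int w;
    int h;
    std::vector<float> px;
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), w - 1);
        y = std::min(std::max(y, 0), h - 1);
        return px[y * w + x];
    }
};

// One 1D pass along x (dx = 1, dy = 0) or along y (dx = 0, dy = 1).
Image BlurPass(const Image& src, const std::vector<float>& k, int dx, int dy) {
    assert(k.size() % 2 == 1);
    const int r = static_cast<int>(k.size() / 2);
    Image dst{src.w, src.h, std::vector<float>(src.px.size(), 0.0f)};
    for (int y = 0; y < src.h; ++y)
        for (int x = 0; x < src.w; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i)
                acc += k[i + r] * src.at(x + i * dx, y + i * dy);
            dst.px[y * src.w + x] = acc;
        }
    return dst;
}

// Separable blur: a horizontal pass followed by a vertical pass,
// i.e. two full-screen draws per blur stage.
Image SeparableBlur(const Image& src, const std::vector<float>& k) {
    return BlurPass(BlurPass(src, k, 1, 0), k, 0, 1);
}

// Texels read per output pixel for a kernel of the given radius.
int TapsSeparable(int radius) { return 2 * (2 * radius + 1); }
int TapsDirect(int radius) { return (2 * radius + 1) * (2 * radius + 1); }
```

For a Gaussian kernel, the two 1D passes give the same result as one 2D pass with product weights. The difference is cost: the separable version reads 2(2r+1) texels per pixel instead of (2r+1)².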

        But why are the entries responsible for the majority of the bloom's cost not affected by this? The most relevant ones are:

        "Downsample 756x840"
        "PostProcessBloomSetup 756x840"
        "Downsample 378x420"

        These are responsible for the majority of the bloom's cost, and they seem completely unaffected by changing the number of stages. Also, the tonemapper seems to be quite a bit more expensive when bloom is enabled.

        756x840 is half of the per-eye resolution, so that size seems to be hardcoded to screen size * 0.5. Why? Would it be possible to change that to 0.25 or 0.125 of the screen resolution? That seems like it would make a much bigger difference to performance than changing the number of blur stages.
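The sizes in the profiler fit a simple halving chain from a 1512x1680 per-eye target (twice 756x840, per the remark above). Below is a small C++ sketch of such a chain with a configurable divisor. DivideAndRoundUp and DownsampleChain are hypothetical stand-ins here, not the engine's functions, and the chain illustrates the idea rather than the engine's exact pass list.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Integer division that rounds up, so odd sizes keep their last row/column.
int DivideAndRoundUp(int value, int divisor) {
    assert(value >= 0 && divisor > 0);
    return (value + divisor - 1) / divisor;
}

// Sizes of `levels` successive downsamples, each dividing both axes by `divisor`.
std::vector<std::pair<int, int>> DownsampleChain(int width, int height, int divisor, int levels) {
    std::vector<std::pair<int, int>> sizes;
    for (int i = 0; i < levels; ++i) {
        width = DivideAndRoundUp(width, divisor);
        height = DivideAndRoundUp(height, divisor);
        sizes.emplace_back(width, height);
    }
    return sizes;
}

// Pixel count: a rough proxy for the cost of a full-screen pass at that size.
long long PixelCount(const std::pair<int, int>& size) {
    return static_cast<long long>(size.first) * size.second;
}
```

With a divisor of 4, the first target is 378x420, a quarter of the pixels of 756x840. Pixel count is only a rough proxy, though: bandwidth and per-pass overhead matter too.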

        Where could I change that number in the source? I looked around AddBloom() and the functions it calls, but I couldn't find it there. Any ideas?
        Last edited by John Alcatraz; 10-25-2016, 10:53 PM.


          #5
          Originally posted by John Alcatraz
          affects the "PostProcessWeightedSampleSum" points in the profiler
          Then I read it wrong: that's the sampling kernel's size. A size of 3 is fine for getting a 3x3 kernel to work with, but using only 1x1 gives you the same image back, since you only sample the center. So 3 should be fine.

          Originally posted by John Alcatraz
          it just makes no sense since 1 is same expensive like 3.
          Yes, it takes a little longer to read all this data. But either you get a 3x3 kernel as a minimum even when you request 1 (a minimum lock), or it's down to the way the GPU samples the pixels: since they come from cached values (instead of fetching the same pixel multiple times), the performance cost is also significantly reduced.

          Originally posted by John Alcatraz
          was 3, not 6).
          Makes sense; it must be two separate blur passes then.

          Originally posted by John Alcatraz
          the tonemapper more expensive
          Probably that's where the blending happens, and it takes some time to do it multiple times.
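Suppose the tonemapper is where the stages are combined. Its extra cost then grows with the number of stage textures it reads per pixel. Below is a simplified C++ sketch of that final add. It assumes each stage has already been upsampled to full resolution and uses scalar tints. CompositeBloom is a hypothetical name, and the engine's actual tonemapper shader is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds tints[s] * stages[s][i] to every scene pixel and counts the stage-texture reads.
std::vector<float> CompositeBloom(const std::vector<float>& scene,
                                  const std::vector<std::vector<float>>& stages,
                                  const std::vector<float>& tints,
                                  long long& stageReads) {
    assert(stages.size() == tints.size());
    std::vector<float> out = scene;
    stageReads = 0;
    for (std::size_t s = 0; s < stages.size(); ++s) {
        assert(stages[s].size() == scene.size());
        for (std::size_t i = 0; i < scene.size(); ++i) {
            out[i] += tints[s] * stages[s][i];
            ++stageReads;  // one read of stage s per output pixel
        }
    }
    return out;
}
```

In this model, the number of reads scales with pixels × stages.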

          Originally posted by John Alcatraz
          screen size * 0.5, why?
          Maybe this? A third method could just be .25?

          Originally posted by John Alcatraz
          much bigger difference
          Yes indeed, since it would reduce the tonemapper's work as well.


            #6
            Originally posted by Konflict
            Maybe this? A third method could just be .25?
            I think PostProcessDownsample is a general thing used by all PP effects and not specific to bloom, right? Only the bloom should be calculated at a lower resolution.


              #7
              Originally posted by John Alcatraz
              not related to bloom
              It should be, but it might not be the right spot. How about this and the next line? Both seem to affect the render target size to reach a lower resolution. But as you can see, there's no simple way to just set the scale here; it is pretty much designed to always generate halved sizes. It's a logical approach, though not optimal for VR requirements. I'd try to extend this class with more configurable options, so the bloom could optionally request the very small maps. A size of .25 or .33 should be fine.

              It's also worth pointing out that the comments in the bloom code mention that eye adaptation also uses these maps to measure luminance. I'd advise checking regularly that it still works as it should.

              Edit:
              Try r.UseMobileBloom 1; that's a different bloom and might have a lower cost. But as the name suggests, it was not designed for desktops.
              Last edited by Konflict; 10-26-2016, 01:10 PM.


                #8
                Originally posted by Konflict
                It should be, but it might not be the right spot. How about this and the next line? Both seem to affect the render target size to reach a lower resolution. But as you can see, there's no simple way to just set the scale here; it is pretty much designed to always generate halved sizes. It's a logical approach, though not optimal for VR requirements. I'd try to extend this class with more configurable options, so the bloom could optionally request the very small maps. A size of .25 or .33 should be fine.
                Thanks, I've tried dividing by 4 there instead of 2, but unfortunately that makes the bloom appear in only a quarter of the screen. The lower-res result is not scaled to the full screen; it is just shown in the top left. So there are probably more places where something would need to be changed to make it work...

                Originally posted by Konflict
                It's also worth pointing out that the comments in the bloom code mention that eye adaptation also uses these maps to measure luminance. I'd advise checking regularly that it still works as it should.
                Eye adaptation doesn't really make much sense in VR, so that would not be an issue.

                Originally posted by Konflict
                Edit:
                Try r.UseMobileBloom 1; that's a different bloom and might have a lower cost. But as the name suggests, it was not designed for desktops.
                That's very interesting, thanks! I didn't know about that console variable. It works on desktop, but unfortunately the cost seems to be the same as or even higher than that of the regular bloom...


                  #9
                  Originally posted by John Alcatraz
                  divide by 4 there instead of 2, appear only in a quarter of the screen
                  The downsample pass should be followed by an upsample pass that scales the result back up to a higher resolution. The upsample pass must therefore be changed as well, before the blending happens. But as you can see, this part of the code is very rigid and is designed to always trigger all downsample and upsample passes.
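The upsample side can be illustrated with bilinear filtering that maps each output pixel center back into the low-resolution image. Below is a one-dimensional C++ sketch under that assumption (clamp-to-edge, pixel-center alignment). Upsample1D is a hypothetical name, and the engine may upsample differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Bilinearly resample a low-resolution row up to outWidth samples, mapping
// output pixel centers to source coordinates: u = (x + 0.5) * n / outWidth - 0.5.
std::vector<float> Upsample1D(const std::vector<float>& low, int outWidth) {
    assert(!low.empty() && outWidth > 0);
    const int n = static_cast<int>(low.size());
    std::vector<float> out(outWidth);
    for (int x = 0; x < outWidth; ++x) {
        float u = (x + 0.5f) * n / outWidth - 0.5f;
        u = std::min(std::max(u, 0.0f), static_cast<float>(n - 1));  // clamp to edge texels
        const int i0 = static_cast<int>(std::floor(u));
        const int i1 = std::min(i0 + 1, n - 1);
        const float t = u - i0;
        out[x] = low[i0] * (1.0f - t) + low[i1] * t;
    }
    return out;
}
```

The scale factor is implicit in n / outWidth. If the downsample divisor changes but the upsample side still assumes the old ratio, the two no longer line up.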


                    #10
                    Originally posted by Konflict
                    The downsample pass should be followed by an upsample pass that scales the result back up to a higher resolution. The upsample pass must therefore be changed as well, before the blending happens. But as you can see, this part of the code is very rigid and is designed to always trigger all downsample and upsample passes.
                    Thanks, I tried multiplying the UpScale variable by 2, but unfortunately that didn't change anything. I don't believe this kind of trial-and-error approach will get me where I want to go...

                    I wish someone from Epic who knows how the bloom works could just quickly comment here and tell us the easiest way to modify this.


                      #11
                      Originally posted by John Alcatraz
                      this kind of trial and error approach I will get to where I want to get...
                      I'm terribly sorry to hear that, since it's one of the greatest adventures you can have while learning something new. I'm sure you didn't mean that trial and error has never helped you figure something out!

                      Originally posted by John Alcatraz
                      Epic who knows
                      I'd also like to hear from the corporate programmers how to resolve this individual customer issue. I also find it a little odd that the engine's frontend design suggests it is easy to modify and customize without serious C++ knowledge. Yet here is bloom, probably one of the simplest effects you can get in 3D, built as a rigid hardcoded pipeline where you are helpless to find a damn value to set the number of passes.

                      It's doable anyway, but I don't like the way it is done.
                      [Image: bloom1.jpg]
                      Last edited by Konflict; 10-26-2016, 08:18 PM. Reason: image


                        #12
                        Originally posted by Konflict
                        I'm terribly sorry to hear that, since it's one of the greatest adventures you can have while learning something new. I'm sure you didn't mean that trial and error has never helped you figure something out!
                        You don't need to be sorry; trial and error has definitely helped me often. But trying to do something like this (messing around with the UE4 renderer) without having a clue what happens there just doesn't seem very promising...

                        Originally posted by Konflict
                        I'd also like to hear from the corporate programmers how to resolve this individual customer issue. I also find it a little odd that the engine's frontend design suggests it is easy to modify and customize without serious C++ knowledge. Yet here is bloom, probably one of the simplest effects you can get in 3D, built as a rigid hardcoded pipeline where you are helpless to find a damn value to set the number of passes.

                        It's doable anyway, but I don't like the way it is done.
                        [Image: bloom1.jpg]
                        You removed some hardcoded passes? Is it just one pass at 1/4 resolution now?


                          #13
                          Originally posted by John Alcatraz
                          You removed some hardcoded passes? Is it just one pass at 1/4 resolution now?
                          Yes, I'm afraid that's the only way I have found so far. Removing the unnecessary downsample passes should help reduce the cost of blending. Even if you manage to omit the draws on them, the empty RTs would still be queued for blending at the last stage, so they had to go. The downsample pass now does a 1/4 instead of a 1/2 operation. You were actually pretty close to getting the downsample right, but you did not adjust the Extent property to 1/4 instead of 1/2, which is why the bloom appeared in the top-left quarter.
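The "top-left quarter" symptom follows from simple arithmetic. Suppose the pass writes a 1/4-size rectangle (378x420 for the 1512x1680 per-eye source discussed above) into a render target whose extent is still 1/2 size (756x840). The written rectangle then covers only half the width and half the height of the target. Below is a C++ sketch of that check. It assumes the rectangle is anchored at the target's top-left corner. CoveredFraction is a hypothetical helper.

```cpp
#include <cassert>

// Fraction of a render target's area covered by a rect anchored at its top-left corner.
double CoveredFraction(int rectW, int rectH, int extentW, int extentH) {
    assert(extentW > 0 && extentH > 0 && rectW <= extentW && rectH <= extentH);
    return (static_cast<double>(rectW) / extentW) * (static_cast<double>(rectH) / extentH);
}
```

With the Extent also divided by 4, the fraction becomes 1.0 and the whole target is used. That matches the fix described in this post.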

                          Forcing the downsample pass to do the 1/4 scaling apparently had no side effects, which is why I believe this is actually doable. However, I was hoping to find a way to just kick the current bloom out of the postprocess graph and fill in a low-cost bloom solution that would do its own downsampling.

                          Originally posted by John Alcatraz
                          without having a clue what happens there
                          What do you really need? I mean, I hardly believe some nice Epic staff member will just hop in here one day and write exhaustive documentation of these postprocess classes. So it seems to me that they just gave us this great engine, which you either figure out by yourself or give up on.

                          Anyway, PP is an interesting part of the engine that is worth looking into more, and I will do just that.


                            #14
                            Originally posted by Konflict
                            Yes, I'm afraid that's the only way I have found so far. Removing the unnecessary downsample passes should help reduce the cost of blending. Even if you manage to omit the draws on them, the empty RTs would still be queued for blending at the last stage, so they had to go. The downsample pass now does a 1/4 instead of a 1/2 operation. You were actually pretty close to getting the downsample right, but you did not adjust the Extent property to 1/4 instead of 1/2, which is why the bloom appeared in the top-left quarter.

                            Forcing the downsample pass to do the 1/4 scaling apparently had no side effects, which is why I believe this is actually doable. However, I was hoping to find a way to just kick the current bloom out of the postprocess graph and fill in a low-cost bloom solution that would do its own downsampling.


                            What do you really need? I mean, I hardly believe some nice Epic staff member will just hop in here one day and write exhaustive documentation of these postprocess classes. So it seems to me that they just gave us this great engine, which you either figure out by yourself or give up on.

                            Anyway, PP is an interesting part of the engine that is worth looking into more, and I will do just that.
                            Bumping the thread! I'm also interested in performance optimizations for the bloom effect. Konflict, could you recap what exactly needs to be changed in the source so that only one downsample is used, at 1/4 resolution?



                              #15
                              Hey, @Norman3D.

                              Everything we have discussed here should be enough to figure this out. For convenience, though, I just made the modification in 4.14.3, and here are the most significant changes needed to get the result:

                              First, remove the additional stages, then restrict BloomStageCount to 1:

                              Code:
                              // Keep only the Bloom1 stage; the larger stages (Bloom2 to Bloom6) are commented out.
                              FBloomStage BloomStages[] =
                              {
                              	/*{ Settings.Bloom6Size, &Settings.Bloom6Tint },
                              	{ Settings.Bloom5Size, &Settings.Bloom5Tint },
                              	{ Settings.Bloom4Size, &Settings.Bloom4Tint },
                              	{ Settings.Bloom3Size, &Settings.Bloom3Tint },
                              	{ Settings.Bloom2Size, &Settings.Bloom2Tint },*/
                              	{ Settings.Bloom1Size, &Settings.Bloom1Tint },
                              };

                              // Was: BloomQualityStages[BloomQuality - 1]. Hardcoding 1 means r.BloomQuality
                              // no longer changes the number of stages.
                              const uint32 BloomStageCount = 1;
                              Once the stage count is hardcoded, r.BloomQuality no longer changes anything, but I think that avoids confusion. The other way to set this up would be to change the preset values in BloomQualityStages[]; it's up to you.

                              Second, the downsample sizing is here; adjust the downsample to output a quarter size:

                              Code:
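                              void FRCPassPostProcessDownsample::Process(FRenderingCompositePassContext& Context)
                              	// ... (excerpt from the function body) divisor changed from 2 to 4 for a quarter-size output:
                              	FIntRect DestRect = FIntRect::DivideAndRoundUp(SrcRect, 4);
                              	SrcRect = DestRect * 4;  // same factor, so SrcRect stays an exact multiple of DestRect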
                              At the bottom of the same file you will find this; change it to 4 as well and you are done.

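                              Code:
                              FPooledRenderTargetDesc FRCPassPostProcessDownsample::ComputeOutputDesc(EPassOutputId InPassOutputId) const
                              	// ... (excerpt) the output extent must shrink by the same factor as DestRect in Process() (was 2);
                              	// a mismatch leaves the bloom in the top-left quarter of the target.
                              	Ret.Extent  = FIntPoint::DivideAndRoundUp(Ret.Extent, 4);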
                              [Image: 2017-01-18 15_28_54-bloom - Unreal Editor.jpg]

                              The result is a basic bloom, but it does the job. If you want a larger halo, change the divisor to 8 in the code (one additional halving), then set the size scale to 8 in the postprocess settings in the editor.

                              Be aware that the changes to the downsample code are not restricted to bloom. If you have any custom effect that depends on this downsample pass, there will be complications. I believe the official engine version doesn't use this pass for anything other than bloom, but this might change later.