Lumen GI and Reflections feedback thread

Krzysztof.N · November 18, 2024, 5:52pm

I’ve exposed r.Lumen.Supported.SM5 in UE5-Main, which you can integrate in order to re-enable DX11 support. Maybe will be able to get that into a hotfix.

That said, DX12 is our main path for a long time now and has a few features like new GPU allocator, which allows to share memory between various temporary GPU allocations. It’s quite unlikely that DX11 would run faster. Most likely difference which you see is due to Nanite, VSM, Ray Tracing or some other feature being disabled when you switch to DX11. And if not, and you have a good proof, then it’s something we would want to fix on DX12 instead.

Uno1982 · November 18, 2024, 6:34pm

Thank you so much!

What do you mean good proof? Look at the thread I linked. If you do nothing other than swap the rhi in a 3rd person template nanite isn’t involved. You can cut SM6 from the equation entirely and then toss out VSM and Lumen in 5.5.0 release as I’ve shown there. Your still SM5 apples to apples without any sm6 supported features. Your memory footprint increases simply due to the rhi and the low level driver and hardware efficiency. This obviously is slower simply by handling a different RHIT and a larger memory footprint.

DX11 SM5 - 4.26ms starting cost (235fps)

DX12 SM5 - 5.41ms (184fps)
zero changes other than rhi results in (but why even run dx12 with sm5?)
Note: the memory footprint difference

Literally nothing more than paying twice the RHIT cost and larger memory footprint as GPU Time is almost identical

DX12 SM6 - 9.58ms (104fps)
zero change other than lumen is now enabled since its locked behind sm6

That’s a total loss of about 123% of the initial starting ms on DX11 SM5. If I could enable lumen in DX11 I would have provided a 1:1 comparison as in 5.4 its like 65% difference with no HWRT involved in pipeline at all with DX12 vs DX11 SM5 Lumen SWRT only. I’ll be happy to follow up here with 5.4 results if you like?

While your correct in that it is features being enabled as you move up in rhi and shader models between DX12 SM5 and SM6 … that has nothing to do with the DX11 SM5 to DX12 SM5 rhi swap… SWRT doesn’t even work in SM5 UE5.5 so no lumen cost to factor in here. Nanite doesn’t work without VSM which requires SM6.

HWRT doesn’t come enabled by default in the template so that’s not a factor until bUseHardwareRTWhenAvailable kicks in for lumen in DX12 sm6.

Also… forcing this path and removing SWRT entirely just because DX12 is where your focusing is taking away the option entirely would throw out the baby with the bathwater.

Update:
As a follow up to this I just tested vulkan SM5 and Lumen SWRT does still work in that RHI however it appears to be frame locked to 60htz no matter what vsync settings you apply and even if you set maxFPS to 10000

glitchered · November 18, 2024, 8:09pm

i think this is just a mindset thing for low end developers. dx11 = cheaper to compute thinking.

i ran lumen in dx12 and dx11 with proper project configs to tick all the low end switches. no visual difference. i haven’t measured actual performance, tho. it just runs nominal 60 or 48 on my rig (i’m not pushing fps on my cpu). i think even the rdna2 gpus should be able to handle dx12 lowend. maybe there are a couple of fps to gain in dx11. i dunno for sure tho. i can only google. i don’t have a steam deck. nice to have a legacy switch tho. tbh. eases the mind.

Uno1982 · November 18, 2024, 9:30pm

Nah …it’s testable and repeatable. DX12 ready cards aren’t the same as D12 optimized cards. Memory bus speed and threading architecture have a lot to do with this at the actual driver and lower hardware level. If DX12 actually ran at DX11 speeds in SM5 we would be using it… why wouldn’t we?

What does a biased belief benefit anyone who deals with stats and numbers … or let me say what benefit would an engineer gain to fabricate something and go through the effort of making a case to continue support if benchmarking showed otherwise? We can’t afford to live in some realm of “I feel this way”. Its a waste of both my time and the epic teams time. That’s why I provided such a detailed outline showing the actual “packaged” numbers and both stat unit outputs to show exactly where the cost is.

If vulkan or DX12 could provide DX11 SM5 speeds & capabilities when running in SM5 shader model, then sure rip it out. We’d be running it already. (And as of the last fortnite update DX11 is still there rocking proudly for all the DX12 “ready” cards lacking the memory bandwidth or Ampere chipsets to see its benefits)

Reputable people in this field/industry don’t operation on “mindset”… We can’t afford to.

glitchered · November 18, 2024, 9:48pm

one of your mindsets is wrong tho. why you test in the 100+ fps range? you don’t need 100 fps to enjoy a game, do you?!? neither does the broad console audience reach that without diminished graphics.

what you balling?

i have a 144 Hz panel and i trim the engine output to give me better visuals for a trade in framerate. and i vsync all of it on mostly 48 fps for my laptop’s sake. it’s a techy thing. more fps does not give you smoother visuals. the v-sync does. the frame gotta be delivered in time for your display and target v-sync rate. always been the old school hack. variable refresh my butt. it will micro stutter. it just hides behind more frames. that’s the balance you gotta achieve.

anyway…

Uno1982 · November 18, 2024, 9:54pm

I think your missing the point of the benchmark? Your taking number outputs and making it personal and attaching “me” or my gaming preferences to it. Again this has nothing to do with me as an individual and I’m sorry you perceived my feedback and provided data in that manner.

This is about raw ms information and the starting overhead. Larger headroom makes for easier optimizations. You are right that some tech allows for a wider curve “deferred rendering” vs “forward” base pass vs lighting cost for instance. In forward you start off fast and fall off quickly were deferred starts a little slower and falls off alot slower as lights are added.

If you go back to my original post … when you have GPU instances shared across 4 ppl keeping your individual per app instance framtime low means that shared budget is large enough to cut back on thousands of dollars an hour across an organization that spins up thousands of GPU instances a day.

I put it this way… the more optimal we can run the more ROI we can keep thus keeping Epic a partner. It really is that simple.

Krzysztof.N · November 18, 2024, 9:59pm

From those Stat GPU screenshots I only see that somehow post processing and unaccounted is higher on DX12. Is it due to a CPU bottleneck? Maybe it’s due to Async Compute? I don’t know, as stat GPU isn’t the best tool for such investigations. Certainly DX11 being faster just due to RHI or drivers would be quite unexpected.

glitchered · November 18, 2024, 10:00pm

you’re not attached. don’t you worry. it’s not personal or individual.

i just don’t like your test methodology, tho. fps pea nuts is so pc, but it’s worthless in the broader spectrum, being consoles that can handle lumen just fine if you wouldn’t ask for all those frames.

a lil balance the engine has to have.

Uno1982 · November 18, 2024, 10:02pm

I can go deeper and give you an profiler output if needed. Again this has been tested across AWS G4 Cloud GPUs, steamdeck, 20 series cards and GTX series cards. You really start seeing gains on 30 series cards which is why I mentioned Ampere chipsets.

Uno1982 · November 18, 2024, 10:10pm

I only put fps there next to the ms for easy reading. Nobody and I mean nobody over here is looking at fps… that stat doesn’t mean anything. If you look at my performance percentage calculation it has nothing to do with FPS its a reduction in total ms from each pipeline and shader model. Again, I think your swaying around the purpose of my efforts and missing the entire point and directing your feedback towards me personally while saying your not.

I’m at a loss to more clearly outline every other stat shown on those screenshots other than FPS especially when Memory Footprint and RHIT were in bold?

Uno1982 · November 18, 2024, 10:29pm

@Krzysztof.N on a side note … what I just realized these systems have in common. AMD CPUs and chipsets. Is there something you guys are tracking internally around differences here DX12 vs DX11 performance on AMD vs Intel? Could there be an AMD specific issue between the RHIT when running DX12 that isn’t seen on Intel chipsets/CPUs? You have me intrigued now. I’ll do similar test on my Intel chipsets and get back with you… Pretty sure the AWS G4s are Cascade Lake however and still get better perf in DX11 but they are non RT Radeon V520 GPUs (running lumen SWRT)[Granted this is UE5.1 mind you as 5.2,5.3 had webrtc memory leaks]

Update:
I did some searching around to see if others regardless of CPU/GPU setup are in the same DX11 camp and found this thread. Again case after case of ppl toggling back to DX11 with a mixed hardware setup.

Appears this is the culprit still

Attached is a really quick Trace you can attach with insights if you’d like to review. (DX12 SM5 UE5.5)
20241118_173420.zip (24.1 MB)

Starac · November 20, 2024, 12:59pm

Can this Lumen problem with DX11 and UE 5.5 be solved, enabled, with some workaround?

Uno1982 · November 22, 2024, 5:26pm

waiting on a hotfix but you can likely look up the PR that was mentioned in 5.5 main or 5.5-dev on source… You can also currently swap to vulkan SM5 but that rhi currently locks to 60htz for some reason.

Ibtesam_Sadiq · November 22, 2024, 6:10pm

I’m not sure if this is the right place for feedback on the current tone mapper, but it seems like a suitable thread for this topic. Today, I watched a video about a new tone mapper that outperforms Unreal’s ACES-based tone mapper. The new tone mapper looked significantly better and more realistic. I’m curious why Epic hasn’t considered it yet, or if they plan to update it in the future. It seems like a more sensible option.

jblackwell · November 22, 2024, 10:08pm

Alas, this is not the right thread- I think this would be better under a feature request? Lumen does use some creative tonemapping for noise suppression, but I’m assuming that’s not what you’re talking about.

jblackwell · November 22, 2024, 10:09pm

That said, lumen 5.5 bug:

Lumen geometry normals visualization is broken when raytracing.nanite.mode 1 is enabled. All nanite geometry registers with a simple yellow normal- parts of the truck, which are non-nanite, and the floor, are traditional geometry. A handful of those tools are nanite, and are not subject to the problem for some reason.

Tangent: would it be possible for you guys to update the 5.5 lumen documentation with information on how transparency interplays with lumen? There are still a few details I’m fuzzy on as I’m experimenting with integrating transparent objects into my scene.

Maenavia · November 24, 2024, 2:40am

@Krzysztof.N

I just noticed that in 5.5, the command r.Lumen.ScreenProbeGather.ShortRangeAO.HardwareRayTracing crashes the engine with the following error:

Assertion failed: TotalResourceCount == 0 || Shader->ShaderBindingLayoutHash == ShaderBindingLayoutHash
[File: E:\UE551\Engine\Source\Runtime\D3D12RHI\Private\D3D12RayTracing.cpp ] [Line: 1975 ]
Raytracing shader with entry point LumenShortRangeAOHardwareRayTracing doesn’t match the RTPSO ShaderBindingLayout.

Starac · November 26, 2024, 12:42pm

UE5-Main should be on github?

TheKJ · November 27, 2024, 6:37am

@Daniel_Wright @Krzysztof.N

r.Lumen.ScreenProbeGather.ShortRangeAO.ApplyDuringIntegration 1 allows users to bypass the noise from shortrange AO without incompetent TAA but introduced noise on moving metallics(first image).

Easy to replicate with simple 3rd person template.
Have no idea how long this has been an issue but this is 5.5 please fix.

Also, looks like there was some overall noise reduction(outside) which is good to see. Indoors it still has issues.

Without Screen traces, near lighting is clean, far field lighting flickers.

Lex4art · November 27, 2024, 8:20pm

Just in case this may be still actual - there is an UE5.5 video about alpha-masked trees VS full on Nanite and it’s quite nice visual difference when Nanite trees are used (reduced flickering & less dark):

Don’t know what kind of hardware & settings are used, though. Those trees are also not that rich in leafs department.