Lumen Global Distance Field Trace Leaking due to Surface Cache Sampling

Hello (and happy new year),

I’ve found that global distance field tracing can sometimes cause light leaking when two objects are close to each other. After some research, I found it’s caused by a surface cache sampling error. For example, in the image below, the radiosity probe rays (and also the visualization rays in the debug view) that intersect the cube mistakenly sample the surface cache from the outside of the wall.

[Image Removed]

I looked into the code and got quite confused about the math behind the surface cache sampling bias. I noticed that DISTANCE_FIELD_OBJECT_GRID_CARD_INTERPOLATION_RANGE_IN_VOXELS is 3.0f, which seems too high in this situation. As far as I know, objects are expanded by at most one ClipmapVoxelExtent during tracing, so 2.0f seems a more reasonable value for rejecting those out-of-card hits here:

[Image Removed]

I noticed the interpolation range was raised from 2.f to 3.f in an earlier commit, 196daca “Lumen Software Ray Tracing - improved global distance field hit evaluation”. What was the reason at the time? Were there other leaking issues with 2.f? I also noticed that not all changes from that commit survive today, e.g. CardSampleWorldPosition is no longer actually used.

[Image Removed]

[Image Removed]

Also, I am quite confused by the visibility weighting math below. It seems both NormalizedHitDistance and TexelDepth are scaled into the [0, 1] range, but BiasThreshold is not.

[Image Removed]

TBH, comparing the depth difference against BiasFalloff seems more reasonable to me -- BiasFalloff = BiasThreshold x 0.5 (so it truly becomes the distance field expansion amount) x 0.5 (to normalize it into the [0, 1] depth range). Not sure whether this is a typo or something else.

[Image Removed]

Thanks for any help!

Sicong

[Attachment Removed]

Steps to Reproduce
I’ve provided a minimal case project. Please just open the umap “GDFTraceLeaking” inside.

[Attachment Removed]

Not sure if the zip was successfully uploaded -- I am going to re-upload it here

[Attachment Removed]

Hello,

Thank you for reaching out.

I’ve been assigned this issue, and we will be looking into this light leaking for you.

[Attachment Removed]

Hi,

It’s been a while since I looked at that code. Currently we’re mostly focused on the HWRT path, as this is what most games use nowadays (FN switched to it a year ago) and what we recommend for all licensees.

“I notice the interpolation range was raised from 2.f to 3.f in a history commit 196daca “Lumen Software Ray Tracing - improved global distance field hit evaluation”. Wonder what the case was then? Were there other leaking issues for 2.f?”

The idea of this CL was to push the hit evaluation point outside of the wall in order to minimize the chances of picking up lighting on the other side. Usually leaking happens when something indoors picks up something bright from the other side of the wall (outdoors). Biasing away from the wall means that we now need a larger DISTANCE_FIELD_OBJECT_GRID_CARD_INTERPOLATION_RANGE_IN_VOXELS to be able to pick up that wall.

“And I also notice not all changes in the commit are respected today, e.g. the CardSampleWorldPosition is not actually used anymore.”

Hmm, it looks like it got removed by accident, or at least there’s no comment about it in the CL that submitted the removal.

“Also I am quite confused by the visibility weighting math below. It seems both NormalizedHitDistance and TexelDepth are scaled into the [0, 1] range, but not the BiasThreshold.”

Surface cache bias is the distance field error in world space units. We then transform it into [0, 1] space, compare it with TexelDepths, and apply some falloff to smooth the transitions. That seems reasonable.

“Moreover, after using 0.25 * BiasTreshold, the results look much better -- although the surface cache coverage is slightly reduced (more pink areas), the light leaking is successfully eliminated.”

Surface cache error depends not only on the distance field, but also on the card resolution. The lower the resolution, the more uncertain our hits are, as most of the data is interpolated from a few rasterized card texels. So I don’t think there is a single correct value for it; it’s just a bias.

[Attachment Removed]

Hi Kelly,

Thanks for the reply.

Putting it all together, I feel there are essentially two questions to answer.

First, why is DISTANCE_FIELD_OBJECT_GRID_CARD_INTERPOLATION_RANGE_IN_VOXELS 3.0f instead of 2.0f?

Second, when calculating texel visibility from a card, is it a typo that BiasTreshold, rather than BiasFalloff, is compared against the hit depth difference?

[Attachment Removed]

Hello,

Thank you for reporting this. I can confirm this light leaking can be reproduced as described in the latest CL, and will be opening a bug report.

We will discuss the value of “DISTANCE_FIELD_OBJECT_GRID_CARD_INTERPOLATION_RANGE_IN_VOXELS” with our colleagues.

Regarding “CardSampleWorldPosition”, we can see it was added in CL: 24197434 and its use was removed in the next revision, CL: 24212584. Keep in mind that any unused code is stripped from shaders during compilation, so this will not affect performance or behavior.

Finally, regarding the use of “BiasThreshold”, it is correct. If you look at the use of “saturate(…)” on line 291 of the code you showed, it encompasses everything to the right of the “1.0 -”, so the entire expression stays in the [0,1] range.

It is also important that “BiasFalloff” is only used in the denominator. The “0.25” in its calculation adjusts for the later summation of the four values in “TexelVisibility”, via “dot(TexelWeights, 1.0f)” several lines further down.
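For readers without the screenshots, here is a minimal sketch of the per-texel weight being discussed, written as plain C++ rather than shader code. It is reconstructed only from what is said in this thread (the “1.0 - saturate(…)” shape, “BiasFalloff” being 0.25 times the threshold and appearing only in the denominator). The exact expression, the helper names `Saturate`, `Abs` and `TexelDepthVisibility`, and the parameter list are assumptions, not a copy of the engine source.

```cpp
// Sketch only: reconstructed from the description in this thread,
// not copied from the engine shader.

constexpr float Saturate(float X) { return X < 0.0f ? 0.0f : (X > 1.0f ? 1.0f : X); }
constexpr float Abs(float X) { return X < 0.0f ? -X : X; }

// Visibility weight of one card texel for a given hit.
// NormalizedHitDistance and TexelDepth are both in the card's [0, 1] depth range.
// BiasThreshold is assumed to be the surface cache bias expressed in card depth
// units and must be greater than zero.
constexpr float TexelDepthVisibility(float NormalizedHitDistance, float TexelDepth, float BiasThreshold)
{
    const float BiasFalloff = 0.25f * BiasThreshold;
    const float DepthDifference = Abs(NormalizedHitDistance - TexelDepth);
    // Full weight while the difference is below the threshold, then a linear
    // fade to zero over a further BiasFalloff.
    return 1.0f - Saturate((DepthDifference - BiasThreshold) / BiasFalloff);
}
```

With a threshold of 0.25, for example, a texel whose depth differs from the hit by 0.125 keeps full weight, and the weight only reaches zero once the difference is 0.3125 (threshold plus falloff). The result is always in [0, 1], but the threshold still decides which texels count as valid samples.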

Please let us know if this helps.

[Attachment Removed]

Hi Kelly,

I am still confused about the “BiasThreshold” part. Maybe I should not have called “BiasThreshold” a typo in the first place, since “Threshold” does make sense here in terms of “to what degree the texel is close enough to the hit point”.

The point is, the current value of “BiasThreshold” is definitely too large. Yes, no matter what value “BiasThreshold” has, the visibility will be clamped to the [0, 1] range. What matters, however, is that any card texel within the “BiasThreshold” range of the hit point is counted as a valid sample, and that’s exactly why the hit on the cube surface samples lighting from outside the wall!

As I said, the maximum deviation from the hit point to the surface cache texel during tracing is half the clipmap voxel diagonal, a.k.a. one “ClipmapVoxelExtent”:

[Image Removed]

The current BiasThreshold, however, is as large as three times “ClipmapVoxelExtent” (because “SurfaceCacheBias” uses an INTERPOLATION_RANGE of 3.f):

[Image Removed]

The first odd thing: looking at all previous usages of SurfaceCacheBias, they are all halved, which gives a bias of 1.5 times “ClipmapVoxelExtent”, which is somewhat reasonable. (It would be exactly one “ClipmapVoxelExtent” if INTERPOLATION_RANGE were 2.f, but that doesn’t matter much.)

[Image Removed]

The next odd thing is how the hit distance is normalized into card depth: there is another halving here to map the hit distance from the [-1, 1] range to the [0, 1] range. “BiasThreshold”, which is computed on the next line and compared against during the depth test, is never halved, even though both “CardSpacePosition.z” and “SurfaceCacheBias” are in the same world-space units.

[Image Removed]
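The unit argument can be put in numbers with a small C++ sketch. It assumes the normalization has the form z / extent * 0.5 + 0.5 (the halving mentioned above), that “BiasThreshold” is “SurfaceCacheBias” divided by the same card depth extent, and that “SurfaceCacheBias” is 3 times “ClipmapVoxelExtent”. The function names, `CardExtentZ`, `ThresholdScale` and the example value of 10 world units are all hypothetical and are not taken from the engine.

```cpp
// Sketch only: the formulas below are assumptions reconstructed from this
// thread, not copied from the engine shader.

// Maps a card-space depth in [-CardExtentZ, CardExtentZ] to the [0, 1] depth
// range. The 0.5 factor is the halving discussed above.
constexpr float NormalizeDepth(float CardSpaceZ, float CardExtentZ)
{
    return CardSpaceZ / CardExtentZ * 0.5f + 0.5f;
}

// Largest world-space depth difference that still gets full weight when the
// normalized difference is compared against ThresholdScale * BiasThreshold,
// with BiasThreshold assumed to be SurfaceCacheBias / CardExtentZ:
//   WorldDiff * 0.5 / CardExtentZ <= ThresholdScale * SurfaceCacheBias / CardExtentZ
//   WorldDiff <= 2 * ThresholdScale * SurfaceCacheBias   (CardExtentZ cancels)
constexpr float FullWeightWorldTolerance(float SurfaceCacheBias, float ThresholdScale)
{
    return 2.0f * ThresholdScale * SurfaceCacheBias;
}

// Hypothetical example values: a voxel extent of 10 world units and an
// interpolation range of 3 voxels.
constexpr float ExampleVoxelExtent = 10.0f;
constexpr float ExampleSurfaceCacheBias = 3.0f * ExampleVoxelExtent;
```

Under these assumptions, a scale of 1.0 accepts texels up to 60 units (6 voxel extents) from the hit, 0.5 gives 30 units (3 voxel extents), and 0.25 gives 15 units (1.5 voxel extents). The soft falloff beyond the threshold is ignored here.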

Moreover, after using 0.25 * BiasTreshold, the results look much better -- although the surface cache coverage is slightly reduced (more pink areas), the light leaking is successfully eliminated.

[Image Removed]

Before:

[Image Removed]

[Attachment Removed]

Hello,

We have opened the bug report for this light leaking.

The tracker will be visible after it is approved internally at Epic Games and is publicly accessible.

We are handing this ticket to another team for further investigation and consideration.

[Attachment Removed]

Hi Krzysztof,

Thanks for the reply! Always glad to hear from you.

About HWRT: unfortunately our project is based on UE5.4, and since then the HWRT implementation has undergone massive refactoring by Mr. Costa. For now, our project is heavily CPU-bound during RT scene building when HWRT is turned on. It’s practically impossible to upgrade the HWRT code to a later version by cherry-picking, given the number of related commits. So we have to stick with SWRT for now.

“The idea of this CL was to push the hit evaluation point outside of the wall in order to minimize chances of picking up lighting on the other side. Usually leaking happens when something indoors picks up something bright from the other side of the wall (outdoors).”

So I think it’s exactly this CL that causes the leaking I’m seeing. In my sample scene, the hit points on the cube surface are pushed away, perhaps so far that they sample the surface cache from the wrong object -- the outside wall.

I am still a little confused about the purpose of this CL. It sounds like the CL was meant to prevent picking a card facing the opposite direction on a single object, right? As in the illustration below: to prevent the ray hit point F from sampling card CD instead of AB, the CL pushes the hit away by vector GH? But I think that even without the CL, card CD would never be picked in the first place because of axis culling.

[Image Removed]

I reverted the commit (or rather, what is left of it) and tweaked the bias to 0.5f * BiasThreshold (instead of 0.25f) so that the surface cache coverage doesn’t shrink as much. Everything looks fine so far. This time I explicitly set r.LumenScene.SurfaceCache.CardFixedDebugResolution 64 to rule out surface card resolution as a factor in the comparisons.

Before:

[Image Removed]

After:

[Image Removed]

Before:

[Image Removed]

After:

[Image Removed]

This solution seems good enough for me.

[Attachment Removed]

Yeah, if you’re on 5.4 it’s a bit iffy to switch to HWRT. We switched to HWRT by default in 5.5 (including shipping Fortnite with it), but 5.6 is where things got really solid.

That CL was to fix issues where hits inside a room were picking up objects on the other side of a thin wall. Imagine a small box on the other side of a wall, where, due to the preference for picking the smaller object, you would get a leak from that object onto the wall. Like here, where this ray can pick up the red box instead of the blue wall:

[Image Removed]

And yes, this looks like exactly the opposite of the scenario you’re showing.

All of these are just heuristics, so they are heavily content-dependent, and indeed removing that normal offset could be a better solution in your case.

[Attachment Removed]

Totally understood now. I will keep an eye out for the scenario you mentioned. I warmly appreciate your clear explanation and illustration, as always. Feel free to mark the thread as solved.

[Attachment Removed]