How to best debug VSM rendering code?

Hello,

I’ve written a custom rendering technique that will supersede my older vertex factory + mesh pass processor approach. My motivation is to use a custom mesh shader and visibility-buffer rendering technique, which has pushed me to build my own parallel pipeline. As part of the migration I’ve kept the old pathway so that I can do side-by-side testing to ensure everything is working properly. The one area where I’ve been running into trouble is VSMs, and I was hoping you might have some tips on how to debug novel code that uses them. Most of my understanding of the inner workings came from VirtualShadowMapBuildPerPageDrawCommands.usf and ShadowDepthPixelShader.usf, but I might be missing a piece of the puzzle.

It seems like my code to render into VSMs is almost working. When I capture in PIX and debug, it appears that I am transforming my geometry correctly according to the views needed by the VSM system. However, I have two issues that I am having a hard time working through:

  1. If I enable r.Shadow.Virtual.Cache.ForceInvalidateDirectional my shadows appear correct. If I disable it, the shadows appear for a second or so, then disappear.
  2. I have some cases where I get “holes” in the shadow. I’m following the pattern of doing the InterlockedMax() in the pixel shader, so I’m not sure why I observe what appears to be properly positioned geometry in the pages, yet some pixels fail to “render”. I’ve disabled all my code that does any sort of culling of meshlets, and PIX agrees that I don’t have any holes.

It feels like for (1) there is some sort of caching problem going on. My code is setting the page dirty flags like Nanite does in CullPerPageDrawCommandsCs() (and I’ve ensured it happens before FVirtualShadowMapArray::PostRender() processes the dirty bits), but do I need to do anything else here? I don’t see any code that would otherwise mark physical pages “to be committed to a cache” or “reserved so they aren’t re-used by something else,” but I could have overlooked something along these lines. I’m kind of hoping that (2) is the same bug as (1), just with slightly different symptoms.

This brings me to the original question: how do we debug this kind of code? I’m having a hard time debugging for the following reasons:

  • It seems like the physical page allocation isn’t deterministic on boot? When I am A/Bing against my vertex factory version I see differences in ShadowDepthPass_UncachedPageRectBounds when debugging the shaders, so it’s hard to be sure everything is identical.
  • The fact that UAV writes happen only in the pixel shader makes it extremely hard to find invocations in PIX to do any kind of debugging after the culling pass.
  • Unlike CSM, viewing the physical pages in PIX is very unintuitive. Is there some way to look at the physical page texture and more easily spot bugs, even if it means it’s less efficient, just as a debugging service?
  • Any other tips?

Thanks in advance.


Yes, it sounds like there are some caching issues. There are two important things for VSM cache invalidation: 1) invalidations getting triggered - normally from the CPU when objects move or are added to/removed from the scene - and 2) the bounds of the objects being correct in GPUScene for the actual page-overlap invalidations.

The first part is generally done via the VSM cache manager scene extension. See for instance FVirtualShadowMapInvalidationSceneUpdater::PreSceneUpdate. There is a pass that runs before the GPUScene data is updated from the prev->current state (to handle invalidating the previous positions and removed objects) and one that runs after the update (to evaluate the new positions and added objects). If your object is moving (transform changing) or similar, it will generally already get added to these lists. If it is constantly deforming, like a skinned mesh, the proxy will generally return true for “HasDeformableMesh()”, which forces it to always be invalidated (as long as it is being rendered). If you need very custom handling, you can do your own thing via the “GetShadowInvalidatingInstancesInterface” as Landscape does to some extent. Note that content also has a few ways to override the default invalidation behavior on a per actor/component basis.

For the second part, it’s really just making sure your instance bounds are correct (conservative) and properly getting into GPUScene. This is often overlooked for non-Nanite objects, as CPU culling in Unreal is not very tight: even though in theory bad bounds can cause objects to be culled improperly even in the primary view, in practice it often happens to work OK. Not so with VSMs: the culling is much tighter and happens per VSM page, so bounds issues will often show up as missing geometry, or as shadows leaving “trails” when the invalidations don’t cover the proper areas. I believe there are ways to visualize bounds in the editor but I don’t remember the specifics (‘show bounds’ perhaps?).

In terms of general VSM debugging, I’d start with the visualization modes in the editor. Things like which pages are being invalidated can be seen pretty easily in the Cached Page view, so problems there are readily apparent. Similar views like the Virtual Page debug can let you see whether things like your missing geometry happen right at the edge of a page and are thus possibly bounds issues. Like Nanite, most of the VSM logic (including basically all the caching and page-management logic) runs on the GPU, so most of the debugging unfortunately has to happen there as well.

As to your specific questions:

1) Correct, the allocation is not deterministic, as it uses atomics on the GPU. And indeed, viewing the physical page pool is not usually very useful; it’s usually better to use the existing debug modes, or to modify one of them to view specific values. These debug modes show the data in projected space after page translation, which is much more intuitive.

2) Yep, but the use of UAVs is unavoidable as we need to scatter data to arbitrary pages that aren’t known on the CPU in advance.

3) You can look at the physical page texture (with viz or a similar command), but it’s generally not going to be very useful since it’s jumbled up, as noted in 1). I’d start with the other suggestions first.

Hope that gets you on the right track!

Andrew


Thanks for getting back to me Andrew, I appreciate the insights here.

I actually figured out the cause of (2); it was simply a bug in my mesh shader that I didn’t see. I’m still trying to determine whether (1) (a caching bug?) is real or not. I only ever saw it when I was trying to isolate the cause of (2) by rendering a subset of geometry that created non-closed meshes. I don’t know if that would trip up VSMs, but I’m not ready to say (1) is a non-issue until I’ve soaked things more.

Yeah, all the reasoning for why VSMs do what they do makes sense. The way pixel coordinates map to physical pages is sound, but it sure makes things hard to debug! I tried binding an additional debug RT to write pixel coordinates to, but because the viewport and the UAV physical size diverge, it ends up causing all sorts of problems.

Thanks again, hopefully after more testing I can put this one to bed.
