Waiting for a mesh's MipMaps, PSO, and Nanite model to be fully streamed

Greetings,

We are currently working on an inventory system.

We have built a queued capture component that is in charge of capturing inventory items into a render target. This happens in a separate, parallel world.

However, we haven’t managed to solve one issue, which occurs both in packaged builds and in the editor:

[Image Removed]

As you can see in the screenshot, the model can sometimes not be fully streamed in when the capture happens.

The operations go in this order:

  1. A request is made to build an item thumbnail.
  2. If the builder is available, it starts building; if it is currently building another item, it queues the operation.
  3. The mesh (Nanite, with virtual textures) is set on a static mesh component, the render target is assigned, and the object is placed at the correct position for the capture.
  4. The capture is taken and sent back to the requester.

We have tried many solutions, but none of them solve these issues:

  • The mesh appears at a very low LOD if the capture happens too fast, and we haven’t found any function that reports when the Nanite data is fully streamed in.
  • If the object has never been seen before, the PSO can still be compiling (we are on precache fallback strategy two, IIRC), meaning the mesh is displayed without its texture. Same question as above: we haven’t found any way to ask whether the PSO of the mesh has been properly created.
  • We haven’t hit any issue with under-streamed textures yet, but just in case, same question as above: is there a way to know whether the mesh’s textures are fully loaded and displayed?

All of this needs to work in packaged builds (C++, of course).

The workaround we have now is to wait 10 frames before the capture happens. That works, but it isn’t a very clean solution.

Thanks!

Hello there,

The system you are constructing seems good. Just to understand things a bit better, how are you waiting for the next frame? Depending on how, a single-frame wait may actually end up being the same frame, and then none of the streaming systems would have refreshed yet. If it’s the same frame, it may be the initial Nanite residency that is getting rendered.

On the first issue: with traditional mesh streaming, a forced LOD would be ideal, but with Nanite, that’s more complicated. Nanite streaming, like many of the continuous streaming systems, doesn’t provide any callbacks for completion, as completion is a somewhat nebulous concept to define in such a continual streaming system. Waiting a few frames isn’t a bad option; what you’re doing, in essence, is waiting for Nanite to reach a steady state, but it can impact user experience. Depending on your fallback settings, disabling Nanite on the captured mesh component and forcing LOD0 might actually be the best option to immediately have the mesh at full LOD.
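
Purely as illustration, a minimal sketch of that last suggestion, assuming UE 5.1+ where UStaticMeshComponent exposes bForceDisableNanite (the function name here is ours):

    #include "Components/StaticMeshComponent.h"

    void PrepareMeshForCapture(UStaticMeshComponent* MeshComponent)
    {
        // Fall back from Nanite to the traditional LOD path for this
        // component only (requires the mesh to have a fallback mesh).
        MeshComponent->bForceDisableNanite = true;

        // ForcedLodModel is 1-based: 1 pins the component to LOD0, 0 is auto.
        MeshComponent->SetForcedLodModel(1);

        // Recreate the render state so both changes take effect.
        MeshComponent->MarkRenderStateDirty();
    }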

Textures are similar to Nanite. The Virtual Texture (VT) system is GPU-driven and continuous. Since various tiles on a given object can be at effectively any mip level and that could be steady state for the given view, completion is hard to define. These concessions do absolutely complicate tasks such as the one you are doing, especially when the capture isn’t live. Since the object is relatively small in the UI, VT being slightly behind may get masked somewhat more than Nanite, but VT does typically take a few frames to settle in.

One solution would be to use an inventory version of the material that doesn’t rely on VT. That comes with the duplication of textures, which could certainly be an issue.

Another, and this is perhaps a properly hacky solution, is to use a non-deferred capture call on the scene capture and call it multiple times in a single frame. I’m unsure if this would kick the streaming systems since the CPU doesn’t get a tick in between. This also has definite performance implications and may well hitch.
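
A rough sketch of what that could look like, assuming the capture component is already pointed at the item and the render target (names are ours, and whether repeated same-frame captures actually advance the streamers is unverified, as noted above):

    #include "Components/SceneCaptureComponent2D.h"

    void CaptureImmediately(USceneCaptureComponent2D* CaptureComponent, int32 NumPasses)
    {
        CaptureComponent->bCaptureEveryFrame = false;

        // CaptureScene() renders on demand rather than at the end of the
        // frame like CaptureSceneDeferred(); each call is a full scene render.
        for (int32 Pass = 0; Pass < NumPasses; ++Pass)
        {
            CaptureComponent->CaptureScene();
        }
    }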

PSOs are their own beast, and their state isn’t trivial to track. There are a few reasons and workarounds, but the key issue is that the state of the driver cache is essentially unknown between runs, as it may have been invalidated, so any check for the PSO existing would be limited to the current run. I’m assuming precaching is in use here, as you mention a fallback strategy. Getting PSO state from the same run may be possible: PSO requests are discarded after completion, but can be kept around with D3D12.PSOPrecache.KeepLowLevel. Getting that working may be a fair effort, though; I don’t believe this code exists in a form that would be easy to reuse, but it’s technically possible. I’ve listed three other options below:

Option 1 would be to bundle the PSOs for inventory items and do the compilation ahead of time. Bundling and Precache work well together, but this adds manual effort in bundle collection, though automating that collection should be possible. Ideally, record in an environment similar to the one used for the capture, as different PSOs can be required under different conditions, and PSO bundling only records the PSOs that were actually used. These PSOs also have to be compiled at some point, but they could be prioritised with a usage mask, either in the early precompile stage or manually after getting to the game logic (see the sketch below).
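
To illustrate the usage-mask part, a sketch assuming the bundle was recorded with usage masks enabled; the mask bit is hypothetical and would have to match your recording setup:

    #include "ShaderPipelineCache.h"

    // Hypothetical bit reserved for inventory-thumbnail PSOs at record time.
    static constexpr uint64 GInventoryCaptureMask = 1ull << 3;

    void PrioritiseInventoryPSOs()
    {
        // Switching the game usage mask tells the pipeline cache to
        // precompile the PSOs recorded under that mask next.
        FShaderPipelineCache::SetGameUsageMaskWithComparison(
            GInventoryCaptureMask,
            [](uint64 ReferenceMask, uint64 PSOMask) { return (ReferenceMask & PSOMask) != 0; });
    }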

Option 2 is to load the object early. I’m not sure there is such an event that would allow you to do this in your flow, but this is generally a recommendation for all assets when using Precaching. PostLoad will cause the Precache system to emit the required PSOs, so that may be useful to do at some point prior to needing the PSO in rendering the object. I believe changing the static mesh also emits PSO precache requests, but that’s typically very close to the point of rendering the asset.
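
A minimal sketch of that early load, assuming the item’s mesh is referenced via a soft pointer (the function and callback are ours):

    #include "Engine/AssetManager.h"
    #include "Engine/StaticMesh.h"

    void PreloadItemMesh(const TSoftObjectPtr<UStaticMesh>& ItemMesh)
    {
        // PostLoad runs on completion, which is where the precache system
        // emits the PSO requests for the mesh's materials.
        UAssetManager::GetStreamableManager().RequestAsyncLoad(
            ItemMesh.ToSoftObjectPath(),
            []()
            {
                // Asset loaded; PSO precache requests should now be in flight.
            });
    }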

Additionally, looking at the code, UPrimitiveComponent has a public member called PSOPrecacheRequestPriority that sets the priority. Depending on how the assets are set up, you may be able to leverage that to have your UI items queue-jump a bit if the early load happens only shortly before the capture needs to take place.
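
Something along these lines, with the caveat that the enum value is from memory and should be checked against PipelineStateCache.h in your engine version:

    #include "Components/StaticMeshComponent.h"

    void QueueJumpPrecache(UStaticMeshComponent* MeshComponent, UStaticMesh* ItemMesh)
    {
        // Raise the priority before setting the mesh so the precache
        // requests emitted by SetStaticMesh() pick it up.
        MeshComponent->PSOPrecacheRequestPriority = EPSOPrecachePriority::High;
        MeshComponent->SetStaticMesh(ItemMesh);
    }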

Option 3 is to block on either all active PSOs or all PSO precache requests; if there is no bundle, these are the same. This can hitch, so it is likely the last option I would try. The functions are NumPrecompilesRemaining() and NumActivePrecacheRequests(), and the code for them is in ShaderPipelineCache.cpp:825 and PipelineStateCache.cpp:4139.
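
A helper along these lines could gate the capture (the function name is ours):

    #include "ShaderPipelineCache.h"
    #include "PipelineStateCache.h"

    bool ArePSOCompilesIdle()
    {
        // NumActivePrecacheRequests() covers outstanding precache compiles;
        // NumPrecompilesRemaining() additionally covers bundled-cache work.
        return PipelineStateCache::NumActivePrecacheRequests() == 0
            && FShaderPipelineCache::NumPrecompilesRemaining() == 0;
    }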

I hope that helps.

Best regards,

Chris

Thank you so much for that in-depth answer!

We are waiting for the next frame using GFrameCounter and doing a simple comparison.

Basically :

  • CaptureItem adds a CaptureRequest to an array.
  • The build system runs a tick in a parallel world. It looks for any CaptureRequest in the array. If it isn’t currently building one and there is one, it executes StartCaptureItem. This is where we set up the mesh and position.
  • Then, each frame, ProcessCaptureItem runs, ensuring the objects are fully loaded (async) and that, once they are, 10 frames have passed (roughly as sketched below).
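
Simplified, the comparison is something like this:

    #include "CoreGlobals.h" // GFrameCounter

    struct FCaptureRequest
    {
        // Set to GFrameCounter + 10 once the assets finish async loading.
        uint64 ReadyFrame = 0;
    };

    bool IsCaptureReady(const FCaptureRequest& Request)
    {
        // GFrameCounter advances once per game-thread frame, so this
        // guarantees real frames have elapsed, not just re-entrant ticks.
        return GFrameCounter >= Request.ReadyFrame;
    }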

And looking at option 3: so we could simply watch NumPrecompilesRemaining() and NumActivePrecacheRequests() and not finish the first build until they reach zero? As this runs async, and the check would too, it would not hitch. The only downside is that every PSO, even ones beyond those we need, has to be done before the item starts building, but that would work, right?

Hi there,

GFrameCounter is a good way of doing this. Waiting for the next tick can cause the event to occur later in the same frame.

If the scene capture is orthographic, you could render multiple items in a single request by moving them in the view and creating a pseudo atlas.
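
For example, a sketch of the layout side of that idea (all names and the grid scheme here are ours):

    #include "Math/Vector.h"

    // Offset for slot N in a GridSize x GridSize atlas laid out on the
    // capture plane (Y/Z here, with X as the orthographic camera axis).
    FVector GetAtlasSlotOffset(int32 SlotIndex, int32 GridSize, float SlotSpacing)
    {
        const int32 Row = SlotIndex / GridSize;
        const int32 Col = SlotIndex % GridSize;
        return FVector(0.0f, Col * SlotSpacing, -Row * SlotSpacing);
    }

Each UI widget would then sample its slot’s sub-rect of the shared render target.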

The PSO approach would work. It probably makes sense to wait until NumActivePrecacheRequests() is zero rather than NumPrecompilesRemaining(), as the precompile count is a superset that includes the precache requests. If the PSOs are only precached, there isn’t really a reason to wait for all precompile tasks, especially as a large bundle can take considerable time to finish from cold. The only thing to be careful of is that if new jobs can be emitted during the wait, the wait would be extended.
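
One way to guard against that, sketched with an arbitrary frame budget:

    #include "PipelineStateCache.h"
    #include "CoreGlobals.h" // GFrameCounter

    bool ShouldStartCapture(uint64 WaitStartFrame, uint64 MaxWaitFrames = 120)
    {
        if (PipelineStateCache::NumActivePrecacheRequests() == 0)
        {
            return true;
        }
        // Newly emitted requests can extend the wait indefinitely, so give
        // up after the budget and capture anyway (it may be mid-compile).
        return (GFrameCounter - WaitStartFrame) > MaxWaitFrames;
    }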

Best regards,

Chris

As a follow-up: I had thought about perspective rendering to an atlas, but there’s a bit of complexity to handling that.

I discussed with a few of my colleagues, and they raised that if the UI items aren’t live at runtime, would it be possible to render them offline in the editor using an editor utility? This would make the cost a lot easier to manage.

Best regards,

Chris

Thank you so much, Chris, for your follow-up.

It’s not really possible to render the items offline. The issue is that weapons can have attachments, and we want those to be rendered.

However, we have the concept of mutable and immutable items (those that can have attachments and those that cannot). Immutable items could be cached offline. But regarding the current performance cost of the system, we are very happy.

The solution of capturing after 5 frames, combined with all your advice, works like a charm and has a very low impact on performance, especially because we additionally split transform updates and component creation across multiple frames, and the render targets also come from a pool.

We have also added support for live capture of items, allowing them to be rotated with the mouse when you open a detail window.

That’s great to hear. I’m very glad it’s working well.

If you have any further questions, please don’t hesitate to ask. Otherwise, would you mind if I close this case for now?

Best regards,

Chris