How to set up asynchronous streaming and a buffer pool to avoid frame drops

Hi Basile,

I appreciate the extra data you have been providing, which will help further investigate your issues. Regarding the long hitches in UWorld::UpdateWorldLevelStreaming, this is likely caused by packing too many static mesh instances into a single component. The logic for that is in UInstancedStaticMeshComponent::CalcBoundsImpl. We have introduced some logic in 5.6 to cache the results, but I am not too familiar with that code myself. Since the conversation has moved away from rendering, could you please file a new ticket regarding the world partition streaming? That way, someone with more experience with that system can field those questions.

To wrap up some threads here based on your questions:

I found out that if you are using the Launcher version of the engine, Superluminal has had issues in the past in properly capturing traces, which is likely why I have not been able to view them. If you still want to use Superluminal moving forward, please file a new ticket with the engine and Superluminal logs collected during the capture so we can get some extra eyes on it.

How do you name the threads that show up in the traces? I started noticing (most likely related to our simulation connectivity) “unknown” thread activity.

The engine names its own threads via UE::Trace::ThreadRegister(), which emits a $Trace/ThreadInfo event containing the thread ID, system thread ID, sort hint, and name. See TraceAuxiliary.cpp for an example. I am not sure what you mean by the “Unknown” threads you are seeing, but assuming you are referring to the greyed-out threads, those are OS-level threads that run alongside the engine and for which we do not capture any direct information.

Am I right to consider the final phase as GPU-limited?

Looking at the trace you uploaded and ignoring the big spikes due to the memory allocation issue we discussed earlier, I wouldn’t say you are GPU-bound. Your average frame time is around 16-17ms on the GPU (see screenshot as an example), which is great, so most of your time should be spent getting your CPU utilization in line and then seeing if you need to make another pass on optimizing GPU frame times.

[Image Removed]

If you still have questions about the GPU side of your project or other rendering-related questions, feel free to ask them here. For anything else, I would like to ask you to create new tickets.

Cheers,

Tim

[Attachment Removed]

Hi Tim,

Agreed on the WP hit; I will create another ticket to follow up. We already have the ability to constrain the instance count per component, but it is not applied. Also, the instances should probably be organized in a quadtree or a similar structure for this to be relevant.

Noted for Superluminal. I guess we will update the engine should we want to go further in that direction. I am not sure this is required at this point.

Thanks for the trace hints. While you are right that system (NVIDIA) threads are also active and not exposed in the traces, we also have our own threads for the network interface and data exchanges. I was considering making the effort to name them properly and potentially add trace scopes for completeness. I will look into the namespace you pointed me to and should be able to figure out whether we want to do it. Based on the current traces, they have little impact, but since their default priority is higher than that of the Unreal workers, it may become a concern one day.

Last but not least, regarding the end section: by "final phase", I meant the part of the trace after 7 minutes 35 seconds.

On my side, I am seeing this:

[Image Removed]

I read this as busy GPU and idle GPU.

[Image Removed]

Thanks,

Basile

[Attachment Removed]

Hi Basile,

Yeah, if you want more insights into which threads are active/idle/busy, then adding in extra names won’t hurt. I just checked the Insights trace you pointed out, and you are correct about the GPU idle/wait statistics. To me, it even looks like you are GPU-bound. I can’t annotate the screenshots you sent, but the GPU0-Graphics track is continuously working on frames that are two submissions behind the Game Thread (check the numbers next to the Frame (ms) event). That’s the reason why you are seeing the GT stall on Frame Sync Time events. It occurs throughout the trace, but the cost is often hidden by extra CPU work at various times. I would investigate that once you have smoothed out the asset-loading and upload hitches we discussed earlier in the thread.
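To make that reasoning a bit more concrete, here is a minimal sketch of the comparison I am doing by eye when reading the tracks. It is a simplified heuristic, not how Insights computes anything: the `FrameTimings` struct, the `classify_frame` function, and the idea of subtracting the Frame Sync wait from the Game Thread time are all assumptions made for illustration, with timings you would read off the trace yourself.

```cpp
// Hypothetical per-frame timings (milliseconds), read manually from a trace.
struct FrameTimings {
    double game_thread_ms;     // total Game Thread frame time, including waits
    double render_thread_ms;   // Render Thread time for the same frame
    double gpu_ms;             // GPU time for the same frame
    double frame_sync_wait_ms; // part of game_thread_ms spent stalled on frame sync
};

enum class Bottleneck { GameThread, RenderThread, Gpu };

// Assumed heuristic: remove the sync stall from the Game Thread time so that
// waiting on the GPU is not mistaken for CPU work, then pick the longest stage.
Bottleneck classify_frame(const FrameTimings& t) {
    const double game_work_ms = t.game_thread_ms - t.frame_sync_wait_ms;
    if (t.gpu_ms >= game_work_ms && t.gpu_ms >= t.render_thread_ms) {
        return Bottleneck::Gpu;
    }
    if (t.render_thread_ms >= game_work_ms) {
        return Bottleneck::RenderThread;
    }
    return Bottleneck::GameThread;
}
```

For example, a frame with a 17 ms Game Thread time of which 9 ms is a sync stall, a 6 ms Render Thread time, and a 16.5 ms GPU time would come out as GPU-limited under this heuristic, while the same GPU time next to 30 ms of real Game Thread work would not.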

Cheers,

Tim

[Attachment Removed]

Hi Tim,

We are making progress on the hitches. The target platform is much more powerful than my laptop, from which the traces were extracted.

Still, the GPU remains the bottleneck in some specific locations of the level, mostly because of the number of tree instances in the area.

We improved the situation by using HISM (instead of ISM) and are working on simpler tree LODs.

Thanks for the confirmation

[Attachment Removed]

Hi Basile,

Nice, it sounds like you are making good progress, so I will close out this ticket. If you want to continue this conversation once you have new information, please feel free to create a new ticket and reference this one so it can be forwarded to me.

Cheers,

Tim

[Attachment Removed]

Hi,

Unreal Engine has no straightforward way to manually pre-allocate primitive pools at the start of a game, but changing the following CVars can help to reduce frame hitches:

  • increase r.GPUScene.MaxPooledUploadBufferSize (the maximum size of a GPU Scene upload buffer to pool; defaults to 256000)
  • enable r.GPUScene.ParallelUpdate (please see this [Content removed] for some additional info on this CVar)
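
As a rough sketch, the two CVars above could be set from the project's DefaultEngine.ini as shown below. The [SystemSettings] section is one common place for console variables. The buffer size used here is only an illustrative placeholder (an assumption, not a recommended value) and should be tuned against your own traces.

```ini
; Config/DefaultEngine.ini
[SystemSettings]
; Illustrative value only (assumption): larger than the 256000 default mentioned above.
r.GPUScene.MaxPooledUploadBufferSize=1024000
; 1 enables the parallel GPU Scene update.
r.GPUScene.ParallelUpdate=1
```

Both CVars can also be typed into the in-game console first to check their effect before committing them to a config file.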

The following article provides some best-practice guidelines from Epic for avoiding frame hitches caused by loading large levels: https://dev.epicgames.com/community/learning/tutorials/6XW8/unreal-engine-the-great-hitch-hunt-tracking-down-every-frame-drop

If you have further questions or comments, please let us know.

Best regards,

Sam

[Attachment Removed]

Hi Tim,

Would you happen to have a contact name or a bug reference so I can loop in our NVIDIA contact?

Thanks,

Basile

[Attachment Removed]

Hi Basile,

I am not sure we have a bug reference, but Daniele Pieroni has been looking into this from our side. You could probably mention them to your NVIDIA rep so they can connect.

Cheers,

Tim

[Attachment Removed]