PCG on UE 5.6 is slow

Hello, I’m following up regarding the questions about [PCG on 5.6 is [Content removed] We’ve been quite busy recently, sorry for the delay.

We tried setting pcg.EnablePointArrayData to 0, but it didn’t change the performance.

Maps that previously took 10 seconds to load now take 20 seconds, and those that used to take 20 seconds now take 40 seconds.

We can’t provide an Unreal project with the issue for now, but here are some screenshots of the PCG profiler during the generation of a map (“ProfilePCG.ZIP”).

Steps to Reproduce

Hi Marius,

The pictures you’ve sent show a few things:

First, your obvious culprit is ‘Generate On Region’.

Second, there’s definitely a few things in play:

(1) yes, we made the PointArrayData by default, which has a definite impact on perfs if you aren’t accounting for it.

Seems like this is already out of the picture with what you’ve written here.

(2) your actual execution times are an order of magnitude smaller than the wall time,

This highlights that your graph in its current form is very much game-thread (GT) bound.

In this case, that brings me to something we did change in 5.6, which we should have put a bit more emphasis on.

We’ve lowered the execution budget (per frame on the GT) in PCG to be more in line with what we were expecting, after internal tests.

However, clearly it can impact the wall time of the generation depending on your respective bottlenecks.

In 5.6, we lowered the runtime value from 16.66ms to 5ms (“pcg.FrameTime”) and the editor budget from 50ms to 15ms (“pcg.EditorFrameTime”).

I would suggest you adjust your local cvar for this (or better yet, find the “right” value for your use case), and that should have a significant impact.

Otherwise, it’s possible we’ve moved things around to the GT in order to prevent race conditions and crashes, but I could not summon an example from memory.

Hopefully that helps - please let me know!

Cheers,

Julien

Yes, as I mentioned in my first question [PCG on 5.6 is [Content removed] :

“I saw that with 5.6 there were changes to PCG performance, with pcg.GraphMultithreading going from false to true and pcg.FrameTime going from 16.6ms to 5ms. I tried disabling multithreading, which saved me about 20s, and then increasing the FrameTime to 30, which also improved the time.”

Hey Marius,

I missed that when I perused that issue - to be clear in your case if you’re in the editor it has to be pcg.EditorFrameTime that you need to tweak.

Otherwise, as I said, there might be things we’ve moved to the GT while we got those sorted (there were some issues with the Get Actor Data nodes, if I recall correctly, but we might have improved that in 5.7).

I understand you might not be able to provide an example project, but could you do a cpu trace so we can have a more in-depth look?

Alternatively, expanding your profiling data (by making the depth non-zero) in the PCG editor might point you to specific nodes that have a significantly longer wall time than before - which would be a good starting point.

Let me know if you can give us more information.

Cheers,

Julien

Hi Marius,

Thanks for giving us the capture, so from a first inspection, there’s nothing obviously problematic -

There’s 22k tasks taking 1.9 second, but they’re all running on the game thread, so it stands to reason it takes ~3 second here.

The scheduling is costly here because your graph is complex (plenty of dependencies to track), but it does not seem like there are large data sets being processed, which of course isn’t best for performance.

In this case, I’d normally say a significant problem here is that you’re not running in multithread (e.g. pcg.GraphMultithreading is off) - as you’ve told us in the previous thread.

Would it be possible to get a capture with multithreading on just so we can compare?

I’ll duck out of the conversation in favor of Patrick to follow up, but know that we’ve done good perf improvements in 5.7 in the scheduler, and that we’ve also made significant improvements on how graph parameters are passed in subgraphs which should help on GT usage as well (even though that’s not going to be before 5.8).

On a side-note, fyi “wall time” is really just a measure in ‘real wall clock time’ of the end time of a task minus it’s start time.

It’s interesting as a metric because it can show you bottlenecks (esp. on the game thread) and corresponds to the end user assessment of performance, but a lot of things can factor in here so it’s not a robust metric in general.

Cheers,

Julien

Hey Marius,

What I am seeing comparing the multithreaded vs non-multithreaded trace is a major hitch of generic element execution which explains most of the total execution diff (ExecuteMainThreadTask+ExecuteAsyncTask = 4s vs ExecuteMainThreadTask = 1.9s) (UX related?)

Aside from that the actual graph executor is expected to have some scheduling overhead because of locks when running multi-threaded but I would expect without that hitch that we would’ve spent 0.2sec instead of 1.9sec on mainthread which shows that MT works (about 8000k tasks went of the gamethread)

When you are taking those numbers, do you have the Graph Editor open? That might be the hitch you are seing as the timings vary when the window is open (known issue)

I know you’ve already sent a lot of info our way but we would really like to get to the bottom of this with you.

Are you able to give us a couple more traces?

*Always with the graph editor closed*

Make sure to set the same EditorFrameTime for all traces and disable multithreading

A trace of your graph execution in 5.5

A trace of your graph execution in 5.6

Cheers,

Patrick

Ok let me know if/when you get a 5.5 capture as this would prove very valuable since we don’t have access to your data to do it ourselves

Patrick

Hello, sorry for the delay.

Yes, I had already understood that in the editor it is pcg.EditorFrameTime that needs to be adjusted, but my previous performance tests were done in a build because we have always noticed a huge difference between the editor and the build.

We will try updating to 5.7 in the next few days to see if we can regain some performance.

I have included the utrace here from generating a single room, with EditorFrameTime set to 30. The generation happens between frames 1922 and 1988.

I checked the PCG profiler with increased depth, but I do not see any node in “GenerateOnRegion” taking significant time.

Most of the time spent in that node is inside “PrepareWallTime”, and the “Prepare / Exec / Total” sections do not show anything heavy.

I first thought it might be waiting because of the large amount of work done during generation, so I removed almost all of the PCG code to keep only one GenerateOnRegion node (cf “Depth 0.png”) and generate a single room. But even with only that node, we still get a PrepareWallTime of around 270 ms, and there is nothing else taking enough time to justify waiting for the data to become valid.

I then checked with depth set to 1 and I can clearly see each loop of the node contributing its share to PrepareWallTime (around 20 ms per loop). But with depth set to 2, I do not see any node that explains this preparation time.

Depth 0 png

Depth 1 png

Depth 2 png

utrace of 1 room

Hello,

I enabled multithreading with the spawn of a single room in the editor and 30 ms on pcg.EditorFrameTime.

I went from 2.7 seconds before to 7 seconds with multithreading enabled.

Here are the PCG profiler images and the utrace (the generation runs from frames 1188 to 1399).

Depth Multithread 0

Depth Multithread 1

Depth Multithread 2

Utrace 1188 to 1399

When I did the trace, yes, I had the PCG graph open but hidden in the tabs.

Here is a new trace without the PCG graph open, in 5.6, and without multithreading.

For the 5.5 trace, I can’t provide it for now. I first need to retrieve our 5.5 version from Perforce to test it on that version. And we will soon have a test with 5.7 to see if there are any changes on that side.

utrace 1186 to 1252