Hello,
We are using PCG at runtime to scatter and GPU-spawn our small biome elements (PISM – grass and rocks) via hierarchical generation with a precomputed point cloud. Since the point cloud is generated offline in our pipeline, we expected almost no processing cost from PCG at runtime. However, in our profiling (PS5 test build), we are consistently exceeding the frametime budget.
We’re seeing significant processing time spent in FPCGGraphExecutor::PrepareForExecute, and up to one-third of the time is spent in IPCGElement::GetDependenciesCRC.
Here are a few details about our setup:
- MaxPercentageOfExecutingThreads is set to 1.0.
- We only run one PCG graph for our biomes.
- Our hierarchical generation uses three grid levels (16, 32, and 64 meters).
- FPCGGraphExecutor::Execute is taking considerably more time than FPCGRuntimeGenScheduler.
I’ve attached screenshots from a representative frame. According to the profiler, our single runtime graph appears to execute in every cell. We can also share a copy of our performance capture if helpful.
Do you see similar results on your side? Could this be related to a data setup issue on our end? At this point we are considering switching solutions for small biome spawning if these performance costs are expected.
Thank you,
Hugo
[Attachment Removed]
Hi Hugo,
In a fully runtime case like yours, it might be better to turn off the cache completely (e.g. pcg.cache.enabled 0 or pcg.cache.runtime.enabled 0 depending on your version) if you’re not doing significant processing anyway.
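If you'd rather not type the command each session, a config fragment along these lines might work. This is only a sketch: it assumes your project applies console variables from a [ConsoleVariables] section in Config/DefaultEngine.ini, and only one of the two cvar names will exist, depending on your engine version.

```ini
; Assumption: this lives in the project's Config/DefaultEngine.ini.
[ConsoleVariables]
; Keep whichever of the two names exists in your engine version.
pcg.cache.enabled=0
; pcg.cache.runtime.enabled=0
```

Typing the same command in the in-game console on a test build is a quicker way to compare captures before committing the setting.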
Let us know if it helps. We'll also likely improve this further in a future release (5.8) in various ways.
Cheers,
Julien
Hi Julien,
We’re almost done with our 5.7 integration. Once that’s wrapped up, I’ll run some profiling tests to see what kind of improvements we’re getting. From what I understand, there are still additional improvements coming in 5.8. Do you know if those changes could be easily backported or integrated into 5.7? It’s very unlikely we’ll be able to keep integrating newer engine versions before shipping.
Thanks,
Hugo
Hi Hugo,
It’s a good question. I think most changes will be in the same ballpark (GPU-related optimizations and features), but some might reach further across the codebase (especially around the execution sources), so integrating them might not be “obvious”. Some changes outside of PCG that help performance might also be non-trivial, unfortunately.
I’ll ask the engineer pushing on the runtime aspect to keep a list of things preventing easy integration so you know, but that’s more or less the most we can do now.
Cheers,
Julien
Hi Julien,
I can confirm that running PCG at runtime is faster than before. That said, it’s still producing spikes of ~1 ms, so I’m not yet confident it will be performant enough for our needs, as we are budgeting 500 µs of our CPU time for PCG.
The next step will be to integrate the PCG FastGeo Interop plugin. Since we are scattering biomes at runtime, it seems like a good candidate to improve performance. However, I’m not entirely sure which parts of the PCG processing are expected to benefit the most from this integration, or what specific signals I should be looking for in Unreal Insights to validate that it’s working as intended.
In the meantime, please take a look at the screenshot I linked and feel free to share any additional recommendations.
Hi Hugo! I’ll ask the team to dig a bit more, but there are definitely a few things at play here:
- Packing/unpacking seems somewhat costly.
- You can make sure that any common data you need on the GPU is loaded only once by executing it at the unbounded level.
- Attributes have a significant cost with respect to uploading/downloading data to and from the GPU, so if you know up front that you don’t need them on the GPU or afterwards, you could remove them.
- We’re making significant improvements to the runtime gen scheduler as part of 5.8 (and a lot of GPU perf work across the board), which will eventually help you. I’m not sure how much of it would be cherry-pickable, though. I’ll reach out to the team.
- I don’t think the budget accounts for the time spent in the runtime gen scheduler. There are also some levers in the runtime scheduler to avoid generating more than a set number of components at the same time, which could help reduce the spikes too.
- FastGeo helps in creating the primitives and removes the need for components and actors to support the runtime GPU generation. This is also something we’re pushing forward in 5.8, FYI.
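To illustrate the scheduler lever mentioned above (not generating more than a set number of components at once), here is a minimal standalone C++ sketch. It is not PCG's runtime gen scheduler: `PendingCell`, `BudgetedQueue`, `SimulateFrames`, and the cost figures are hypothetical, chosen only to show how a per-frame cap spreads a burst of cell generation over several frames instead of one spike.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for one grid cell waiting to be generated.
struct PendingCell {
    int id;
    double costUs; // assumed CPU cost in microseconds
};

// Hypothetical queue that processes at most `maxPerFrame` cells per tick.
class BudgetedQueue {
public:
    // A cap of 0 would never drain the queue, so it is clamped to 1.
    explicit BudgetedQueue(std::size_t maxPerFrame)
        : maxPerFrame_(std::max<std::size_t>(1, maxPerFrame)) {}

    void Enqueue(const PendingCell& cell) { pending_.push_back(cell); }

    // Processes up to maxPerFrame_ cells; returns the CPU time spent this frame.
    double Tick() {
        double spentUs = 0.0;
        const std::size_t count = std::min(maxPerFrame_, pending_.size());
        for (std::size_t i = 0; i < count; ++i) {
            spentUs += pending_.front().costUs;
            pending_.pop_front();
        }
        return spentUs;
    }

    bool Empty() const { return pending_.empty(); }

private:
    std::size_t maxPerFrame_;
    std::deque<PendingCell> pending_;
};

// Enqueues `cellCount` cells of equal cost and returns the cost of each frame.
std::vector<double> SimulateFrames(std::size_t maxPerFrame, int cellCount, double costUs) {
    BudgetedQueue queue(maxPerFrame);
    for (int i = 0; i < cellCount; ++i) {
        queue.Enqueue({i, costUs});
    }
    std::vector<double> frameCosts;
    while (!queue.Empty()) {
        frameCosts.push_back(queue.Tick());
    }
    return frameCosts;
}
```

With 8 cells at an assumed 125 µs each, `SimulateFrames(8, 8, 125.0)` yields a single 1000 µs frame, while `SimulateFrames(2, 8, 125.0)` yields four 250 µs frames. The trade-off is that cells take more frames to appear.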