PCG Builder Stall while IsGenerating()

This question was created in reference to: "[PCG] PCG World Builder commandlet stalls on Pending Compilation of PCG HLSL Shader" [Content removed]

We ran into the exact same issue as the thread above, posted back in August, and it looks like the fix that was submitted and made available in 5.7 has not actually solved it.

Our PCG generation keeps getting stuck inside the while (InComponent->IsGenerating()) loop and never exits.

However, if I add the same line the original author added, like so:

while (InComponent->IsGenerating())
{
    // HEAT: Maybe fix getting stuck
    FAssetCompilingManager::Get().FinishAllCompilation();
    FWorldPartitionHelpers::FakeEngineTick(InWorld);
}

This is back to working as intended.
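For anyone who would rather have the builder fail loudly than spin forever, here is a minimal standalone sketch of the same loop with a tick budget. Everything in it is an assumption for illustration: FWaitHooks, EWaitResult and WaitForGeneration are hypothetical names, not engine API, and the three callbacks only stand in for the calls in the loop above. A budget like this does not fix the underlying stall, it only surfaces it.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the engine calls in the loop above.
// None of these names are real Unreal Engine API.
struct FWaitHooks
{
    std::function<bool()> IsGenerating;         // models InComponent->IsGenerating()
    std::function<void()> FinishAllCompilation; // models FAssetCompilingManager::Get().FinishAllCompilation()
    std::function<void()> FakeEngineTick;       // models FWorldPartitionHelpers::FakeEngineTick(InWorld)
};

enum class EWaitResult { Finished, TimedOut };

// Flushes pending compilation and ticks until generation finishes
// or the tick budget is used up, whichever comes first.
inline EWaitResult WaitForGeneration(const FWaitHooks& Hooks, int MaxTicks)
{
    assert(MaxTicks > 0);
    for (int Tick = 0; Tick < MaxTicks; ++Tick)
    {
        if (!Hooks.IsGenerating())
        {
            return EWaitResult::Finished;
        }
        Hooks.FinishAllCompilation();
        Hooks.FakeEngineTick();
    }
    // Budget exhausted: report a stall unless the last tick finished the work.
    return Hooks.IsGenerating() ? EWaitResult::TimedOut : EWaitResult::Finished;
}
```

On TimedOut the caller could log which component was still generating and abort the run, so a stalled build shows up as an error instead of a hang.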

This looks like a regression from 5.5: we switched to 5.6 about 2-3 months ago and instantly ran into this issue, whereas it was working fine in 5.5.

Hi Branislav,

Could you give me more info on which tasks are “finishing” with the FinishAllCompilation call? The PCG HLSL code should be done compiling with the UpdateResource call, which is now synchronous inside commandlets, so we might be dealing with a different type of asset here.

Not saying this isn’t a regression, but I’d like to know more so that we make the best fix possible.

Cheers,

Patrick

Hi [mention removed]​ ,

I have a bit more of an interesting puzzle with this.

We are in the process of migrating to 5.7, *but* adding that fix does resolve the stall in our 5.6 branch.

However, the 5.6 branch does *not* have the code that you guys added to fix the deadlock.

Now, I merged this same change into our 5.7 branch, and it did *NOT* fix the issue there.

So it looks like the issue is still present in 5.7, and putting that code in only fixes it in the 5.6 branch.

Let me put a breakpoint in and see which tasks are finishing while waiting for compilation, like you said. (In our case we are not using any PCG HLSL code, FYI, but the exact same issue exists.) My guess is that the PCG task scheduling is not working correctly because of something.

I’m going to check out the 5.6 branch; I’m not sure where to look in 5.7 to see why it’s stalling there.

Ok [mention removed]​ ,

I ran it again, let it get stuck, and added a static int flag:

static int fixStuck = 0;

while (InComponent->IsGenerating())
{
    // HEAT: Maybe fix getting stuck
    if (fixStuck > 0)
    {
        FAssetCompilingManager::Get().FinishAllCompilation();
    }
    FWorldPartitionHelpers::FakeEngineTick(InWorld);
}

and then *changed* fixStuck to 1 for a *SINGLE* execution of FAssetCompilingManager::Get().FinishAllCompilation().

I also made all of their implementations build in debug mode:

+ [0] 0x00007ffb8958b848 {UnrealEditor-Engine.dll!FTextureCompilingManager Singleton} {CurrentPostCompilationDDCKey=…} IAssetCompilingManager * {FTextureCompilingManager}

+ [1] 0x00007ffb8958b4c0 {UnrealEditor-Engine.dll!FSoundWaveCompilingManager Singleton} {bHasShutdown=false …} IAssetCompilingManager * {FSoundWaveCompilingManager}

+ [2] 0x00007ffb893f4400 {UnrealEditor-Engine.dll!UE::Anim::FAnimSequenceCompilingManager Singleton} {RegisteredAnimSequences=…} IAssetCompilingManager * {UE::Anim::FAnimSequenceCompilingManager}

+ [3] 0x00000216b395fa00 {bCompilingDuringGame=true ShaderMapJobs=Empty NumExternalJobs=0 …} IAssetCompilingManager * {FShaderCompilingManager}

+ [4] 0x00007ffb893f4f00 {UnrealEditor-Engine.dll!FAnimBankCompilingManager Singleton} {bHasShutdown=false …} IAssetCompilingManager * {FAnimBankCompilingManager}

+ [5] 0x00007ffb8958b670 {UnrealEditor-Engine.dll!FStaticMeshCompilingManager Singleton} {bHasShutdown=false …} IAssetCompilingManager * {FStaticMeshCompilingManager}

+ [6] 0x00007ffb8958b390 {UnrealEditor-Engine.dll!FSkinnedAssetCompilingManager Singleton} {bHasShutdown=false …} IAssetCompilingManager * {FSkinnedAssetCompilingManager}

+ [7] 0x00000219a6a15ab0 {UnrealEditor-NiagaraEditor.dll!FNiagaraSystemCompilingManager Singleton} {NiagaraShaderType=…} IAssetCompilingManager * {UnrealEditor-NiagaraEditor.dll!FNiagaraSystemCompilingManager}

+ [8] 0x00007ffb8958a308 {UnrealEditor-Engine.dll!FActorDeferredScriptManager Singleton} {PendingConstructionScriptActors=…} IAssetCompilingManager * {FActorDeferredScriptManager}

+ [9] 0x000002167d8dbf40 {ThreadPool=Ptr=0x00000216b3b0f600 {Lock={CriticalSection={Opaque1=0x00000216b3b0f608 {0xffffffffffffffff} Opaque2=0x00000216b3b0f610 {…} …} } …} …} IAssetCompilingManager * {FDistanceFieldAsyncQueue}

+ [10] 0x00007ffb68ed0950 {UnrealEditor-HairStrandsCore.dll!FGroomBindingCompilingManager Singleton} {bHasShutdown=…} IAssetCompilingManager * {UnrealEditor-HairStrandsCore.dll!FGroomBindingCompilingManager}

+ [11] 0x000002167d8dab40 {ThreadPool=Ptr=0x00000216b45f0180 {Lock={CriticalSection={Opaque1=0x00000216b45f0188 {0xffffffffffffffff} Opaque2=0x00000216b45f0190 {…} …} } …} …} IAssetCompilingManager * {FCardRepresentationAsyncQueue}

+ [Raw View] {AllocatorInstance={Data=0x00000216660c4d40 {…} } ArrayNum=12 ArrayMax=22 } TArray<IAssetCompilingManager *,TSizedDefaultAllocator<32>>

So these are all of the asset compiling managers that are running.

Now I stepped through each asset compiling manager, and the pending assets were:

16 textures
1 animation
13 static meshes
55 Niagara particle emitters

The fact that a *single run of this* got it unstuck in 5.6 implies, at least to me, that one of these assets hasn’t finished compiling.

Any thoughts?

I have uploaded a file with a full list of the assets as well

Hey Branislav,

Thanks for the added info. Do you run the builder in iterative cell mode or in the default mode, where we load the whole world at once?

It is weird because UPCGWorldPartitionBuilder::RunInternal does wait for compilation in FPCGWorldPartitionBuilder::WaitForAllAsyncEditorProcesses, which means something is adding more assets to process after this call.

It would be interesting to understand where that happens. Could it be that your PCG graph is actually loading data during execution?

Cheers

Patrick

Hi [mention removed]​ ,

We are indeed running in the load-the-entire-world mode, but our machine is very beefy (64-core Threadripper with 256 GB RAM and a 3080 Ti), so that’s not a problem in this case. I thought *it* might be the issue because everything else we run uses iterative mode; however, I tried iterative mode and we have the exact same issue.

I talked to our technical artist, and he didn’t understand how our PCG graph could possibly be loading data during execution.

He’s mostly using actor properties, actor data, spline data, or the other normal nodes; the meshes come at the end, and then he’s using the spawner node.

Is there something on our side that we should look at that could cause a data load?

Thanks,

Bane

Hey Branislav,

I’m going to try and create a repro case on my side and will let you know.

Cheers,

Patrick

I forgot to ask: do you know where the assets in the list you sent come from? There seems to be a lot of Niagara in there, and I was wondering whether any of your graphs are related to it in any way.

Are you guys spawning complex BP Actors?

Ok, I thought some more, and what could be interesting is to know where the graph is stuck.

When you get into the infinite loop waiting for the component to finish generating, could you put a breakpoint inside IPCGElement::Execute

and step in and tell me which node is actually stuck and why? If the graph is not advancing because some data hasn’t finished compiling, there is a good chance we will find out how this way.

Cheers,

Sorry for the spam :slight_smile:

Patrick

Absolutely no problem with the spam!! I will put a breakpoint in and take a look. I just wasn’t sure where to debug it because it’s my first time debugging PCG code. Will report back ASAP.

(I don’t think there is anything special about the Niagara effects, probably just some hand-placed things, but I can take a look.)

Ok, finally getting around to it, so don’t mind the spam from our side either, but essentially inside

Engine\Plugins\PCG\Source\PCG\Private\Compute\Elements\PCGComputeGraphElement.cpp

bool FPCGComputeGraphElement::ExecuteInternal(FPCGContext* InContext) const

case EPCGComputeGraphExecutionPhase::PostExecute: // Fallthrough

it gets stuck here:

if (!Context->DataProvidersPendingPostExecute.IsEmpty())
{
    SleepUntilNextFrame();
    return false;
}

This list of data providers pending post-execute is never emptied, so it keeps sleeping until the next frame:

[Image Removed] This is what that pending data provider looks like.

For example, this one has not generated any instances, so it keeps sleeping until the next frame.

So it looks like the PCG graph is *actually* using a compute shader to spawn things. I’m going to switch it to spawn the meshes on the CPU to see if it still happens there.

I also took a look inside Commandlet.cpp, where it submits work if the compute task worker has work, and there is one compute task worker in here:

[Image Removed]

Weirdly enough, GraphInvoices is empty, so HasWork always returns false and it never submits work.

So I had our technical artist *disable* the GPU spawning of meshes, and it is working now in 5.6 without any changes.

I’m now going to check *all* of the things above in the 5.7.0 branch first and then the 5.7.1 branch, to see whether it works in 5.7.1 on the GPU.

*OK*, I just tried our 5.7 branch, and the CPU version is working perfectly fine, so indeed there is *something* funky in the compute framework causing the spawning to not finish in both 5.6 and 5.7.

I noticed that 5.7.1 was available, so I merged the latest 5.7.1 into our branch and re-ran the GPU version, and I think I have a slightly better breakpoint now:

if (!PrimitiveFactory->IsRenderStateCreated())
{
    UE_LOG(LogPCG, Verbose, TEXT("FPCGInstanceDataProvider: One or more scene proxies were not ready. Will try again on the next tick."));
    return false;
}

LogPCG is getting spammed with this because, for all of the mesh instances it is trying to spawn, the following check appears to fail:

bool FPCGPrimitiveFactoryPISMC::IsRenderStateCreated() const
{
    for (int32 Index = 0; Index < GetNumPrimitives(); ++Index)
    {
        FPrimitiveSceneProxy* SceneProxy = GetSceneProxy(Index);
        if (!SceneProxy || !SceneProxy->GetPrimitiveSceneInfo() || SceneProxy->GetPrimitiveSceneInfo()->GetInstanceSceneDataOffset() == -1)
        {
            return false;
        }
    }
    return true;
}

GetInstanceSceneDataOffset is actually returning -1.

The only interesting thing to me is that these are Nanite static meshes. So it appears the *reason* this is getting stuck in 5.7.1 on the GPU is that the render state of these meshes has *not* been updated. I don’t know if that’s because of Nanite; maybe PCG doesn’t support spawning Nanite meshes on the GPU? It seems to be an issue only with the PCG compute data provider.

Below, inside the txt file, is debug data from *one of the proxies that isn’t ready*; hopefully that helps.
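To narrow down which of the three conditions in the IsRenderStateCreated check quoted above is failing, and for which primitive, a diagnostic version of the same test could look roughly like this. It is a standalone sketch and not engine code: FProxyState, ENotReadyReason, FNotReady and FindFirstNotReady are hypothetical names, and the struct fields only model what the real check reads from each scene proxy.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the state the check reads from each scene proxy.
struct FProxyState
{
    bool bHasSceneProxy = false;          // models GetSceneProxy(Index) != nullptr
    bool bHasSceneInfo = false;           // models GetPrimitiveSceneInfo() != nullptr
    int32_t InstanceSceneDataOffset = -1; // -1 models "offset not assigned yet"
};

enum class ENotReadyReason { MissingSceneProxy, MissingSceneInfo, OffsetNotAssigned };

struct FNotReady
{
    int Index;
    ENotReadyReason Reason;
};

// Applies the same three conditions in the same order as the quoted loop,
// but returns the first failing primitive and the reason instead of a bare bool.
inline std::optional<FNotReady> FindFirstNotReady(const std::vector<FProxyState>& Proxies)
{
    for (int Index = 0; Index < static_cast<int>(Proxies.size()); ++Index)
    {
        const FProxyState& Proxy = Proxies[Index];
        if (!Proxy.bHasSceneProxy)
        {
            return FNotReady{Index, ENotReadyReason::MissingSceneProxy};
        }
        if (!Proxy.bHasSceneInfo)
        {
            return FNotReady{Index, ENotReadyReason::MissingSceneInfo};
        }
        if (Proxy.InstanceSceneDataOffset == -1)
        {
            return FNotReady{Index, ENotReadyReason::OffsetNotAssigned};
        }
    }
    return std::nullopt; // every primitive is ready
}
```

In the case described here, the result would be OffsetNotAssigned for the affected primitive, which matches GetInstanceSceneDataOffset returning -1.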

Ok thanks for all the info Branislav,

The issue is that we don’t support PISMC in commandlets, and we should probably throw an error when it is used inside them.

The reason is that PISMCs don’t support being persisted, so there is no point in doing offline generation of them.

I suggest you change the nodes to use the non-GPU static mesh spawner instead.

Let me know if that solves the issue.

On our side we are going to add errors so that users know what is up :slight_smile:

Thanks for all the info.

Cheers,

Patrick

Changing it to use the non-GPU static mesh spawner does indeed work, and adding an error message would be great, as I assume a lot of people are running into this without even realizing that it’s completely broken when run through the commandlet or the build PCG button.

Much appreciated, thanks for all of the help.