The weird part is that it seems to happen at random on both Linux and Win64 builds, with no real direction on where to look other than "MultiProcessCook failed". We’ve had 3 builds fail in a row, and many more succeed without the issue, all without changing anything.
Error trying to run Commandlet BuildCookRunWithAutoMultiProcessCook. Still attempting to copy additional logs to output directory. Error: Cook fail
Any idea what may be the issue or where to start looking to fix this? It doesn’t seem to consistently error on any specific asset it is cooking either, so unsure if that is specifically the culprit.
This happens when an object somehow reaches PostLoad without having been deserialized first.
Since this is not consistent, this is clearly a timing problem. There are a few places where blueprint code puts RF_NeedLoad back on an object for various reasons, so one of those places might be causing the issue. It could also be GC related.
Can you get a callstack of when this happens?
Does it happen on the asset types consistently?
Has there been a GC near the place where this happens?
Can you print the object class and object name when this happens? We might be able to identify a pattern that way.
I do see GC happening periodically throughout the log, though the error doesn’t seem to trigger shortly before or after a GC pass. The cook gets through many different assets before it hits the error.
My next step would be to track down RF_NeedLoad object flag modifications on objects of these types and log them with a callstack, using additional code in both AtomicallySetFlags and AtomicallyClearFlags. Make sure to also output the full object path, or at least the pointer address, so you can correlate the PostLoad error with the RF_NeedLoad changes once you hit one.
It might reveal the culprit.
You can look up FPlatformStackWalk::CaptureStackBackTrace and FPlatformStackWalk::ProgramCounterToHumanReadableString. We do this at a few places in the engine to output callstacks.
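To make the flag-tracking idea above more concrete, here is a minimal standalone C++ sketch. It is not engine code: TrackedObject, FlagLog, FlagEvent, kNeedLoad and kNeedPostLoad are hypothetical stand-ins I made up to model a UObject, its flags and a log. The two methods only borrow the names AtomicallySetFlags and AtomicallyClearFlags to show where the logging would go, and the Where string is a placeholder for the callstack you would capture with the FPlatformStackWalk functions in the real engine.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for engine object flags (not the real values).
constexpr std::uint32_t kNeedLoad     = 1u << 0; // models RF_NeedLoad
constexpr std::uint32_t kNeedPostLoad = 1u << 1; // an unrelated flag

// One recorded change of the watched flag on one object.
struct FlagEvent {
    const void* Object; // pointer address, used to correlate with later errors
    std::string Path;   // full object path
    bool bSet;          // true = flag was added, false = flag was removed
    std::string Where;  // placeholder for a captured callstack
};

// Thread-safe event list, since flags can change from several threads.
class FlagLog {
public:
    void Record(const void* Object, const std::string& Path, bool bSet, const char* Where) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Events.push_back(FlagEvent{Object, Path, bSet, Where});
    }

    // Returns every recorded event for one object, in the order recorded.
    std::vector<FlagEvent> EventsFor(const void* Object) const {
        std::lock_guard<std::mutex> Lock(Mutex);
        std::vector<FlagEvent> Result;
        for (const FlagEvent& Event : Events) {
            if (Event.Object == Object) {
                Result.push_back(Event);
            }
        }
        return Result;
    }

private:
    mutable std::mutex Mutex;
    std::vector<FlagEvent> Events;
};

// Models an object whose flags are modified atomically.
struct TrackedObject {
    explicit TrackedObject(std::string InPath) : Path(std::move(InPath)) {}

    void AtomicallySetFlags(std::uint32_t FlagsToAdd, FlagLog& Log, const char* Where) {
        const std::uint32_t Old = Flags.fetch_or(FlagsToAdd);
        // Log only real transitions of the watched bit (clear -> set).
        if ((FlagsToAdd & kNeedLoad) != 0 && (Old & kNeedLoad) == 0) {
            Log.Record(this, Path, true, Where);
        }
    }

    void AtomicallyClearFlags(std::uint32_t FlagsToClear, FlagLog& Log, const char* Where) {
        const std::uint32_t Old = Flags.fetch_and(~FlagsToClear);
        // Log only real transitions of the watched bit (set -> clear).
        if ((FlagsToClear & kNeedLoad) != 0 && (Old & kNeedLoad) != 0) {
            Log.Record(this, Path, false, Where);
        }
    }

    bool HasAnyFlags(std::uint32_t FlagsToCheck) const {
        return (Flags.load() & FlagsToCheck) != 0;
    }

    std::string Path;
    std::atomic<std::uint32_t> Flags{0};
};
```

When the PostLoad error fires for an object, calling EventsFor with that object's address gives the history of RF_NeedLoad-style changes for it. Logging only actual transitions of the watched bit keeps the output small enough to read during a full cook.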