Cook non-determinism in ALandscapeStreamingProxy properties

Hi, I’m seeing non-deterministic cook output for ALandscapeStreamingProxy actors when cooking large world that uses world partitioning.

The immediate problem is the following - at some point, the primary ALandscape actor had some property changes (some new entries added to TargetLayers collection). The ALandscapeStreamingProxy actors linked to this landscape in other world partition cells were not resaved, so normally during cook the PostLoad calls FixupSharedData -> SynchronizeSharedProperties, which syncs these new layers. However, in some cases FixupSharedData is not called, because GetLandscapeActor() returns null.

GetLandscapeActor() does TSoftObjectPtr<ALandscape>::Get(), and when I put a breakpoint to catch this situation, I see that this soft pointer has null weak pointer, so Get fails. According to the logs, this happens right after GC, so presumably the root cause is that there’s nothing keeping the primary landscape actor alive while all the packages containing proxies are cooked.

So far I haven’t found a way to reproduce it reliably, e.g. when trying to manually force a GC at various point of cook I don’t see a destructor of ALandscape being called, so I presume I’m missing something - maybe there is _some_ mechanism that is simply not reliable?

One other potentially suspicious thing that I’ve noticed is this - normally the primary ALandscape is loaded by UWorldPartition::PrepareGeneratorPackageForCook:

  • it loops over collection returned by RuntimeHash->GetAlwaysLoadedCells(), and calls UWorldPartitionRuntimeLevelStreamingCell::OnPrepareGeneratorPackageForCook()
  • this function calls FWorldPartitionLevelHelper::LoadActors(), one of the actors it loads is the ALandscape. Judging by the name of the function, I’d expect the actors loaded here to somehow be kept alive until the end of the cook? This would then solve the issue…
  • one thing that is done later is FWorldPartitionCookPackageSplitter::PopulateGeneratorPackage calling PopulateContext.ReportKeepReferencedPackages, and the list passed as argument includes a package that the landscape was loaded from. Again, judging by the name, this looks like this mechanism designed to keep it alive?.. however - by now this package is called ‘TrashedPackage’ (it was renamed by WorldPartitionLevelHelper::MoveExternalActorsToLevel; the ALandscape has IsPackageExternal() returning true), and the comments there imply that the package is now empty and so probably that doesn’t reference the landscape actor anymore?

So my question is - how is it even supposed to work? What is the mechanism that is supposed to keep the landscape alive? Is there even such a mechanism? If not, how is it expected to keep cook results deterministic? Is the ReportKeepReferencedPackages function really expecting to be called with ‘trashed packages’ as arguments?

Some more findings here. So the GC seems to be the consequence here. Normally the ALandscape actor is added to the PersistentLevel of the ‘primary’ world, and it’s being kept alive by FWorldPartitionCookPackageSplitter.

However, one of the world partition cells has another actor (LandscapeSplineActor), and the corresponding FStreamingGenerationActorDescView structure has EditorReference to the primary landscape actor. Because of that, the ALandscape reference is added to the UWorldPartitionRuntimeLevelStreamingCell::Packages array with bIsEditorOnly == true.

When UWorldPartitionRuntimeLevelStreamingCell::OnPopulateGeneratedPackageForCook is called for that cell later, it calls MoveExternalActorsToLevel for its Packages. Because of this condition, the landscape entry is processed despite bIsEditorOnly flag:

		// Always load editor-only actors during cooking and move them in their corresponding streaming cell, to avoid referencing public objects from the level instance package for embedded actors.
		// In PIE, we continue to filter out editor-only actors and also null-out references to these objects using the instancing context. In cook, the references will be filtered out by the cooker 
		// archive will be filtering editor-only objects, and will allow references from other cells because they all share the same outer.
		if (PackageObjectMapping.bIsEditorOnly && !IsRunningCookCommandlet())
		{
			continue;
		}

So then the primary landscape actor is found and Rename is called, which removes it from the main level and adds to the cell level. After the cell cook completes, there’s nothing to keep the ALandscape from being GC’d.

This all sounds quite bad - some random cell containing random actor that happens to reference an always-loaded actor can move that actor to a temporary level and make it not-really-always-loaded.

FWIW removing the ‘&& !IsRunningCookCommandlet()’ in two places (FWorldPartitionLevelHelper::LoadActorsInternal and FWorldPartitionLevelHelper::MoveExternalActorsToLevel) fixed the issue. The additional logging showed that this was only relevant for few landscape references in the full cook of our project. It seems that the check here should be more complicated.

Hi Andrew,

I am sorry to hear you are experiencing this issue and had to deep dive into the inners of the World Partition (WP) and Landscape system to pinpoint its root cause.

Unfortunately, when it comes to inter-dependency between actors, at cook time, the engine gives no guarantee as to whether a referenced actor will be loaded, regardless of its WP settings. In this case, the landscape streaming proxies have a reference to the “main” landscape actor and rely on it (at least in the editor case) to synchronize some of their properties with it.

That statement is not true in the editor case, where the non-spatially loaded actor, the ALandscape, is always available, post-actor registration. But at cook time, the exercised code paths have to be lenient on the fact that the referenced ALandscape might not be present. For this reason, FixupSharedData can only be called when the ALandscape is available and we still let it execute, since it allows to maintain the properties of the proxy synchronized with the main landscape actor, which is better than keeping this out-of-date. But as you’ve experienced, it’s not necessarily consistent when it comes to other ALandscapeStreamingProxy, since there’s no guarantee that the landscape actor will be present when they cook.

Actually, this shared properties system (properties marked as LandscapeShared or LandscapeOverride) was introduced in UE5.3 to formalize what the engine used to do on a per-property basis and in an inconsistent way in the past : in the absence of an “override” boolean property, the engine had to make a guess as to whether a given property was defined at the proxy level or should be inherited from the main landscape actor, which became a problem when the property changed at the ALandscape level. However, for the reason explained above, this can only work reliably in the editor and the best we can do is warn the user when an inconsistency is detected between the main landscape actor and its proxies, and this can only be done at editor-time. That is why the landscape system emits MapCheck warnings at load time whenever a property needs to be synchronized between the proxies and the main landscape (“TargetLayers”, in your case). This can be addressed by simply clicking the “Save Modified Landscapes” hyperlink in the MapCheck warning, on a per-landscape proxy basis. Or simply, click on Rebuild on the viewport notification area to do this on all affected proxies :

[Image Removed]

This is the recommended way and I would strongly advise against changing the code in LoadActorsInternal nor MoveExternalActorsToLevel, since this might have unexpected implications of a much broader scope than just landscape.

In general, we recommend users to keep the landscape proxies as up-to-date as possible by clicking that Rebuild button when it shows up, since it leads to cook issues (such as this non-determinism problem mentioned here) and the only reliable way to fix them is to do this in the editor. Since this is a recurrent and often misunderstood problem, we are preparing some more extensive documentation for this in UE5.7 and there will also be a new commandlet available by then (the Landscape Builder Commandlet, which can already find on Github if you want to have a look before the 5.7 release) to help productions with this issue. The commandlet allows such out-of-sync properties to be synchronized and the proxies, re-saved, as well as re-generate the other types of derived data on landscape, should they be out-of-date. It can be run as part of a nightly build step, to help keep things up-to-date, landscape-wise.

Hope this helps,

Cheers,

Jonathan

Hi Jonathan,

Thanks for the response.

I understand that manually synchronizing the proxies and resaving everything can solve the immediate issue.

However, I feel that just mandating people to do that (and helping by providing docs and commandlets) is not enough if the consequence for *not* doing it correctly is silent failure. So at very least, do you plan to add some sort of a cook warning/error if non-synchronized landscape properties are found at cook time?

What worries me even more is that WP code, as far as I understand it, does *not* allow robust detection of the problem as it’s written now - as soon as a cell containing ALandscapeSplineActor (that has an editor-only reference to ALandscape) is cooked, the actor is moved from primary level to cell level (and then soon GC’d); any subsequent ALandscapeStreamingProxies won’t find the ALandscape actor they are linked to, so they won’t even know whether their properties are sync’d or not (if they could, they would just sync right then).

In fact, that does sound like a WP problem rather than Landscape problem (Landscape just happens to be triggering it). If any actor has an editor-only reference to another actor which is part of the always-loaded cell, that actor will be loaded early and added to primary level (which is reasonable), but then at ‘random’ point (when the cell containing referencer happens to be cooked) it will be moved out from primary level to the cell level and then later cease be ‘always loaded’. Even more, this does not happen for properties not marked as editor-only.

You recommend not changing the WP code, but doesn’t this whole situation worry you a lot and sound like a nasty bug? Do you have any plans for a proper fix on Epic side in the future?

Hi Andrew,

Sorry for the late response, we have temporarily lost our main point of contact when it comes to landscape on UDN so some tickets tend to slip through.

I understand your concern but this design flaw in the landscape system (due to a dependency between actors from potentially various levels) has always been present throughout the UE versions and cannot be resolved unless modifying the cook process in-depth, which the engine team has always been reluctant to do. If the shared properties lived in an asset, rather than an actor (ALandscape), then, that wouldn’t be a problem at all, since the cook process would be able to load this asset on demand and the shared property values would always be up-to-date. Alas, it’s not possible to dynamically load an actor from a level in the same way that an asset can be loaded and so we end up in this situation.

On one of our internal projects, we have tried using asset metadata to provide the information in the asset registry (in the level asset), so that even without loading the actor, we would get access to the information at cook time. But this was fairly complicated and didn’t work too well in practice.

Moving the shared properties to a dedicated landscape asset is probably the best way forward but the amount of code to modify and the backwards compatibility concerns are too great for a system such as core as landscape, especially considering our resourcing on the topic.

So really, our hands are a little bit tied here and we have to put the burden on users, unfortunately.

Still, as explained earlier, the landscape property sharing system has been introduced in order to at least warn the user in the editor (the only moment where we can reliably do so) when such problems occur and the fix is always only one click away. Plus, the editor is always synchronizing properties on load, so it doesn’t hamper everyone’s experience, it’s simply something to remember to address before cooking.

Also it’s important to note that those “synchronized” properties don’t tend to change too often, really. Once the main landscape actor’s properties (the landscape material, target layers, LOD settings, etc.) are set, they are not usually modified very often by users (or if they do, maybe there’s some misunderstand about how the landscape data is organized…), so the unsynchronized properties warning shouldn’t occur too often in practice (after all proxies have been updated and re-saved, of course).

We dids have some problems in the past, though, where some properties were wrongly marked as “shared properties” and therefore the main ALandscape actor was unneedingly modified and submitted by users, leading to unsynchronized properties. For example, the order of the landscape target layers used to be synchronized, which was a mistake, because it had no impact when it comes to the final landscape data (it was just a UX thing), so by simply reording the target layers in landscape mode, the user would dirty the main landscape actor, and all proxies had to be “synchronized” for no good reason as a result. These problems have been addressed on a case-by-case basis, so if you do experience some shared properties being changed on the main ALandscape actor for what you think is not a good reason, that might be a bug and it’s worth reporting to us. When the shared properties warning appears in the map check, the affected properties are explicitly listed so it’s fairly easy to detect such cases.

TLDR : we acknowledge the problem but we don’t have active plans to address it beyond what we already provide, since it’s always been like this (it was actually worse in the past, since there was no mechanism to even warn the user about such a situation…) but also because under normal conditions of utilization (barring potential remaining bugs), it shouldn’t really occur much often, if users fix the warnings as suggested.

Let me know if you have additional questions,

Cheers,

Jonathan