Long running Servers monotonically increase the size of NetGuidCache

Every time a replicated actor or component is added to the world, replication is prepared for that actor, which triggers FNetGUIDCache::RegisterNetGUID_Server. That adds two new entries to the lookup maps, which are currently only cleaned on seamless travel or when the NetGuidCache itself is torn down.

I believe I can call CleanReferences manually to try to mitigate this. CleanPackageMaps seems a little too geared towards seamless travel for me to use (I don’t want to risk the Reset Acks triggering if someone decides to enable that option).

I can tell that some of the work in CleanReferences would be extraneous on our servers: the static GUIDs should never produce duplicates in this situation (I guess that part is also geared towards seamless travel), and the entire bottom section is essentially a lot of work with no effect on our shipping servers (though it is probably partially compiled out).

So, my question is: would calling CleanReferences periodically be a safe way to mitigate this growing memory issue?

Or would it be recommended that I leverage a different function?

… Or could I be overlooking some mechanism that actually does clean up these Maps, and this is actually a red herring?

All assistance is appreciated.

Steps to Reproduce
Every time a replicated actor or component is added to the world, replication is prepared for that actor, and that triggers FNetGUIDCache::RegisterNetGUID_Server, which is all well and good.

The only problem is that this adds entries to two lookup maps, and those entries last as long as the world does. They are only cleaned when either the net driver is destroyed or a seamless travel triggers an explicit CleanPackageMaps, which in turn calls FNetGUIDCache::CleanReferences.
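To make the growth pattern concrete, here is a minimal standalone sketch. It is a simplified model and not engine code: ModelGuidCache, ReplicatedObject, MakeObject and SimulateLevelCycles are hypothetical names, and the two std::unordered_map members only stand in for the two lookup maps described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a replicated actor or component.
struct ReplicatedObject {
    std::uint64_t serial = 0;  // unique per object, never reused
};

// Creates an object with a fresh serial number.
inline std::shared_ptr<ReplicatedObject> MakeObject() {
    static std::uint64_t nextSerial = 1;
    auto object = std::make_shared<ReplicatedObject>();
    object->serial = nextSerial++;
    return object;
}

// Simplified model of a GUID cache with two lookup maps. Not engine code.
class ModelGuidCache {
public:
    // Models server-side registration: one new entry in each map.
    std::uint64_t Register(const std::shared_ptr<ReplicatedObject>& object) {
        const std::uint64_t guid = nextGuid_++;
        objectByGuid_[guid] = object;          // weak: does not keep the object alive
        guidBySerial_[object->serial] = guid;  // reverse lookup
        return guid;
    }

    // Total number of entries held across both maps.
    std::size_t EntryCount() const {
        return objectByGuid_.size() + guidBySerial_.size();
    }

    // Counts forward-map entries whose object has been destroyed.
    std::size_t StaleCount() const {
        std::size_t stale = 0;
        for (const auto& entry : objectByGuid_) {
            if (entry.second.expired()) ++stale;
        }
        return stale;
    }

private:
    std::uint64_t nextGuid_ = 1;
    std::unordered_map<std::uint64_t, std::weak_ptr<ReplicatedObject>> objectByGuid_;
    std::unordered_map<std::uint64_t, std::uint64_t> guidBySerial_;
};

// Loads and unloads a level `cycles` times, spawning `spawnsPerCycle` objects each time.
inline void SimulateLevelCycles(ModelGuidCache& cache, int cycles, int spawnsPerCycle) {
    for (int c = 0; c < cycles; ++c) {
        std::vector<std::shared_ptr<ReplicatedObject>> level;
        for (int i = 0; i < spawnsPerCycle; ++i) {
            auto object = MakeObject();
            cache.Register(object);
            level.push_back(std::move(object));
        }
        // `level` goes out of scope here: the objects die, the map entries remain.
    }
}
```

In this model, calling SimulateLevelCycles(cache, 4, 10) leaves 80 entries across the two maps, 40 of them stale weak pointers, even though no object is still alive.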

With a long running server for a single world, say something that can last a full day or longer, this starts to be an issue…

Additionally, we have whole level instances that we load and unload on the edges of the world for players on demand, which I think exacerbates the issue.

Note: we first identified this using memory insights in a limited scenario that loaded and unloaded a level instance with 10 pawn spawners in PIE 4 times across the span of ~2 minutes.

Hi,

I don’t believe there are any other functions or mechanisms for cleaning up the NetGuid cache. As you said, this usually only happens during travel or shutdown.

After discussing this with the dev team, it does seem as though calling CleanReferences periodically should be safe. However, it’s worth noting that we haven’t tried and tested an approach like this ourselves, so it’s possible that there could be side effects. For instance, one potential issue that was brought up was that a client might send the server a reference with a NetGuid that the server just cleaned up.
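As a rough illustration of the periodic approach, the following standalone sketch prunes stale entries on a fixed interval. It is a simplified model under stated assumptions, not the engine implementation: ModelGuidCache, PeriodicCleaner and the 60-second interval in the usage note are hypothetical, and the model's CleanReferences only removes entries whose object is gone, which is much less than the real FNetGUIDCache::CleanReferences does.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a replicated actor or component.
struct ReplicatedObject {
    std::uint64_t serial = 0;  // unique per object
};

// Simplified model of a GUID cache with two lookup maps. Not engine code.
class ModelGuidCache {
public:
    std::uint64_t Register(const std::shared_ptr<ReplicatedObject>& object) {
        const std::uint64_t guid = nextGuid_++;
        objectByGuid_.emplace(guid, Entry{object, object->serial});
        guidBySerial_[object->serial] = guid;
        return guid;
    }

    // Removes entries for destroyed objects from both maps.
    // Returns the number of GUIDs removed.
    std::size_t CleanReferences() {
        std::size_t removed = 0;
        for (auto it = objectByGuid_.begin(); it != objectByGuid_.end();) {
            if (it->second.object.expired()) {
                guidBySerial_.erase(it->second.serial);
                it = objectByGuid_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t EntryCount() const {
        return objectByGuid_.size() + guidBySerial_.size();
    }

private:
    struct Entry {
        std::weak_ptr<ReplicatedObject> object;
        std::uint64_t serial = 0;
    };

    std::uint64_t nextGuid_ = 1;
    std::unordered_map<std::uint64_t, Entry> objectByGuid_;
    std::unordered_map<std::uint64_t, std::uint64_t> guidBySerial_;
};

// Runs the cleanup at most once per interval, driven by a per-frame Tick.
class PeriodicCleaner {
public:
    PeriodicCleaner(ModelGuidCache& cache, double intervalSeconds)
        : cache_(cache), interval_(intervalSeconds) {}

    // Returns true if the cleanup ran on this tick.
    bool Tick(double deltaSeconds) {
        elapsed_ += deltaSeconds;
        if (elapsed_ < interval_) return false;
        elapsed_ = 0.0;
        cache_.CleanReferences();
        return true;
    }

private:
    ModelGuidCache& cache_;
    double interval_;
    double elapsed_ = 0.0;
};
```

With one live and one destroyed object registered, a PeriodicCleaner constructed with a 60-second interval leaves all four entries in place after Tick(30.0) and drops to two entries after a second Tick(30.0). The interval is a trade-off: a longer one lets more stale entries accumulate, while a shorter one widens the window for the side effect mentioned above, where a client references a GUID the server has just cleaned up.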

Thanks,

Alex

… And then used Visual Studio breakpoints to confirm the stale weak pointers in the object look up table…

… And code inspection to try to find where things would clean up…

What would be the expected effect of a client sending the server a reference with a NetGuid that the server no longer recognizes? Would the server just skip that action? Or would it be a more severe failure that leads to a disconnect? Or, even more severe, a crash?

I would expect it to just skip, and I think we’d accept that in all our scenarios.

Hi,

Again, it’s hard to say what exactly would be expected in this case. I don’t believe this should result in a crash or disconnect. In my very brief testing, the client sent an RPC with the object reference for the recently cleaned up NetGuid passed as a parameter. The package map logged some warnings but otherwise continued, and the server simply called the RPC with a null parameter.
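The behavior observed in that test can be sketched with a small standalone model. This is an illustration only, not the engine's package map: ModelPackageMap, Forget and ServerInteract are hypothetical names, and the warning text is made up.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a replicated actor.
struct ReplicatedObject {
    std::string name;
};

// Simplified model of GUID resolution on the server. Not engine code.
class ModelPackageMap {
public:
    void Register(std::uint64_t guid, const std::shared_ptr<ReplicatedObject>& object) {
        objectByGuid_[guid] = object;
    }

    // Models the cache entry being cleaned up while a client still holds the GUID.
    void Forget(std::uint64_t guid) { objectByGuid_.erase(guid); }

    // Unknown or expired GUIDs resolve to nullptr with a warning; nothing throws.
    std::shared_ptr<ReplicatedObject> Resolve(std::uint64_t guid) {
        std::shared_ptr<ReplicatedObject> resolved;
        const auto it = objectByGuid_.find(guid);
        if (it != objectByGuid_.end()) {
            resolved = it->second.lock();
        }
        if (!resolved) {
            ++warningCount_;
            std::cerr << "Warning: could not resolve GUID " << guid << "\n";
        }
        return resolved;
    }

    int WarningCount() const { return warningCount_; }

private:
    std::unordered_map<std::uint64_t, std::weak_ptr<ReplicatedObject>> objectByGuid_;
    int warningCount_ = 0;
};

// Hypothetical server RPC handler that tolerates a null target.
inline std::string ServerInteract(const std::shared_ptr<ReplicatedObject>& target) {
    if (!target) {
        return "skipped";
    }
    return "interacted with " + target->name;
}
```

In this model the RPC is still invoked after the GUID is forgotten, just with a null target, so the "skip" only happens if the handler checks its parameter. Under that assumption, any server RPC that takes an object reference would need a null check before relying on periodic cleanup.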

That being said, if you do run into any problems and have further questions, please don’t hesitate to reach out.

Thanks,

Alex

Sounds great.

We’ll run it locally for several weeks, and then roll it out to live players. Following our QA plan for a change at this level, I think it will be ~2 months before I can say how much this affects our servers’ memory performance…

Might be able to identify some side effects sooner if they’re obvious.

Thanks again for your help.