Techniques for debugging memory leaks in dedicated server level transitions

My game leaks ~8MB of memory on every level transition. On the small Linux VPS I use for hosting, this leak is a serious problem.

Some properties of the leak I know of:

  • It occurs on Linux and Windows Server
  • I’ve seen the issue since 4.9, and I’m on 4.12.2 now
  • It is specific to the dedicated server. I see smaller “leaks” in standalone configurations, but those could just be other state that accumulates in long-running UE processes, and they are nowhere near 8MB
  • It is independent of user interaction. I have resorted to traveling between levels automatically with no user connected, and confirmed that the leak still occurs
  • I use the DebugGame build flavor. I doubt this is relevant, but I have noticed it is an uncommon configuration
  • I measure leaks using the high-level working set/commit (what you see in Task Manager on Windows or top on Linux, and “Current Memory” in memreport)
  • I only do one manual memory allocation outside of UE4, which adds up to two 128x128x32-byte grids (2 × 128 × 128 × 32 B = 1 MiB in total). I’ve verified they aren’t leaking, and the math wouldn’t check out anyway.

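On Linux, the number top shows in its RES column can also be read from a script, which makes it easy to log one reading per level transition instead of eyeballing top. A minimal sketch, assuming a Linux host where /proc/<pid>/status is readable; the function names are illustrative and not part of UE4, and note this is resident memory (VmRSS), not the committed total:

```python
import re
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Extract the VmRSS value (reported by the kernel in kB, i.e. KiB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found (process may be a zombie or not on Linux)")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a running process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling read_rss_kib with the server's PID a few seconds after each travel completes gives a series of readings that can be compared across transitions.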
As I mentioned above, I have tried using memreport -full to debug the issue, but it hasn’t been very fruitful. Do memreports only describe the current World, or the entire process? Comparing memreports, the only meaningful deltas are:

  • Current Memory grows by 8.6MB each level
  • “StaticMesh Total Memory - STAT_StaticMeshTotalMemory - STATGROUP_Memory - STATCAT_Advanced” is listed with a negative value that decreases by exactly 15280 every level.
  • For the first-to-second level transition only, memory “allocated in pools” grows by 320KB.
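Because the first transition behaves differently from the rest (the pool growth above), a single pair of reports can be misleading. Fitting a least-squares line through one reading per level separates a steady per-transition leak from one-off warm-up. A minimal sketch; the function name is hypothetical, and the readings in the test are made-up values shaped like the 8.6MB growth described here:

```python
from typing import Sequence


def growth_per_transition(samples_mb: Sequence[float]) -> float:
    """Least-squares slope of memory readings taken once per level, in MB per transition."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2  # x is the level index 0..n-1
    mean_y = sum(samples_mb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

With more transitions in the series, a one-time jump at the start has less and less influence on the slope.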

I attached four samples showing memreport -full run about 5 seconds into a level with no activity; the travel to the next level happens about 5 seconds after that. I froze the seed of my procedural generation, so it’s the same level every time. The most promising lead seems to be that something is wrong with StaticMeshes. What do the negative values mean? I’ve also tried the leak-debugging instructions at https://www.unrealengine.com/blog/dealing-with-memory-leaks-in-ue4, but they don’t seem well suited to leaks that span level transitions. With profiling enabled, loading a level takes a very long time (I gave up after 5 minutes). Even if it worked, I’d have a hard time telling a real leak apart from memory that was legitimately loaded for a level and not yet freed, since I don’t know which part of the game’s lifecycle is leaking.
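Comparing several reports by hand is error-prone, and a counter like the StaticMesh one stands out precisely because it moves by the same amount at every transition. A sketch that flags such stats, assuming each memreport has already been reduced to a dict of stat name to number (the parsing step is not shown because it depends on the report layout; the function name and sample values are hypothetical):

```python
from typing import Dict, List


def steady_deltas(snapshots: List[Dict[str, float]], tol: float = 1e-6) -> Dict[str, float]:
    """Return stats whose value changes by the same nonzero amount between every pair of consecutive snapshots."""
    if len(snapshots) < 3:
        raise ValueError("need at least three snapshots to see a repeating delta")
    # Only consider stats present in every report.
    common = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)
    result: Dict[str, float] = {}
    for name in sorted(common):
        deltas = [after[name] - before[name] for before, after in zip(snapshots, snapshots[1:])]
        first = deltas[0]
        if abs(first) > tol and all(abs(d - first) <= tol for d in deltas):
            result[name] = first
    return result
```

Stats that grow by a fixed amount per level are the best candidates for objects that survive every travel.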

At this point I’m considering lower-level techniques like attaching AppVerifier and debugging the heaps manually, but that’s a very big and expensive hammer when something relatively simple is likely happening, e.g., the entire old World sticking around forever. Does anyone know of a technique or UE tool I can use to debug this? It’s keeping me from releasing an alpha for people to try, because my servers tend to die mid-session after running out of memory.

Update 19-Jun:
Through A/B testing (removing “stuff” and seeing what happened), it looks like the leak is tied to the Map itself. I built a completely new map that looks almost the same and switched to it. After that, the leak dropped from 8.6MB to around 600KB per transition, which is much better. Interestingly, the leak per transition is on the order of the map’s size on disk: the original map is 10MB, the new one 677KB. Does anyone know of any gotchas with server travel and Maps that can leave a map you traveled away from in memory forever?
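Putting the two data points from this update side by side shows how close the relationship is: in both cases the per-transition growth is roughly 85 to 90 percent of the map file size. A small worked check using only the figures reported above (the helper name is illustrative):

```python
def leak_ratio(leak_per_transition: float, map_size_on_disk: float) -> float:
    """Fraction of the map's on-disk size retained per transition (both arguments in the same unit)."""
    return leak_per_transition / map_size_on_disk


# Figures from the update; each pair uses one unit (MB for the original map, KB for the rebuilt one).
for label, leak, size in [("original map", 8.6, 10.0), ("rebuilt map", 600.0, 677.0)]:
    print(f"{label}: {leak_ratio(leak, size):.2f}")  # prints 0.86, then 0.89
```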

Hello, I can confirm this still happens in 4.15. I have just finished setting up a dedicated server on Linux and ran into this big flaw.
There is a bug entry for this here. It’s still unresolved; it seems to be hard to fix, low priority, or not reproducible, or there is some bigger plan for the future.

I did not dive as deep into the issue in my case, but I see similar properties to yours:

  • It occurs on Linux and Windows Server
  • I’ve seen the issue since 4.9, and I’m on 4.15 now
  • It is specific to the dedicated server. On the client there is no leak as far as I can tell, not even when a client is hosting.
  • I’m also using Task Manager to detect memory leaks.
  • I’m using restart level, which just reopens the current level. (Later I want to switch between different levels, so just resetting some variables will not work as a mid-term solution.)

I don’t know how you reduced the leak from 8.6MB to 600KB. That is already a big improvement, but the memory still leaks, so the server cannot run forever.

Since it really only happens on the dedicated server and I don’t have any server-specific custom code, I’m pretty certain something in the engine core is causing this. Is there any solution for this memory leak yet?

EDIT 05.03.2017:
Hey, I just found a link that should lead to the right solution; take a look here.
I guess most people, like me, are following this guide: https://wiki.unrealengine.com/Dedicated_Server_Guide_(Windows_%26_Linux)
In that case, your dedicated server executable sits in the Binaries folder of your client-cooked content, which causes the nasty memory leak. What you have to do is package the content for the dedicated server and put the executable there.

This describes how to do that: https://wiki.unrealengine.com/How_to_package_your_game_with_commands
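The packaging step goes through UAT’s BuildCookRun. As a sketch, assuming the flags commonly used for server-only builds (-server, -serverplatform, -serverconfig, -noclient, plus the usual build/cook/stage/pak/archive steps); check them against the wiki page above and your engine version before relying on them. The helper name, paths and defaults are hypothetical:

```python
from pathlib import PurePath
from typing import List


def server_build_command(engine_root: str, uproject: str, server_platform: str = "Linux",
                         server_config: str = "Development", archive_dir: str = "ServerBuild",
                         uat_script: str = "RunUAT.bat") -> List[str]:
    """Assemble (but do not run) a BuildCookRun invocation that cooks and packages only the dedicated server."""
    uat = str(PurePath(engine_root, "Engine", "Build", "BatchFiles", uat_script))
    return [
        uat, "BuildCookRun",
        f"-project={uproject}",
        "-noP4",
        # Server-only target: content is cooked for the server platform, no client is built.
        "-server", f"-serverplatform={server_platform}", f"-serverconfig={server_config}",
        "-noclient",
        "-build", "-cook", "-stage", "-pak", "-archive",
        f"-archivedirectory={archive_dir}",
    ]
```

Printing the list first and then passing it to subprocess.run(cmd, check=True) keeps the exact invocation visible in build logs; on a Linux build host, pass uat_script="RunUAT.sh".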

EDIT 06.03.2017:
Uh, by the way… packaging the game for the dedicated server does not fix the memory leak itself, as I just noticed… it only reduces the effect by a factor of around 40 (in my case). I can confirm what @tripledefault said: the amount of additional memory seems to be equal to the map size on disk.
Another thing to mention: I’ve already looked into the engine source code to figure out what is happening there (I’m a programmer). As of 4.15, traveling is handled in FSeamlessTravelHandler::Tick() (World.cpp lines 5069-5495). This is one entry point for much of the garbage collection handling that happens during seamless travel.
The next entry point is APlayerController::ClientFlushLevelStreaming_Implementation() (PlayerController.cpp lines 276-287), which contains a ForceGarbageCollection call. Through some nesting, it gets called from PostSeamlessTravel (GameMode.cpp/BaseGameMode.cpp). Interesting fact about this one: as far as I know, a dedicated server doesn’t have a local player controller, which means it can never reach this line of code (and there is no ForceGarbageCollection call elsewhere). Not sure if this is intended…
…for now, the issue is kind of… bearable, since I plan to write a server instance manager anyway, which can handle this by restarting an instance when the server is about to run out of memory. But it’s definitely better to have this memory leak fixed completely.
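The restart-based workaround can be driven by the measured leak rate instead of waiting for an out-of-memory kill: restart the instance between sessions once the next transition would cross a safety margin. A minimal sketch of that policy; the function names, the memory limit and the headroom are hypothetical values to tune for your host:

```python
import math


def transitions_left(current_mb: float, leak_per_transition_mb: float,
                     limit_mb: float, headroom_mb: float = 0.0) -> float:
    """Number of further level transitions that fit under (limit - headroom)."""
    budget = limit_mb - headroom_mb - current_mb
    if budget <= 0:
        return 0
    if leak_per_transition_mb <= 0:
        return math.inf  # no measurable growth, so no restart needed on this basis
    return math.floor(budget / leak_per_transition_mb)


def should_restart_before_travel(current_mb: float, leak_per_transition_mb: float,
                                 limit_mb: float, headroom_mb: float = 0.0) -> bool:
    """True if the next transition would push usage past the safe limit."""
    return transitions_left(current_mb, leak_per_transition_mb, limit_mb, headroom_mb) < 1
```

For example, with an 8.6MB leak, a 1024MB limit and 100MB of headroom, a server at 500MB has 49 transitions left; checking this at the end of each match lets the manager restart only when no players are connected.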