We are working with a licensee and have a new Zen Server instance running in CI that is being populated by a Horde-based build. A large cook completed, after which Zen Server hit its garbage collection timer and deleted most of the data.
The log line showing the GC summary is attached.
After GC occurred, disk usage dropped back down substantially, and both in-progress and new cooks started writing to Zen Server rather than reading from it.
Our configuration is the same as the recommended one in the zen_config.lua from this document: https://dev.epicgames.com/documentation/en-us/unreal-engine/set-up-zen-storage-server-as-shared-ddc-for-unreal-engine
The CI Zen Server instance is running version 5.6.16, and the project being cooked is on UE 5.6.0.
For this project, an in-office zen server running 5.5.17.0 has been operating for a long time without issue.
As a temporary workaround, we’ve disabled garbage collection on the CI Zen Server instance.
What would cause Zen to behave in this way? How can we prevent it from happening in the future?
Oh, and one other piece of log data, in case it is helpful: all of the data that was garbage collected had log lines like:
[25-09-09 13:26:27.085] [main] [info] GCV2: cachebucket [COMPACT] 'D:\zenData\cache\ns_ue.ddc\legacytexture': dropped all chunks from '00000000.ucas', freeing 1022M
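To gauge how much each cache bucket lost, these COMPACT lines can be tallied straight from the log. The following is a minimal Python sketch, assuming every compaction line follows the exact format above and that the size suffixes (K/M/G/T) are binary multiples; `freed_bytes_by_bucket` is a hypothetical helper, not part of any Zen tooling.

```python
import re
from collections import defaultdict

# Matches GCV2 compaction lines of the form shown above, e.g.
# ... GCV2: cachebucket [COMPACT] '<bucket path>': dropped all chunks from '<file>', freeing 1022M
COMPACT_RE = re.compile(
    r"GCV2: cachebucket \[COMPACT\] '(?P<bucket>[^']+)'.*?"
    r"freeing (?P<size>[\d.]+)(?P<unit>[KMGT]?)"
)

# Assumption: size suffixes are binary multiples (K = 1024, M = 1024**2, ...).
UNIT = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def freed_bytes_by_bucket(lines):
    """Sum the bytes reported as freed by COMPACT lines, keyed by bucket path."""
    totals = defaultdict(int)
    for line in lines:
        m = COMPACT_RE.search(line)
        if m:
            totals[m["bucket"]] += int(float(m["size"]) * UNIT[m["unit"]])
    return dict(totals)
```

Passing an open log file object (or any iterable of lines) returns a dict mapping each bucket path, such as the `legacytexture` bucket above, to the total bytes it freed; lines that are not COMPACT lines are ignored.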
Will look into it. It seems like zenserver thinks a lot of data in the CAS store is unreferenced; it does not think that data has expired…
Is it possible to share the full zenserver log?
Thanks.
Just clarifying: did you take more than one log file and paste them together, or was this just one log?
Looking at these lines:
[25-09-09 12:47:56.163] [gc] [info] 96.7G used. 'D:\zenData\gc': 97.2G in use, 403G free. Disk writes last 15m00s per 30.0s [556534422326425943647336547740], peak 32.3K/s. Full GC in 38m31s. Lightweight GC in 38m31s. Disk usage GC in 403G.
[25-09-09 12:48:26.216] [gc] [info] 96.7G used. 'D:\zenData\gc': 97.2G in use, 403G free. Disk writes last 15m00s per 30.0s [565344223264259436473365477480], peak 32.3K/s. Full GC in 38m01s. Lightweight GC in 38m01s. Disk usage GC in 403G.
[25-09-09 12:49:06.362] [main] [info] log starting at 2025-09-09T12:49:06.360Z
[25-09-09 12:49:06.362] [jupiter] [info] log starting at 2025-09-09T12:49:06.360Z
[25-09-09 12:49:06.362] [zenclient] [info] log starting at 2025-09-09T12:49:06.360Z
Right after 12:48:26 it looks like zenserver exited and restarted. It should rotate the logs at startup, though, so a new log starting in the middle of the same file looks odd to me.
We are investigating a different issue where we see these cut-off logs without any error reports, so it looks like you are experiencing the same thing.
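One way to spot such restarts in a long or stitched-together log is to list every `[main] ... log starting at` line together with the last line logged before it, so each restart can be checked by eye for signs of an orderly shutdown. This is a minimal Python sketch, assuming the `[yy-mm-dd HH:MM:SS.mmm]` timestamp prefix shown above; `find_restarts` is a hypothetical helper, and it deliberately makes no assumption about what zenserver logs on a clean shutdown.

```python
import re
from datetime import datetime

# Timestamp prefix used by the zenserver log lines above, e.g. [25-09-09 12:48:26.216]
TS_RE = re.compile(r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")
# Only the [main] logger's start line is counted, so one restart is reported once.
START_MARKER = "[main] [info] log starting at"


def find_restarts(lines):
    """Return (last_line_before_restart, restart_time, gap_seconds) for each restart."""
    restarts = []
    prev_line, prev_ts = None, None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a timestamp prefix
        ts = datetime.strptime(m.group(1), "%y-%m-%d %H:%M:%S.%f")
        if START_MARKER in line and prev_line is not None:
            restarts.append((prev_line.rstrip(), ts, (ts - prev_ts).total_seconds()))
        prev_line, prev_ts = line, ts
    return restarts
```

Run against the excerpt above, it reports a single restart roughly 40 seconds after the 12:48:26 GC status line.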
Also, it looks like we don’t get crash reports from that machine, because sending crash reports is likely disabled via the no-sentry option in the config file.
If possible, please enable it so that we can hopefully see crash reports in our tooling.
Could you switch your zenserver instance to use httpsys instead of asio?
To do so, remove the `--http asio` option from the zen arguments in the engine's ini settings.
I have kept investigating this, and it looks more and more like a hard crash in zenserver. We are trying to figure out why these crashes don’t get reported to Sentry, and we are also having trouble getting crashes reported to Sentry locally.
One theory for why it dropped a big chunk of attachments is that zenserver crashed, the cook kept running, and the cook output was not written properly to zenserver because of a lack of verification on the UE side.
Normally, a crash in zenserver does not result in losing data.
Investigation continues.
Yeah, it is best to restart zenserver when no cooks are running, as running cooks may hit errors while zenserver restarts.
Could you have a look in the .sentry-native folder in the DDC folder for zenserver and check whether there are any .dmp crash dump files in any of its subfolders? If so, I would very much like to take a peek at them.
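A quick way to do this check is to search the `.sentry-native` folder recursively. This is a minimal Python sketch, assuming the folder sits directly under the zenserver data directory (`D:\zenData` in the logs above); `find_crash_dumps` is a hypothetical helper, not part of zenserver or Sentry.

```python
from pathlib import Path


def find_crash_dumps(data_dir):
    """List .dmp files anywhere under <data_dir>/.sentry-native.

    Assumption: .sentry-native lives directly in the zenserver data directory.
    """
    sentry_dir = Path(data_dir) / ".sentry-native"
    if not sentry_dir.is_dir():
        return []  # no Sentry folder at all, so nothing to report
    return sorted(p for p in sentry_dir.rglob("*.dmp") if p.is_file())
```

For example, `find_crash_dumps(r"D:\zenData")` returns a sorted list of dump paths, which is empty if there are no dumps or no `.sentry-native` folder.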
Ok, thanks for the update.
Continuing the investigation here.
We just released zenserver 5.7.4 with fixes and improved logging, could you please give that a try?
https://github.com/EpicGames/zen/releases/tag/v5.7.4
Also, Sentry seems to be picking up more crashes now than before, so hopefully we will get a report if it crashes again.
Hi Dan.
Here are the logs you requested. Let me know if you need any more information.
Ah, sorry! Yes, these are stitched together. They were pulled out of our centralized log store. I don’t believe any logs are missing.
OK. We will discuss that internally.
Looking at the logs on disk, they were rotated when Zen Server was intentionally restarted. I don’t see any log file rotation that would be indicative of a crash.
This shared Zen Server is already configured to use httpsys (see attached config). This config was based on the one from the documentation (here).
I’ll follow up on the `--http asio` option for the local zen server.
One other thing perhaps worth noting: as the documentation recommends, CI builds have `UE-LocalDataCachePath` set to None to disable the local DDC and use the shared Zen Server exclusively.
OK. Thank you for the update.
Is there a recommended shutdown/restart procedure for a shared Zen Server?
Something like:
- Stop all cooks
- Restart the shared ZenServer
- Restart cooks
Thanks! We’ve enabled crash reporting for the shared zen servers and re-enabled GC. We will report back if there are any more issues.
Hi Dan, there aren’t any .dmp files in .sentry-native.