Zen garbage collection unexpectedly deleting data

anonymous-edc · September 10, 2025, 8:02am

We are working with a licensee and have a new Zen Server instance running in CI that is being populated by a Horde-based build. A large cook completed and then the Zen Server proceeded to hit the garbage collection timer and deleted most of the data.

The log line showing the GC summary is attached.

After GC occurred, the disk usage dropped back down substantially, and in-progress and new cooks started writing to Zen Server rather than reading from it.

Our configuration is the same as the recommended one in the zen_config.lua from this document: https://dev.epicgames.com/documentation/en-us/unreal-engine/set-up-zen-storage-server-as-shared-ddc-for-unreal-engine

The CI Zen Server instance is version 5.6.16 and the project being cooked is on UE 5.6.0.

For this project, an in-office zen server running 5.5.17.0 has been operating for a long time without issue.

As a temporary workaround we’ve disabled garbage collection on the CI Zen instance.

What would cause Zen to behave in this way? How can we prevent it from happening in the future?

anonymous-edc · September 10, 2025, 8:24am

Oh and one other piece of log data in case it is helpful. All of the data that was garbage collected had long lines like:

[25-09-09 13:26:27.085] [main] [info] GCV2: cachebucket [COMPACT] 'D:\zenData\cache\ns_ue.ddc\legacytexture': dropped all chunks from '00000000.ucas', freeing 1022M
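
To gauge how much each cache bucket lost, lines like the one above can be totalled per bucket. This is only a rough sketch: the regular expression is modelled on that single quoted line, the function name `freed_per_bucket` is made up for illustration, and treating the size suffix as a binary unit (M = 1024² bytes) is an assumption rather than something the log confirms.

```python
import re
from collections import defaultdict

# Pattern modelled on the one GCV2 [COMPACT] line quoted in this thread.
# Lines with any other shape are simply skipped.
COMPACT_RE = re.compile(
    r"GCV2: cachebucket \[COMPACT\] '(?P<bucket>[^']+)': "
    r"dropped all chunks from '(?P<file>[^']+)', "
    r"freeing (?P<size>\d+(?:\.\d+)?)(?P<unit>[KMGT])"
)

# Assumption: size suffixes are binary units (K = 1024 bytes, and so on).
UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def freed_per_bucket(lines):
    """Sum the bytes reported as freed, grouped by cache bucket path."""
    totals = defaultdict(int)
    for line in lines:
        match = COMPACT_RE.search(line)
        if match:
            size = float(match["size"]) * UNIT_BYTES[match["unit"]]
            totals[match["bucket"]] += int(size)
    return dict(totals)
```

Feeding it the lines of the zenserver log (for example, an open file object) gives one total per bucket, which makes it easier to see whether a single bucket or the whole cache was affected.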

vilse-at-work · September 10, 2025, 2:10pm

Will look into it. It seems like zenserver thinks a lot of data is unreferenced in the CAS store; it does not think that the data has expired…

Is it possible to share the full zenserver log?

vilse-at-work · September 10, 2025, 5:19pm

Thanks.

Just clarifying - did you take more than one log file and paste together or was this just one log?

Looking at these lines:

[25-09-09 12:47:56.163] [gc] [info] 96.7G used. 'D:\zenData\gc': 97.2G in use, 403G free. Disk writes last 15m00s per 30.0s [556534422326425943647336547740], peak 32.3K/s. Full GC in 38m31s. Lightweight GC in 38m31s. Disk usage GC in 403G.
[25-09-09 12:48:26.216] [gc] [info] 96.7G used. 'D:\zenData\gc': 97.2G in use, 403G free. Disk writes last 15m00s per 30.0s [565344223264259436473365477480], peak 32.3K/s. Full GC in 38m01s. Lightweight GC in 38m01s. Disk usage GC in 403G.
[25-09-09 12:49:06.362] [main] [info] log starting at 2025-09-09T12:49:06.360Z
[25-09-09 12:49:06.362] [jupiter] [info] log starting at 2025-09-09T12:49:06.360Z
[25-09-09 12:49:06.362] [zenclient] [info] log starting at 2025-09-09T12:49:06.360Z

At 12:48:26 it looks like zenserver exited and restarted but it should really rotate the logs at startup so it looks weird to me.
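
For anyone checking their own logs for the same pattern, here is a small sketch that flags each place where a `log starting at` banner appears and reports the gap since the previous log line. It assumes the timestamp prefix is `[yy-mm-dd HH:MM:SS.mmm]` as in the lines above, and the helper names `parse_ts` and `find_restarts` are hypothetical.

```python
import re
from datetime import datetime

# Timestamp prefix as seen in the quoted log lines, e.g. [25-09-09 12:48:26.216].
TS_RE = re.compile(r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def parse_ts(line):
    """Return the line's timestamp, or None if it has no recognised prefix."""
    match = TS_RE.match(line)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%y-%m-%d %H:%M:%S.%f")


def find_restarts(lines):
    """Return (last_line_before, restart_time, gap_seconds) per restart.

    Consecutive 'log starting at' banners (one per logger) count as a
    single restart. A banner at the very top of the input is not reported,
    since there is no earlier line to compare against.
    """
    restarts = []
    prev_line, prev_ts = None, None
    in_banner = False
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        is_banner = "log starting at" in line
        if is_banner and not in_banner and prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            restarts.append((prev_line, ts, gap))
        in_banner = is_banner
        prev_line, prev_ts = line, ts
    return restarts
```

A restart that shows up in the middle of one log file, with no shutdown messages before it, is the pattern described above.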

We are investigating a different issue where we see these cut-off logs without error reports, so it looks like you are experiencing the same thing.

vilse-at-work · September 10, 2025, 5:36pm

Also, it looks like we don’t get crash reports from that machine, as sending crash reports is likely disabled via the no-sentry option in the config file.

If possible, please enable it and hopefully we can see crash reports in our tooling.

vilse-at-work · September 11, 2025, 11:50am

Could you switch your zenserver instance to use httpsys instead of asio?

Make sure to remove the `--http asio` option from the zen arguments in the ini settings for the engine.

vilse-at-work · September 12, 2025, 1:15pm

I have kept investigating this and it looks more and more like a hard crash in zenserver. We are trying to figure out why these crashes don’t get reported to Sentry, and we are also having issues locally with getting crashes reported to Sentry.

One theory about it dropping a big chunk of attachments is that zenserver crashed but the cook kept running and output was not written properly to zenserver due to lack of verification on the UE side.

Normally a crash in zenserver does not result in losing data.

Investigation continues.

vilse-at-work · September 12, 2025, 3:17pm

Yeah, restarting the zenserver when no cooks are running is the best as you might get errors in the running cooks when zenserver restarts.

vilse-at-work · September 15, 2025, 3:07pm

Could you have a look in the .sentry-native folder in the DDC folder for zenserver and check if there are any .dmp crash dump files in any of the folders there? If so, I would very much like to have a peek at them.
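
If it helps, the check can be scripted. This is a minimal sketch that walks a folder recursively and lists any `.dmp` files; the function name `find_crash_dumps` is hypothetical, and the path passed in must be wherever `.sentry-native` actually sits on your machine (the example path in the comment is a placeholder, not a known location).

```python
from pathlib import Path


def find_crash_dumps(sentry_dir):
    """Return all .dmp files under sentry_dir, searching subfolders too."""
    root = Path(sentry_dir)
    if not root.is_dir():
        # Folder missing: nothing to report.
        return []
    return sorted(p for p in root.rglob("*.dmp") if p.is_file())


# Example call; replace the placeholder with your real data folder:
# for dump in find_crash_dumps(r"<zen data dir>\.sentry-native"):
#     print(dump, dump.stat().st_size)
```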

vilse-at-work · September 22, 2025, 7:28am

Ok, thanks for the update.

Continuing the investigation here.

vilse-at-work · September 29, 2025, 4:30pm

We just released zenserver 5.7.4 with fixes and improved logging, could you please give that a try?

https://github.com/EpicGames/zen/releases/tag/v5.7.4

Also it seems like Sentry is picking up more crashes now than before so hopefully we get a report if it crashes again.

anonymous-edc · September 10, 2025, 4:10pm

Hi Dan.

Here are the logs you requested. Let me know if there’s more information you need.

anonymous-edc · September 10, 2025, 5:36pm

Ah sorry! Yes these are stitched together. They were pulled out of our centralized log store. I don’t believe any logs are missing.

anonymous-edc · September 10, 2025, 5:39pm

OK. We will discuss that internally.

Looking at the logs on disk, they were rotated when ZenServer was intentionally restarted. I don’t see any log file rotation that would be indicative of a crash.

anonymous-edc · September 11, 2025, 11:57am

This shared Zen Server is already configured to use httpsys (see attached config). This config was based on the one from the documentation (here).

I’ll follow up on the `--http asio` option for the local zen server.

anonymous-edc · September 11, 2025, 2:09pm

And one other thing perhaps worth noting. As the documentation recommends, CI builds have `UE-LocalDataCachePath` set to None to disable the local DDC and exclusively use the shared Zen Server.

anonymous-edc · September 12, 2025, 1:59pm

OK. Thank you for the update.

Is there a recommended shutdown/restart procedure for a shared Zen Server?

Something like:

Stop all cooks
Restart the shared ZenServer
Restart cooks

anonymous-edc · September 15, 2025, 9:27am

Thanks! We’ve enabled crash reporting for the shared zen servers and re-enabled GC. We will report back if there are any more issues.

anonymous-edc · September 15, 2025, 3:33pm

Hi Dan, there aren’t any .dmp files in .sentry-native.