UEFN Stability Update

Hi all,

As most of you are aware, the recent v32.00 Fortnite release introduced several stability issues and regressions. These included significant disruptions, such as UEFN and FNC content being inaccessible to players, UEFN & Creator Portal being unavailable to creators, and bugs in some features.

With v32.10, we again saw several hours during which UEFN content and tools were only intermittently available to players and creators.

These releases did not meet our standards, leading to many internal discussions on how these issues went undetected and what can be improved in our process going forward.

Rapidly evolving the Fortnite ecosystem, while ensuring stability and compatibility across a wide array of features and experiences, is challenging, but we’re committed to addressing it. This means learning from our mistakes and holding ourselves accountable. We also want to be transparent about the major issues creators recently faced, our responses, and the changes we’ll implement.

In that spirit, below is a summary of the most severe issues from v32. Each summary includes the problem, cause, impact, and the changes we’re implementing based on what we’ve learned.

While these weren’t the only bugs in v32.00 and v32.10, they had ecosystem-wide impact and triggered Epic’s internal Critical Incident process.

Issue: Content Service Degradation (32.00 & 32.10)

Content Service is our internal system that tracks all content created in FNC and UEFN, including metadata such as versions and publishing status. Essentially, it serves as our “index of the metaverse” and is critical to both Fortnite and UEFN functionality.

As we expand usage of UEFN at Epic, our reliance on Content Service has also grown.

In update v32.00, changes to the Fortnite Client and Server inadvertently resulted in a massive increase of traffic to Content Service, at one point over 10x the normal load. This surge exposed an issue that led to errors as Content Service tried to scale to handle the load, which in turn impacted multiple areas of UEFN and FNC functionality.

Attempts to reconfigure or scale up the service were unsuccessful, and there was no immediate way to modify the client or server behavior. Fortunately, our Ecosystem Security team was able to configure our firewall to reduce traffic to manageable levels, allowing our services to recover.

We addressed the excessive client and server traffic for 32.10, but on release day still saw a level of natural traffic that degraded Content Service performance again. This time the service remained up but access to Creator Islands and UEFN features was unreliable for several hours. Fortnite Reload was also unavailable during this time.

Impact: Some Fortnite content, including Reload, Jam Tracks, and creator islands, was unavailable from 2am ET to 1pm ET when 32.00 released on 11/2.

On 11/13, with the release of 32.10, access to Creator Islands and UEFN functionality was degraded from ~7am ET to 1pm ET, with players seeing approximately a 70% success rate when joining Islands. Fortnite Reload was unavailable during this time.

Changes: We are working hard to address this scaling issue and believe we will be in good shape for the rest of the year.

We are also increasing pre-release analysis of Client and Server traffic from upcoming releases to spot accidentally introduced issues earlier.
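As a rough illustration of what this kind of pre-release traffic analysis could look like, the sketch below compares per-endpoint request rates from a baseline build against a candidate build and flags large increases. It is not Epic's tooling: the endpoint names, the rates, and the `max_ratio` threshold are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

def flag_traffic_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_ratio: float = 2.0,
) -> List[Tuple[str, float]]:
    """Return (endpoint, ratio) pairs where the candidate build's request
    rate exceeds the baseline rate by more than max_ratio."""
    flagged: List[Tuple[str, float]] = []
    for endpoint, new_rate in candidate.items():
        old_rate = baseline.get(endpoint, 0.0)
        if old_rate == 0.0:
            # Traffic to an endpoint the baseline never called is always flagged.
            if new_rate > 0.0:
                flagged.append((endpoint, float("inf")))
            continue
        ratio = new_rate / old_rate
        if ratio > max_ratio:
            flagged.append((endpoint, ratio))
    # Largest increases first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical requests-per-minute measured in a test session of each build.
baseline = {"content/lookup": 50.0, "content/publish": 2.0}
candidate = {"content/lookup": 520.0, "content/publish": 2.0, "content/new": 1.0}

flagged = flag_traffic_regressions(baseline, candidate)
for endpoint, ratio in flagged:
    print(endpoint, ratio)
```

A check like this only catches increases that show up in a test session, so the threshold and the measured scenarios would need tuning against real traffic.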

Issue: Loss of Persistence (32.00)

Shortly after Content Service functionality was restored, several creators using Persistence reported that players were seeing certain item types missing from their inventories.

Our investigation revealed that some Fortnite content had been reorganized for an upcoming initiative. This reorganization caused tracked items, such as diamonds, to adopt a new asset path, making them appear to the Inventory as new items with zero quantity. Although Unreal Engine has a system to handle moved assets (redirectors), a bug in our persistence code prevented the redirect from being applied.
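To make the failure mode concrete, here is a minimal sketch of an inventory keyed by asset path, with and without redirect resolution. It is an illustration only, not the actual persistence code: the paths, the dictionary layout, and the function names are hypothetical.

```python
from typing import Dict

def resolve_path(path: str, redirectors: Dict[str, str]) -> str:
    """Follow redirectors until reaching a path that has not moved."""
    seen = set()
    while path in redirectors and path not in seen:
        seen.add(path)  # guard against a redirect cycle
        path = redirectors[path]
    return path

def load_inventory(saved: Dict[str, int], redirectors: Dict[str, str]) -> Dict[str, int]:
    """Re-key saved quantities by each item's current asset path."""
    current: Dict[str, int] = {}
    for old_path, quantity in saved.items():
        new_path = resolve_path(old_path, redirectors)
        current[new_path] = current.get(new_path, 0) + quantity
    return current

# Hypothetical saved data and a redirector created by moving the asset.
saved = {"/Game/Items/Diamond": 12}
redirectors = {"/Game/Items/Diamond": "/NewPlugin/Items/Diamond"}

# Looking up the new path directly finds nothing: the item looks new, with zero quantity.
without_redirect = saved.get("/NewPlugin/Items/Diamond", 0)
# Applying the redirect first recovers the saved quantity.
with_redirect = load_inventory(saved, redirectors).get("/NewPlugin/Items/Diamond", 0)
print(without_redirect, with_redirect)
```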

To address this, we had two main priorities:

  1. Resolving the issue to prevent data loss for new players.
  2. Restoring inventories for players already affected by the issue.

The first task required creating, testing, and deploying a new server. This generally takes less than 24 hours, but due to a variety of factors, primarily a desire to incorporate changes for other issues, the new server wasn’t deployed until 11/5, three days later. We recognize this timeframe as completely unacceptable for an incident of this severity.

The second task involved creating a script to restore inventories for players who had played on an island since the v32.00 release. Restoration in this situation is a difficult choice because the impact varies by island: whether it’s better to restore the old data or continue with the new data depends on how badly players were affected and how long a restore would take.

We offered restoration to the Creators who had reported impact and began to restore player data from those islands on 11/6 at 2pm. This completed on 11/7 at 3pm ET.

Changes: We have automation that reports when assets have been removed, so that, for example, items in creator islands don’t go missing. However, this automation didn’t report when assets were moved. We will update it so that moved assets are also reported.
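One way such a check could work is sketched below. It assumes each asset has a stable identifier that survives a move and that a manifest mapping identifier to path can be exported for each build. Both are assumptions for illustration, and the identifiers and paths are made up.

```python
from typing import Dict, List, Tuple

def diff_manifests(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Compare two {asset_id: path} manifests.

    Returns the paths of removed assets and (old_path, new_path) pairs
    for assets that still exist but now live at a different path.
    """
    removed = sorted(old[asset_id] for asset_id in old if asset_id not in new)
    moved = sorted(
        (old[asset_id], new[asset_id])
        for asset_id in old
        if asset_id in new and old[asset_id] != new[asset_id]
    )
    return removed, moved

# Hypothetical manifests from the previous and upcoming builds.
previous_build = {
    "id-001": "/Game/Items/Diamond",
    "id-002": "/Game/Items/Gold",
    "id-003": "/Game/Items/Wood",
}
upcoming_build = {
    "id-001": "/NewPlugin/Items/Diamond",  # moved
    "id-003": "/Game/Items/Wood",          # unchanged
}

removed, moved = diff_manifests(previous_build, upcoming_build)
print("removed:", removed)
print("moved:", moved)
```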

Additionally, we are looking into automation options to report when items stored in inventories fail to be loaded by upcoming builds of Fortnite, though this is a harder problem that will take longer to resolve.

Lastly, as noted above the time it took to deploy a server with a fix was unacceptable. This is a process we will improve.

Issue: General Client Instability (32.00)

We quickly noticed a significant increase in client crashes across all platforms.

The primary cause was a change made in update 32.00 to fix bugs related to disabling weapon damage in mutator zones. This change introduced a race condition, where two threads could simultaneously access an array. Depending on the timing and player actions during respawns, this could lead to a crash.
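The general pattern behind this class of bug can be sketched as follows. This is not the Fortnite code, and Python is used only for readability; the class and method names are hypothetical. The point is that the "check, then modify" step on a shared list must happen under a single lock, otherwise another thread can change the list between the check and the modification.

```python
import threading
from typing import List

class SharedZoneList:
    """Hypothetical list shared by more than one thread."""

    def __init__(self) -> None:
        self._zones: List[int] = []
        self._lock = threading.Lock()

    def add(self, zone_id: int) -> None:
        with self._lock:
            self._zones.append(zone_id)

    def remove_last(self) -> bool:
        # The emptiness check and the pop share one lock, so no other
        # thread can empty the list in between.
        with self._lock:
            if self._zones:
                self._zones.pop()
                return True
            return False

    def snapshot(self) -> List[int]:
        with self._lock:
            return list(self._zones)

zones = SharedZoneList()
for i in range(1000):
    zones.add(i)

removed_counts = [0, 0]  # one slot per worker, so the counters are not shared

def worker(slot: int) -> None:
    while zones.remove_last():
        removed_counts[slot] += 1

threads = [threading.Thread(target=worker, args=(slot,)) for slot in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(removed_counts), zones.snapshot())
```

In native code, the unguarded version of this pattern can read or write freed memory and crash only under specific timing, which is why it is so hard to reproduce on demand.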

Since these crashes occurred on the client and couldn’t be resolved through server adjustments, we had to release a client patch across all platforms. Client patches take longer to create and require certification on certain platforms, so the update wasn’t available until Wednesday 11/6.

Impact: Clients on all platforms experienced increased instability in update v32.00.

Changes: During the development of each Fortnite release, we track and address known crashes. Our investigation found no prior occurrence of this crash, even with extensive testing hours. Even after identifying the cause, we couldn’t replicate it internally.

This challenge of “issues-at-scale” isn’t new. For example, a crash that occurs 0.01% of the time may not appear during development, but with a million daily players, it could result in 100 crashes per day.
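The arithmetic behind this can be sketched in a few lines. The sketch assumes the crash rate applies per play session, that each player has one session per day, and that sessions are independent. The test-session count is an illustrative number, not a figure from our testing.

```python
def expected_crashes(crash_rate: float, sessions: int) -> float:
    """Expected number of crashes across the given number of sessions."""
    return crash_rate * sessions

def chance_of_missing(crash_rate: float, test_sessions: int) -> float:
    """Probability that no test session hits the crash at all."""
    return (1.0 - crash_rate) ** test_sessions

rate = 0.0001  # 0.01% expressed as a fraction

daily = expected_crashes(rate, 1_000_000)
missed = chance_of_missing(rate, 1_000)  # hypothetical 1,000 internal test sessions

print(f"expected crashes per day: {daily:.0f}")
print(f"chance testing never sees it: {missed:.1%}")
```

Under these assumptions, 1,000 test sessions would have roughly a 90% chance of never hitting the crash even once.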

To address this, we plan to increase the use of “sanitization” builds during testing. These builds have enhanced detection for memory access issues, even if they aren’t immediately crash-inducing. While we can’t be certain this method would have caught the recent issue, it’s highly likely to prevent similar problems in the future.

Issue: Editor Crash When Loading Projects (32.00)

When update 32.00 was released, we quickly received reports from Creators experiencing editor crashes when attempting to open their projects. Initially, we suspected these crashes were linked to the ongoing Content Service degradation, but we soon realized it was a separate issue.

Diagnosing the problem was challenging since it involved a crash within the graphics driver, which typically provides limited information on the cause. Once we identified this, we advised Creators to switch UEFN to Direct3D 11 as a temporary workaround.

The root cause was eventually traced to projects with a high number of startup warnings displayed in the “toast” popup in the bottom right. Each warning message consumed excessive memory for rendering, and when too many warnings appeared, the memory use could exceed limits, causing a crash.

Fortunately, since we can update UEFN independently of Fortnite, we were able to push an update the next day.

Frustratingly, an internal UEFN user had seen this bug once on an early version of 32.00, but developers weren’t able to reproduce it, so the issue was closed.

Impact: Several hundred UEFN crashes occurred in the first eight hours after release.

Changes: During our post-mortem, we noted that our workstations are often significantly higher spec than typical consumer machines, especially in terms of video memory.

Previously, our compatibility testing focused on the “new user experience” and less on pushing UEFN with large projects. Moving forward, we plan to increase testing on min-spec and consumer-spec machines with larger internal projects to better identify potential issues.

We hope this breakdown provides a better understanding of the cause of issues you may have experienced upon the release of 32.00. Providing Creators a stable environment to build and support experiences that excite players is something we take very seriously.

Our goals are to keep improving our systems and processes, listening to issues you encounter, and regularly communicating ways we are working to make UEFN and FNC better for everyone.

Thanks,

Andrew


Thank you so much for the update! It always feels nice to know Epic is listening.


Thanks for taking the time to give such a detailed description of everything that went wrong and was fixed. Looking forward to you never having to write one of these again :slight_smile:


Thank you for this detailed report @GrantGoldkey. :sparkling_heart::sparkling_heart::sparkling_heart: This level of transparency is greatly appreciated as I don’t think many people really understand the level of complexity of everything. I’ll be spreading the word of this post.


Thank you for the transparency. Hopefully we won’t get it this bad again, but DevOps is DevOps. Lessons learnt and move on forward. :slight_smile:

Thank you Andrew for the detailed report. Helps understand some insight into why things happen. The extra transparency is appreciated.

I would like to add that a lot more issues were introduced with that update, less critical than the ones mentioned, that have not been addressed yet. I’ll post a couple below that we’ve spotted:

-Barrier devices cannot be selected in live edit anymore (critical)
-Increased corruption in save data, with users unable to reset the save data file and permanently locked from progressing in games (really hard to repro, but we managed to get a video of it happening, unfortunately no logs. Sent all info to Halcyon)
-ErrRuntime_ComputationLimitExceeded started appearing in our project and we don’t know why.
-Players can get in a bugged state where Collision Presets do not match their movement state, and they jitter on slopes when they should normally slide. Not 100% occurrence but happens rarely.
-Server tick rates decreased in our game, but we noticed that it went up again after a while.

Also pending critical issues:
-Can’t build large games past 12 players, servers can’t take it and there is excessive lag for all players (closely related to BP actors in general, repro is past 19k, usually seen at 25k BP actors)