Packaged, Networked game freezes/hangs for client in 4.17.2 - need Epic engine programmer response

We just upgraded our title to 4.17.2 from 4.14. (And this is in 4.18 apparently as well). I hadn’t heard of any major issues and this was to be our last engine upgrade. We tested the engine on the last stable build of our game and we are experiencing serious freezing/hanging at random times, which the client does not recover from.

Here is the answerhub discussion on the issue. Epic has not responded. https://answers.unrealengine.com/que…ned-int-w.html

The game thread is waiting indefinitely for a task to finish, and I cannot debug it. I don’t know why I can’t inspect any variables to debug this issue. If someone could at least help with that, I could debug the threads I have to work with in case Epic is not tackling this issue any time soon.

Here is another AnswerHub post about this: “Substepping unusable: causes freeze since 4.17” (UE4 AnswerHub).

Throwing in my support for an Epic specialist to help us out! Are we the only two people having this problem?

Did you already build and launch packaged game from Visual Studio with debugger attached?
It should point to where the exception is.

This is MichaelArchetype. It’s not an exception… As stated above, the game thread is suspended indefinitely waiting for some task to complete. I can’t inspect any variables. This is all in the answerhub post I linked to. I can see all thread stacks in the parallel threads window. They’re all waiting, but the difference is that they’re all completing their waits and then looping. Meanwhile the game thread is permanently waiting.

Did the hang always happen at a certain point? Or did it happen every time you executed a certain task? That would mean it is completely reproducible.

If so, it is probably a networking issue: not UE4’s fault, but mismatched BP replication. That is, check that items which are replicated are marked as such. I have experienced the same issue and solved it with the method I just pointed out. The more important question is whether it is completely reproducible, so you can work out the solution using that clue…

It’s not completely reproducible. I’ve had the crash happen at every single stage of the game, though I can’t recall if it’s ever happened in the Main menu. It has happened while actually playing the game, or waiting in the “Ready Up Lobby”, I’ve crashed at almost every possible “Event” and also no event at all- just sitting in the lobby.

I’ve had it not happen for longish test sessions, and then I’ll restart the game and have it happen right away if I alt-tab or use the console. I haven’t had it happen offline. I’ve had it happen in the editor once. I don’t believe we do any BP replication but I’ll check.

I’ve had it happen when just playing. It seems random, doesn’t happen to all clients at the same time, and can happen right away sometimes and not at all at others.

I’ve had the same experiences you’ve posted as well. Some people go all night with no crash, I actually thought we fixed it once. Then I booted it up next morning and almost instantly crashed.

@zlspradlin Do the projects that you were testing this in have tick prerequisites? Like we have our weapon’s tick prereq be the character.

I changed line 626 in TaskGraph.cpp to:


bool bSuccess = Queue(QueueIndex).StallRestartEvent->Wait(((ThreadId == ENamedThreads::GameThread)) ? 30000 : MAX_uint32, bCountAsStall);
check(bSuccess);

So after 30 seconds, instead of remaining frozen, it failed an assert. I’m now able to inspect values, and am attempting to debug this issue. Inspecting the Queues at index 0 (which is the index that stalled), I find the StallRestartEvent and am trying to figure out where to go from here given EventIds. If the Ids are non-deterministic, then I’m kind of screwed.
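For anyone who wants to try the same trick outside the engine, here is a rough standalone sketch of the idea in plain standard C++: replace an infinite wait with a bounded one on the thread you care about, and assert when the budget runs out so the debugger stops with live state instead of a silent hang. StallEvent and WaitForRestart are made-up names for illustration and are not UE4 types; the real engine event and task graph work differently.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an engine event. NOT UE4's FEvent.
class StallEvent {
public:
    // Bounded wait: returns true if signaled, false if the timeout elapsed.
    bool Wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

    // Unbounded wait: blocks until Trigger() has been called.
    void WaitForever() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
    }

    // Signal the event and wake every waiter.
    void Trigger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Only the "game thread" gets a time budget; other threads keep the
// original unbounded wait. A failed assert turns the hang into a break.
void WaitForRestart(StallEvent& event, bool is_game_thread,
                    std::chrono::milliseconds budget) {
    if (!is_game_thread) {
        event.WaitForever();
        return;
    }
    const bool success = event.Wait(budget);
    assert(success && "game thread stalled past its wait budget");
    (void)success;  // silence unused-variable warnings when NDEBUG is set
}
```

The point of asserting rather than just logging is that the process stops at the wait site with all thread stacks intact, which is what made the values inspectable in my case.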

inspectedevents.png

Answerhub is being super weird. My exclamatory response was from a few days ago in response to zlspradlin’s mentioning a PR I was also using. It just now got posted however and it makes it look like this was solved. It may have disappeared again.

Running UE4Editor.exe <Project> -game, connected to the live server, I haven’t frozen once. The next thing I was going to try was disabling multithreading entirely on the DebugGame build.

EDIT: Actually the fact that it’s not happening in UE4Editor.exe -game (which disables steam OSS) has me thinking again that this might be a steam oss issue. I may have only disabled steam on the server but not the client last time, which was admittedly dumb. Testing ASAP.

EDIT2: Tested with steam and online subsystems disabled entirely and still froze.

EDIT3: Added -ONETHREAD to launch, and while it’s interpreted correctly it doesn’t seem to do anything: the app is still multithreaded and the game still froze. Getting nowhere with debugging.

I ran the game on my local server with -nosteam and did not crash, tested for maybe 45 minutes (normally enough time to crash). I also could not figure out how to limit cores or threads properly and have it actually do so. We’re debugging all day, we’ll keep you posted.

Do you by chance use Gamesparks?

We use steam and a custom backend.

I can’t even find where NOSTEAM is used in the engine source. Is it still used? I searched the entire solution for nosteam and got nothing but the doc referencing it and an error referencing it. Are you sure it’s disabling steam for you?

You are definitely not the only two having this issue. I have been chasing ghosts through dozens of soak tests trying to find this bug, the time lost has been staggering.

So far I can only pin down a few things, at least in our case.

It happens on standalone games as well as clients.
It only happens after we call UAssetManager::GetStreamableManager().RequestAsyncLoad() and apply the loaded skeletal meshes to our character.

Many more soak tests are currently in progress. It takes at least a day to ‘clear’ each test, and even then there is a bit of uncertainty. Currently we are testing our skeletal meshes and skeletons separately from the async load request to see if either of them on its own is responsible for the problem.

We’re only AsyncLoading our custom world settings in order to get the map load screen earlier in the load process. I don’t know how this would get caught up in the middle of a game for us (which is when we freeze) but I’m going to test removing it soon.

For us, the freeze occurs arbitrarily some point after play begins, it has taken from just a few minutes to many hours before the freeze occurs.
We have not yet been able to reproduce the freeze on our tests that are not using the async load request with soak tests lasting up to 3 days.

We are equally baffled as to how it could be responsible, as the async loading code has run and gone by the time the freeze occurs in any case.

Hello everyone,

I’ll be looking into this issue. In the interest of everyone seeing the same information, it would be best to keep the correspondence about this issue in one place, so please refer to the AnswerHub post for further assistance.