GC FSM - Event-driven, hierarchical finite state machines in blueprint

Hello,

Sorry to double post, but I appear to have picked up another bug. The attached state machine is located in a **User Widget**, and some component of it (I suspect the local states) appears to be getting collected by the garbage collector. This stops the state machine, even when the state machine is meant to keep running.

Note that without the “collect garbage” node, the same bug occurs - I just put it in there to test whether that was in fact why the state machine was stopping.

EDIT: This appears to happen with separate State Classes as well, unfortunately.

Hi Albie_123, thanks for reporting. I will look into it. It might be related to the other GC-related bug reported by eanticev just last week, so it’s good to have a repro case that doesn’t involve multiplayer.
PS: I have good news about the nativization problems: making some BlueprintInternalUseOnly methods public instead of private actually solves the issue.

That’s great to hear. Any ETA on that nativisation fix being pushed onto the marketplace version?

By the way, I can’t manage to reproduce the Garbage Collection error on my Player Controller or Game State blueprints - state machines on these blueprints aren’t getting collected, luckily.

I’d like to address the GC issue also. However, if I see more time is needed for that, I will submit the nativization fix by the end of this week.

Awesome. Was hoping to use it for the jam this weekend, but if it’s not ready, that’s all good. :slight_smile:

I got the same error in a similar situation.
In my case, the FSM is not running on a server.
My FSM is created in an actor’s blueprint and launched from the “Event BeginPlay” node. After running for some seconds or minutes, it logs “LogGC FSM: Error: Object xxx is not running FSMs”, just like in eanticev’s report. The actor is still in the map and running fine, but the FSM in it seems to have been destroyed. I tried to trace the bug in the source code, and it seems there is something wrong with the FSM’s garbage collection.
Waiting for help… I can’t keep going with my work until this bug is fixed. So sad… :frowning:

I have some good news. I fixed the GC issue reported by Albie_123. For some obscure reason, Unreal marks some types of objects (all Widgets, for instance) as “unreachable” and my code was not expecting that. I am now going to check whether the fix also solves other cases, such as the client/server issue that eanticev is reporting. If that’s the case, I will submit the update today.

Yes! As I hoped, all three issues reported by eanticev, Albie_123 and woodzong are indeed related. I have reason to believe the problem was caused by changes to the garbage collector in Unreal 4.20, because the reproducibility is so high that I can’t explain how the bug could otherwise have gone unnoticed so far. Here’s the catch: during garbage collection, a few objects (in particular the GameMode object, but occasionally also other actors) may be temporarily marked “unreachable”. The process was changed in Unreal 4.20 to introduce some form of parallelism, so I presume the GCFSM code related to garbage collection is now called at a time when objects may be unreachable, a case that did not occur before Unreal 4.20. When the GCFSM code found an unreachable context object, the context was abandoned and its FSMs were stopped. Since I don’t want to mess with Unreal’s internal flags, I replaced a FWeakObjectPtr::Get() call with a FWeakObjectPtr::GetEvenIfUnreachable() call and everything is now working.
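For anyone following along, the difference between the two lookups can be sketched with a toy model in plain C++. This is not the plugin's code and not Unreal's FWeakObjectPtr: ToyObject, ToyWeakPtr, KeepRunningStrict and KeepRunningLenient are hypothetical names made up for illustration, and the flag handling is an assumption about how a strict and a lenient lookup might differ.

```cpp
#include <cassert>

// Toy stand-in for an engine object; it models only the two flags discussed above.
struct ToyObject {
    bool pendingKill = false;  // set when the object was explicitly destroyed
    bool unreachable = false;  // may be set temporarily while a GC pass is running
};

// Hypothetical weak pointer, NOT Unreal's FWeakObjectPtr.
class ToyWeakPtr {
public:
    explicit ToyWeakPtr(ToyObject* target) : target_(target) {}

    // Strict lookup: an object flagged pending kill or unreachable counts as gone.
    ToyObject* Get() const {
        if (target_ == nullptr || target_->pendingKill || target_->unreachable) {
            return nullptr;
        }
        return target_;
    }

    // Lenient lookup: returns the object regardless of its flags.
    ToyObject* GetEvenIfUnreachable() const { return target_; }

private:
    ToyObject* target_;
};

// Old behaviour in this model: an unreachable context stops its FSMs.
bool KeepRunningStrict(const ToyWeakPtr& context) {
    return context.Get() != nullptr;
}

// New behaviour in this model: only a pending-kill context stops its FSMs.
bool KeepRunningLenient(const ToyWeakPtr& context) {
    const ToyObject* object = context.GetEvenIfUnreachable();
    return object != nullptr && !object->pendingKill;
}

// Usage: a widget-like context that is flagged unreachable but not destroyed.
void DemoWidgetCase() {
    ToyObject widget;
    widget.unreachable = true;
    ToyWeakPtr context(&widget);
    assert(!KeepRunningStrict(context));  // the strict check gives up on it
    assert(KeepRunningLenient(context));  // the lenient check keeps it alive
}
```

In this model the lenient path has to do its own pending-kill check, because the lookup no longer filters anything out.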

I am packaging the fix right now and will push it immediately. It usually takes a day or two to be online, I’ll keep you posted.

Thanks to all of you for the reports!

PS: Widgets (as in Albie_123’s report) seem to always be marked unreachable… that was helpful in addressing the bug, since it removed the non-deterministic behaviour of actors, which may or may not be marked unreachable.

That’s great to hear. It also makes a lot of sense, since my actors would occasionally do the same thing but not all the time, so at first I assumed I’d just screwed up somehow - I could only reliably replicate the issue with widgets.

Thanks for being so quick with the patch!

Great to hear that!!! Thanks!!! :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley:

I can’t double check, since I’m not in front of my PC, but I received a notification from Epic saying that the fix went online just a few minutes ago. Thanks again for all your reports and patience.

I’m concerned that the FWeakObjectPtr::GetEvenIfUnreachable() will not work. I tried that myself and it just introduces a crash a few minutes later due to another cascading issue.

Also, consider what we’re doing here by making this change… we now might be not garbage collecting the FSM at the right time, thus leaving it running even if an object is unreachable.

I’m not sure what your test-case looks like, but you ideally have to run the scripts I suggested because the engine behaves slightly differently in editor than standalone.

I’m happy to connect and go over these issues via screenshare and figure out a solution.

If I remember correctly with this line change I was getting a crash after a few minutes on line 87 in GCFSMUtilities where it’s trying to get


auto context = rootState->GetContextObject();

Instead of this check


return context && !context->IsPendingKill();


You might need to do something like:


return context->IsValidLowLevel() && !context->IsPendingKill();


I seem to remember that just a nullptr check is not always 100% reliable, as opposed to IsValidLowLevel.

When are you getting crashes, eanticev? I don’t seem to be having any problems with the new version but would love to help test what’s going on.

Hi eanticev, I understand your concerns, you have a point here. I’ll dig deeper into the issue using your test project as a test case. If you’re comfortable with Slack, would you join me there? It might make it easier for me to share hotfixes with you. Just send me your email address via private message and I’ll invite you to my dedicated workspace.

I may have oversimplified in my post: replacing Get() with GetEvenIfUnreachable() is a start, but it’s not enough to fix the issue. Did you reproduce those crashes with v1.5.3?

When the context object is destroyed, the FSMs will stop ticking and will therefore stop “running” immediately. It’s true, however, that there may be a situation where the root state object and all its FSMs stay around until the next garbage collection cycle, instead of being purged as soon as its context object is. The situation occurs only if the context object becomes unreachable without first being marked pending kill. YMMV, but I believe this case doesn’t occur very often and even if it does, the situation eventually heals itself without memory leaks.

That’s good advice. Standalone does indeed behave differently from the editor, so I must be more careful and check that as well. That said, I could not reproduce any bugs or crashes with your test project using v1.5.3, even in standalone.

The problem here could be that there is a situation where the call to AddReferencedObject() nullifies rootState. It has nothing to do with context having an invalid value. Adding a null check on rootState is probably safe, I would just like to understand whether it’s really needed, because I did not encounter this problem during regular use. BTW, due to the way the value of the context variable is obtained (it’s the value of a UPROPERTY), a null check is fine; using IsValidLowLevel() would be a huge waste of time.
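As a rough illustration of the guard being discussed, here is a toy sketch in plain C++. It is not the GCFSMUtilities code: ToyContext, ToyRootState, ToyCollector and IsContextAlive are hypothetical names, and the collector clearing the pointer on demand is an assumption used only to simulate the suspected situation.

```cpp
#include <cassert>

// Toy context object; models only the pending-kill flag.
struct ToyContext {
    bool pendingKill = false;
};

// Toy root state that simply remembers its context.
struct ToyRootState {
    ToyContext* context = nullptr;
    ToyContext* GetContextObject() const { return context; }
};

// Hypothetical collector: takes the pointer by reference and may clear it,
// which is the situation suspected in the post above.
struct ToyCollector {
    bool clearReferences = false;
    void AddReferencedObject(ToyRootState*& state) const {
        if (clearReferences) {
            state = nullptr;
        }
    }
};

// Reports whether the FSMs under rootState still have a live context.
bool IsContextAlive(ToyRootState*& rootState, const ToyCollector& collector) {
    collector.AddReferencedObject(rootState);
    if (rootState == nullptr) {
        return false;  // guard: the collector cleared the pointer, nothing to dereference
    }
    const ToyContext* context = rootState->GetContextObject();
    return context != nullptr && !context->pendingKill;
}

// Usage: the same root state, first with a collector that keeps the pointer,
// then with one that clears it.
void DemoGuard() {
    ToyContext context;
    ToyRootState root;
    root.context = &context;
    ToyRootState* rootPtr = &root;

    ToyCollector keeping;
    assert(IsContextAlive(rootPtr, keeping));

    ToyCollector clearing;
    clearing.clearReferences = true;
    assert(!IsContextAlive(rootPtr, clearing));  // no crash, just "not alive"
}
```

Without the early return, the second call would dereference a null rootState, which matches the kind of crash described earlier in the thread.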

Anyway, I’m thinking of rewriting this part of GC FSM. It looks a bit fragile and I would like to make it less dependent on the internal details of garbage collection.

For what it’s worth I’ve tried my state machines now on a variety of classes (controllers, gamestate, widgets, etc.) with both forced GC and just letting Unreal do its thing, and in both packaged and in-editor I haven’t had any issues (either the state stopping, OR the state machine not getting GC’d when the context object does) or crashes.

I don’t think it’s necessary for the state machine to be GC’d immediately rather than waiting for the next GC cycle, especially since virtually everything else in Unreal works this way and IMO it would make more sense to keep it consistent with the rest of the engine. There might be a specific use case I’ve not considered though.

Hi,

I’m pretty sure I ran into a bug, or I don’t understand how Submachines and Local States are supposed to interact.

What I was trying to do was have an “abort” transition on all my states by using a Submachine as described in the documentation. Here’s a simplified example:

If I run FSM_0, I see the states change properly but none of the OnEntry/Exit/Tick nodes inside the Local States fire. If I run the NewSubmachine_0 directly instead of as a Submachine, everything fires as expected.

Am I doing something wrong?

Thanks!

Hi @JeromeParent ,
your understanding is correct: events in the local FSMs should be triggered in your scenario. I will look into the issue and get back to you shortly.
Thanks for the report and your patience,