Best way to add spoken audio to state machine

Hi! I have a MetaHuman character that has an idle state and some talking states connected in a state machine. Upon pressing a button it switches from idle to talking, and upon release it switches to another state and then eventually back to idle.

But now I find it hard to add the corresponding audio to the animation. I have it in my level sequence as an audio track, but as far as I can see, I can only add animation sequences to the state machine, right?

I tried a “Play Sound” notify track in the animation sequence, but it doesn’t stop the audio once the animation is interrupted, which makes sense because it’s not an audio track but rather a trigger for a sound file.

Is there something I can do or is the state machine a bad way to do this?

Create an animation based on some fake mesh, add a sound to it and attach the mesh to your character. Then play the animation from Sequencer, and don’t forget to check Loop.

Speaking from a TA point of view, the simple answer is no. In general, putting all states into a single state machine would be restrictive as to triggering the “needed” state changes on demand. State machines are very good at changing animation states within a sequence, but they do not help at all when a state change needs to occur in response to an on-demand trigger.

To give some idea: if you throw a light switch and the light goes on, the result is based on an event that has already occurred, in the context of the purpose the trigger serves. The problem with this migration pathway is that it becomes more and more difficult to trigger the necessary state changes as the design becomes more and more complex. In other words, the design pathway does not scale, if it scales at all, and will at some point implode.

To get a better idea of where I’m heading, consider the following as part of your design logic.

The ideal of context-based animation is that how things should work is rooted in the game design, not in general theory about how something should work, since other options based on the design requirements need to be taken into consideration, in the manner I hope is explained in the posted video.

So as to this

“state machine a bad way to do this?”

The quick and honest answer is yes, it’s a bad design choice for many reasons that would turn my response into a book.

Thank you for that elaborate answer! I totally agree, although at least for my scenario I think the complexity would be manageable. Could you maybe point me to another pragmatic approach to do what I want to do?
Is it better to work in level sequences? Can I transition between those as smoothly as I can in the state machine?

Montages are probably the way to go for it.

Think about it from the perspective of something truly massive like The Witcher 3…
If you had to manually set up dialogue lines with switches based on options/progression, you’d grow old just making/hooking up the possible variations…

Create a system that takes multiple montages synced to audio.
I’d call it a sentence reader, meaning each montage is a sentence being read; then, according to some settings, you transition off to another montage/something else.

You can/could sync sound to montages via notifies, but keep in mind not all notifies are accurate, so it’s probably a bad idea to begin with…
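Very roughly, such a sentence reader could look something like the sketch below in C++. This is just an illustration of the idea, not a drop-in solution: UTalkComponent, FSpokenSentence and the property names are made up, while Montage_Play, Montage_SetEndDelegate and SpawnSoundAttached are the standard engine calls.

```cpp
// Hypothetical "sentence reader" component: pairs each montage with its audio
// and plays the queue one sentence at a time. All names here are illustrative.
#pragma once

#include "Components/ActorComponent.h"
#include "Animation/AnimInstance.h"
#include "Animation/AnimMontage.h"
#include "Components/AudioComponent.h"
#include "GameFramework/Character.h"
#include "Kismet/GameplayStatics.h"
#include "Sound/SoundBase.h"
#include "TalkComponent.generated.h"

USTRUCT(BlueprintType)
struct FSpokenSentence
{
    GENERATED_BODY()

    UPROPERTY(EditAnywhere) UAnimMontage* Montage = nullptr;
    UPROPERTY(EditAnywhere) USoundBase*   Audio   = nullptr;
};

UCLASS(ClassGroup = (Custom), meta = (BlueprintSpawnableComponent))
class UTalkComponent : public UActorComponent
{
    GENERATED_BODY()

public:
    // Sentences to read back to back.
    UPROPERTY(EditAnywhere, BlueprintReadWrite)
    TArray<FSpokenSentence> Sentences;

    UFUNCTION(BlueprintCallable)
    void PlayNextSentence()
    {
        if (!Sentences.IsValidIndex(CurrentIndex)) return;

        ACharacter* Char = Cast<ACharacter>(GetOwner());
        UAnimInstance* Anim = Char ? Char->GetMesh()->GetAnimInstance() : nullptr;
        if (!Anim) return;

        const FSpokenSentence& Sentence = Sentences[CurrentIndex];

        // Play the montage and get notified when it ends (or is interrupted).
        Anim->Montage_Play(Sentence.Montage);
        FOnMontageEnded EndDelegate;
        EndDelegate.BindUObject(this, &UTalkComponent::OnSentenceEnded);
        Anim->Montage_SetEndDelegate(EndDelegate, Sentence.Montage);

        // Keep a handle to the audio so it can be stopped if the montage is cut short.
        CurrentAudio = UGameplayStatics::SpawnSoundAttached(Sentence.Audio, Char->GetMesh());
    }

private:
    void OnSentenceEnded(UAnimMontage* Montage, bool bInterrupted)
    {
        if (bInterrupted)
        {
            if (CurrentAudio) { CurrentAudio->Stop(); }  // cut the voice line with the animation
            return;                                      // don't keep reading after an interrupt
        }
        ++CurrentIndex;
        PlayNextSentence();                              // move on to the next sentence
    }

    UPROPERTY() UAudioComponent* CurrentAudio = nullptr;
    int32 CurrentIndex = 0;
};
```

The bInterrupted flag on the montage end delegate is what lets the audio be stopped together with the animation, which is exactly the part a Play Sound notify can’t give you.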

Thanks for the answer! I can see that animation montages are better for keeping the blueprint tidy and less complex; however, I would still have the same problem with the sound. Playing the sound via notifies feels very clunky, and the problem remains that it cannot be stopped once the animation gets interrupted by an input. Is there another way to add the sound to the animation montages?

Or is it better to use multiple level sequences for the idle animation, the talking animation and the interruption animation, and then just switch between them upon interaction? I probably cannot blend the animations then, right?

Just create a sound cue which can be interrupted, and make sure both the montage and the sound cue get the interrupt message (probably an interface call)?
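In case it helps, one rough way to read that suggestion in C++ is sketched below. The interface and character names are hypothetical; only the UINTERFACE machinery, StopAnimMontage and UAudioComponent::FadeOut are standard engine pieces.

```cpp
// Hypothetical interrupt interface: whatever detects the interrupt sends one
// message, and the speaker stops both the montage and the voice audio in response.
#pragma once

#include "UObject/Interface.h"
#include "Interruptible.generated.h"

UINTERFACE(MinimalAPI, Blueprintable)
class UInterruptible : public UInterface
{
    GENERATED_BODY()
};

class IInterruptible
{
    GENERATED_BODY()
public:
    UFUNCTION(BlueprintNativeEvent, BlueprintCallable, Category = "Dialogue")
    void Interrupt();
};
```

A talking character implementing it might then do something like this, where TalkMontage and VoiceAudio (the UAudioComponent* kept from SpawnSoundAttached when the line started) are assumed properties on that class:

```cpp
void AMyTalkingCharacter::Interrupt_Implementation()
{
    StopAnimMontage(TalkMontage);          // blend the talking montage out
    if (VoiceAudio)
    {
        VoiceAudio->FadeOut(0.2f, 0.0f);   // stop the voice line with a short fade
    }
}
```

Whatever triggers the interruption then calls it through the interface, e.g. `IInterruptible::Execute_Interrupt(Speaker);`.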

To answer the question first: you can assume that you can get the animation to blend no matter how random the state change is.

For example, sync groups and markers are a recent addition with which you can track the current position of the left and right foot, so that when going from a run state to idle, Unreal will blend the transition for you without having to tell the state machine to exit at X percent so that the change does not cause the animation to pop if it is in an extreme right pose.

Montages are a good solution; however, the nature of a montage is that it will fire and play to completion unless a different montage is triggered using the same slot, or it is told to stop using the Montage Stop node.
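For what it’s worth, at the code level that fire-and-forget behaviour maps onto the standard UAnimInstance calls; a minimal sketch, where the helper names and the 0.25 s blend-out time are just illustrative:

```cpp
#include "GameFramework/Character.h"
#include "Animation/AnimInstance.h"
#include "Animation/AnimMontage.h"

// Illustrative helpers, not engine API beyond Montage_Play / Montage_Stop themselves.
static void StartTalkMontage(ACharacter* Char, UAnimMontage* TalkMontage)
{
    if (UAnimInstance* Anim = Char->GetMesh()->GetAnimInstance())
    {
        // Fires and plays to completion on its slot unless something stops it.
        Anim->Montage_Play(TalkMontage);
    }
}

static void StopTalkMontage(ACharacter* Char, UAnimMontage* TalkMontage)
{
    if (UAnimInstance* Anim = Char->GetMesh()->GetAnimInstance())
    {
        // Equivalent of the Montage Stop node: blend out over 0.25 s instead of snapping.
        Anim->Montage_Stop(0.25f, TalkMontage);
    }
}
```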

As stated, Unreal Engine is component-based in its approach, more so as to Blueprint development, where you can build out what you need by adding to the mix. For example, you do not need a sound event added to the montage, as you can add the sound file directly to the timeline of the animation sequence.

If the question is “is the use of a state machine the best option?”, then my opinion is still no, due to the limitations in practical use without a better understanding of the context the design path needs to serve. As mentioned, the montage would be the better option of the two.

For example

State machines are very good at handling pulse-generated state changes like locomotion, where you stay in a state as long as you hold a key, say W to move forward, and on exiting the state it either defaults to idle or changes to a different state, for example moving backwards if S is pressed.

The problem with state machines is that to maintain the blend of, say, the player speaking dialogue, or another type of action state like reloading a weapon, you would need to include the action state in the migration pathway. Once again, state machines are very good at maintaining sequenced state changes, but they do not work well at all for adding action state changes via montage, as in most cases you need to fire and forget, beyond stopping the action on demand.

The montage makes for a better option, as you can add to the component based on the need to blend an individual action state, like dialogue or a talking animation, using one of the many different blending modes in keeping with the idea of animation layering.

Check out Layered Blend per Bone

As to good, better, best: once again it’s about the context of what the design is required to do in its completed form. If I tell you I need a motor vehicle, the required information is limited to “making it work”, as compared to the context of whether what is needed is an F1 race car or a dump truck :smiley:

Basically, make your own C++ montage function where you feed it a string and the montages come out as the final flow.

In my case the string is a comma-separated value set by a table I import, which defines the animation and audio files.

You can also add “conditions” to each entry, and check that each one can be used before queuing.

You definitely want to create a buffer-type situation where you check that string and pre-load, or load as you go down the lines.
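To make that a bit more concrete, here is a rough sketch of the table-driven part. The row struct, function names and "SentenceReader" context string are purely illustrative; FTableRowBase, FindRow, ParseIntoArray and TSoftObjectPtr are the real engine pieces.

```cpp
#include "CoreMinimal.h"
#include "Engine/DataTable.h"
#include "Animation/AnimMontage.h"
#include "Sound/SoundBase.h"
#include "Templates/Function.h"
#include "DialogueLine.generated.h"

// One row of the imported table: which montage and audio file make up a "sentence",
// plus an optional condition tag checked before queuing. Names are illustrative.
USTRUCT(BlueprintType)
struct FDialogueLine : public FTableRowBase
{
    GENERATED_BODY()

    UPROPERTY(EditAnywhere) TSoftObjectPtr<UAnimMontage> Montage;
    UPROPERTY(EditAnywhere) TSoftObjectPtr<USoundBase>   Audio;
    UPROPERTY(EditAnywhere) FName                        RequiredCondition;
};

// Turn a comma-separated string of row names ("Greet_01,Quest_Intro,Farewell")
// into a queue of loaded sentences, skipping entries whose condition fails.
TArray<FDialogueLine> BuildSentenceQueue(UDataTable* Table, const FString& LineString,
                                         TFunctionRef<bool(FName)> ConditionPasses)
{
    TArray<FDialogueLine> Queue;

    TArray<FString> RowNames;
    LineString.ParseIntoArray(RowNames, TEXT(","), /*CullEmpty=*/true);

    for (const FString& Name : RowNames)
    {
        const FDialogueLine* Row = Table->FindRow<FDialogueLine>(FName(*Name.TrimStartAndEnd()),
                                                                 TEXT("SentenceReader"));
        if (!Row) continue;                                       // unknown row name, skip it
        if (!ConditionPasses(Row->RequiredCondition)) continue;   // condition check before queuing

        FDialogueLine Loaded = *Row;
        Loaded.Montage.LoadSynchronous();   // crude "pre-load"; async streaming would be nicer
        Loaded.Audio.LoadSynchronous();
        Queue.Add(Loaded);
    }
    return Queue;
}
```

The resulting queue is then what the sentence reader plays back montage by montage.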

In the end, it is actually possible to break down the phonetic alphabet and create a smart “string” reader (except you need to write phonetic strings, which is hella hard compared to writing English).
People have done it. Surely there are also plugins which do just this…

For most people’s purposes, obviously, those won’t do. The end goal is often to present actors acting, not just lip-syncing to whatever gibberish you may type…

So, a custom montage play function where you feed in information and the scene gets created is usually the best bet.

The toughest parts are when the terrain isn’t flat but the scene was shot in a studio that is flat, and you can potentially have to adjust a lot of stuff to get the montages to behave without clipping…