Unreal Engine 5 introduces MetaSounds, a new high-performance audio system that provides audio designers with complete control over Digital Signal Processing (DSP) graph generation for sound sources. We’re excited to have Aaron McLeran present our new audio features live, and we invite you to join us!
SoundMix/SoundClass are higher level than MetaSounds – they won’t replace those systems. Instead, today, SoundMixes and SoundClasses simply apply their logic (i.e. volume scaling and pitch scaling) to MetaSounds. They can also apply the legacy EQ system, though I don’t recommend using that today, as Submix Effects are more effective for EQ.
What will replace those systems is the Audio Modulation plugin. That’s a simpler and more powerful parameter-modulation system (like a “sound mix”, but more generalized) that will allow modulation of any number of parameters via an orthogonal mix matrix (versus the SoundClass hierarchical graph).
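To make the distinction concrete, here’s a rough Python sketch of the two models – this is a conceptual illustration only, not the plugin’s actual API; all names and values are made up:

```python
# Conceptual sketch (not Unreal's API): contrast hierarchical SoundClass-style
# gain scaling with an orthogonal mix-matrix style of parameter modulation.

def hierarchical_gain(path_gains):
    """SoundClass-style: a sound's final gain is the product of every
    ancestor's gain down the hierarchy (e.g. Master -> Music -> Combat)."""
    g = 1.0
    for gain in path_gains:
        g *= gain
    return g

def matrix_modulate(base_params, mix_matrix, bus_values):
    """Mix-matrix style: each control bus scales each parameter
    independently; any bus can touch any parameter at any depth."""
    out = dict(base_params)
    for param, row in mix_matrix.items():
        for bus, depth in row.items():
            # depth 1.0 means the bus fully drives this parameter
            out[param] *= 1.0 + depth * (bus_values[bus] - 1.0)
    return out

# Hierarchy: Master(0.8) -> Music(0.5) -> Combat(1.0)
print(hierarchical_gain([0.8, 0.5, 1.0]))  # 0.8 * 0.5 * 1.0 = 0.4

# Matrix: a hypothetical "Tension" bus modulates volume and pitch
# at different depths, independent of any class hierarchy
params = matrix_modulate(
    {"volume": 1.0, "pitch": 1.0},
    {"volume": {"Tension": 1.0}, "pitch": {"Tension": 0.25}},
    {"Tension": 0.5},
)
print(params)
```

The point of the matrix form is that modulation sources and destinations are decoupled: adding a new bus or parameter is a new row/column entry, not a restructuring of a class tree.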
That is an interesting plugin – it seems like a lot of complexity to do something that is mostly already supported in UE4 via the Audio Synesthesia plugin and many native features. For example, getting audio envelope data and FFT data is supported out of the box on UE Audio Components and Submixes. Audio Synesthesia supports more complex analysis (non-real-time, so it’s performant). For UE5, we’re adding more robust real-time analysis to Audio Synesthesia as well.
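If you haven’t worked with this kind of analysis before, the envelope data mentioned above is conceptually just an attack/release follower over the signal. Here’s a minimal standalone Python sketch of that idea – plain illustration, not engine code; the parameter names are my own:

```python
import math

def envelope_follow(samples, sample_rate, attack_ms=10.0, release_ms=100.0):
    """One-pole attack/release envelope follower over a mono buffer.
    Conceptually, this is the kind of per-buffer amplitude data an
    engine-side envelope analyzer surfaces to gameplay logic."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for s in samples:
        x = abs(s)
        # rise quickly on attack, fall slowly on release
        coeff = attack if x > env else release
        env = coeff * env + (1.0 - coeff) * x
        out.append(env)
    return out

# A 100 ms 440 Hz burst followed by 200 ms of silence: the envelope
# rises quickly toward full scale, then decays at the release rate.
sr = 48000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(4800)]
silence = [0.0] * 9600
env = envelope_follow(tone + silence, sr)
print(max(env), env[-1])
```

Driving a light’s intensity or a material parameter from `env` is the typical “audio-reactive visuals” use of this data.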
If you google UE4 Audio Synesthesia you’ll find lots of docs and tons of examples of stuff people are doing – no 3rd-party plugin needed!
I’m not sure of the use case here for all of this synthesis – is the team thinking composers will work in Unreal for their synthesis needs? I’m one, and also a programmer (but 99% of composers aren’t), so I can handle the BP-style programming. But while it’s a neat facility, I’d rather not work in Unreal for this – I’d rather interact with my Moogs. It’s two kinds of thinking: when I’m composing I need to work with instruments, not software paradigms.
SFX then? Talking to my sound designer, the workflow there is to do the typical layering in a DAW (Nuendo), apply plugins, and so forth. When he needs synthetic beeps and boops he’ll likewise use whatever synths he’s familiar with.
Sample-level triggering is fantastic – especially that it is music-aware – so I’m trying to figure out how to use it, but the entire composition/audio world works with WAVs. And I’m certainly not trying to put down this work in any way; I’m just not sure how to use it. What’s the use case for the engine as a DAW – or really a synth DAW plugin, as far as I can see?
Looks awesome! However, it seems this is all applied to the sound source before it’s played? I’m wondering if we will one day be able to use a combination of DSP, material properties, colliders, and audio ‘rays’ (maybe similar to how light rays function) in order to simulate reverb that is responsive to the virtual environment – this would be the audio equivalent of post-processing. It would be amazing to build a custom-size/shape room and have audio ‘waves’ bouncing off the walls to create a reverb simulating that space, governed by the material properties of the room, in real time. For digital musicians it would be amazing to experiment with practicing, recording, or playing live in a custom-shaped room, studio, or cave that does not – or could not – exist in real life.
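For reference, the classic starting point for geometry-driven reverb is the image-source method: mirror the source across each wall, and each mirror image contributes an echo whose delay comes from path length and whose gain comes from the wall material. Here’s a tiny Python sketch of first-order reflections in a shoebox room – purely illustrative, not an engine feature; the absorption model and all values are assumptions:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def first_order_reflections(room, source, listener, absorption):
    """room: (W, D, H) in meters; source/listener: (x, y, z) positions;
    absorption: wall absorption coefficient in [0, 1] (1 = fully absorbent).
    Returns one (delay_seconds, gain) echo tap per wall."""
    taps = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            # Mirror the source across this wall (image-source method)
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            dist = math.dist(image, listener)
            delay = dist / SPEED_OF_SOUND
            # one wall bounce loses energy to absorption, plus 1/r spreading
            gain = (1.0 - absorption) / max(dist, 1.0)
            taps.append((delay, gain))
    return taps

# 10m x 8m x 3m room, hard concrete-ish walls (low absorption)
taps = first_order_reflections((10.0, 8.0, 3.0), (2.0, 2.0, 1.5),
                               (7.0, 5.0, 1.5), absorption=0.1)
for delay, gain in sorted(taps):
    print(f"{delay * 1000:6.1f} ms  gain {gain:.3f}")
```

A real simulation would trace higher-order bounces and frequency-dependent absorption per material, but the delay/gain taps above are exactly the “room shape governs the reverb” idea.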
This is the source – not something applied before or after playback. It’s the audio that comprises the actual source. The archetype we’ve implemented now, intended to replace Sound Cues, is called “MetaSound Sources”.
We do have ambitions to allow different archetypes – e.g. “Meta Sound Source Effects” and “Meta Sound Submix Effects”. In these archetypes, you’ll eventually be able to do the sorts of things you’re talking about.
You can do what you want. The intention isn’t to replace traditional sound-design source-material gathering and the whole import-and-play-back-samples workflow. Obviously a hardware synth that costs hundreds or thousands of dollars is going to sound better than any software synth. And obviously a software synth used in a DAW, with very different CPU and memory constraints, is going to have an advantage over our software synth. The whole point of MetaSounds is its context – interactive and procedural experiences.
The analogy shouldn’t surprise you: non-real-time graphics rendering (e.g. Pixar) can generate better visual experiences than can be done in real time. The point of shaders is that the result is procedurally generated, interactive, and dynamic. You can always just import a canned model, texture, etc., but at that point it’s just a movie, right?
This is one reason why we make a deliberate comparison of MetaSounds to shaders – MetaSounds are the audio analog to shaders. There are obviously lots and lots of differences, but the fundamental idea is a programmable audio pipeline to allow for custom DSP processing that allows game audio to be fundamentally procedural and interactive with respect to the “game”. I put game in quotes because this technology is more powerful than a game – it’s interactive media experiences.
One additional point that some are failing to understand (and we did point this out in the stream with Rob’s presentation) is that what is in Early Access is definitely still Early Access. We hadn’t yet implemented composition (i.e. MetaSounds inside MetaSounds) and we hadn’t implemented Presets (reusable topologies/graphs). The idea is that, at some point, non-technical sound designers can simply re-use already-created presets. There will be libraries of ready-made graphs and accompanying presets, and people will be able to preset-surf to make their own sounds with existing MetaSounds.
Game Audio is and should be much more than import a .wav file and play it back. That’s absolutely an outdated mindset analogous to the old days in graphics where it was just textures and polys.
EDIT: Apparently I can’t reply to more than 3 people at once.
I will reply to the question higher up about mic input:
You can do that with the old sound system. We have a mic component that lets you process audio in real time. We can also record audio from Submixes to disk (.wav files).
And with EOS, we have our own VOIP system which, with a bit of elbow grease, allows you to do DSP processing on VOIP signals. We have done this in Fortnite.
Sorry for the late reply and thanks @Minus_Kelvin! Great reply. Yes, I understand and agree with you generally, but I’m just trying to see how this fits into my workflow. The continuing importance of WAVs should be mentioned, however – I wouldn’t discount them as outdated so easily. Much of what I produce are stems from either live musicians or virtual instruments (due to cost) – mostly orchestral and big band. It would be lovely if we could put a VST into a game (ignoring licensing issues, which universally forbid it – not to mention the size requirements!). Then we could have acoustic instruments (even if virtual) that respond to gameplay – that would be amazing! So that’s my only point: AFAIK it’s only useful for the electronic choir, not the winds/brass/etc.
So perhaps the MetaSound→shader comparison breaks down a little, in that offline and real-time shaders don’t produce all that different results, whereas a synthesized piano is vastly different from a sampled or recorded 12 ft. But this is all quibbling… I do encourage you folks not to abandon WAVs in this new architecture, as they will remain most of what we produce whenever we work outside of electronic instruments (i.e. much of the time).
Anyhow great work by you folks! I’m still not sure how I’ll use it compositionally, but I’m working with my sound designer where it’s going to be very powerful.
Continuing the stem discussion: since we’re in EA, if you wanted to extend the reactive-music idea to WAVs, one approach that would work well is to use timecode as the time metric for triggering WAV playback. Perhaps this (or much of it) is there now, and I think you mentioned that you can trigger at sample points. That’s great, but that’s the audio engineer’s view – using timecode is more useful to a composer.
For example, I compose in Dorico, which can have a timecode and marker stave that can be exported as a text file. I could compose in transition points – say “meet enemy” or “sprite surprise” – which are not just sound effects, but which make sense musically. I could also have launch points, where you could leave a musical section properly before going to these transition points.
Thinking out loud… imagine a toon walking on the yellow brick road, approaching the Emerald City. I’ve got the yellow-brick-road piece playing in a bar-section loop (with 1 :|| 2 end repeats). When they’re getting close, the game could take the :|| 2 ending properly for a seamless musical transition. Now this is the buildup – this section is less musical, a tremolo or something, telling the player they’re getting close – then they come through the trees and see the city. Boom, jump to the next section for the Emerald City. This would let a game composer work like a film composer, with the same kind of workflow, but in an interactive way.
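To make that concrete, here’s a rough Python sketch of the bar-quantized jump: a transition request is deferred until the next bar line so the music leaves its section cleanly. The section names, tempo, and timecodes are all made-up illustrative values (e.g. as exported from a Dorico marker stave), not any MetaSounds API:

```python
# Sketch (plain Python) of a bar-quantized music transition.

SAMPLE_RATE = 48000

def bar_length_samples(bpm, beats_per_bar):
    """Length of one bar in samples at a fixed tempo and meter."""
    return round(SAMPLE_RATE * 60.0 / bpm * beats_per_bar)

def next_transition_sample(playhead, loop_start, bpm, beats_per_bar):
    """Quantize a transition request to the next bar line after the
    current playhead (all positions in samples)."""
    bar = bar_length_samples(bpm, beats_per_bar)
    bars_elapsed = (playhead - loop_start) // bar + 1
    return loop_start + bars_elapsed * bar

# Hypothetical marker stave export: section name -> timecode (seconds)
markers = {"yellow_brick_road": 0.0, "buildup": 32.0, "emerald_city": 48.0}

# The player triggers "buildup" 10.3 s into the loop (mid-bar);
# instead of cutting immediately, we jump at the next bar line.
playhead = int(10.3 * SAMPLE_RATE)
jump_at = next_transition_sample(playhead, 0, bpm=120, beats_per_bar=4)
print(jump_at / SAMPLE_RATE, "seconds")  # 120 BPM 4/4 = 2 s bars -> 12.0
```

The same quantization point could be the moment you cue up the `markers["buildup"]` timecode, so transitions always land on musically sensible boundaries.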
Does this make sense? Since I can compose this in at scoring time and derive the timecode connections from the score, it would work brilliantly. Anyhow, these are ideas I’ve mulled over for years – feel free to PM me if it sounds interesting.