Tech questions: What counts as a voice, sample accurate playback, mono -> stereo transitions

Hey guys,

I love mixing tiny sound bites together into one big sound, for example up to 6 sounds for a weapon shot (without the tail). Yet I do not want to waste 6 voices. I would prefer using Blueprints, because they are more flexible than SoundCues.

In a perfect world, for example a weapon shot:

  1. I have 6 sounds representing different elements of the asset, which are supposed to play at the same time, sample accurate, 3 variations per sound
  2. the 6 sounds have different distance parameters: at a certain distance their volume goes from 0 to 1, stays there for a while and then goes back to 0 again -> crossfades that depend on distance (or any interpolated float value)
  3. some of the sounds are stereo, and I want them to be non-spatialized/ambient when the listener is very close to the source; as the listener moves away, the sound gets downmixed to mono & 100% spatialized
  4. each time a shot happens a random variation of the sounds is picked, gets a tiny bit of volume & pitch randomization, then they get submixed and then they are ready for spatialization etc.
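
The distance-driven crossfade in item 2 and the per-shot randomization in item 4 can be sketched outside the engine. This is a minimal Python sketch, not Blueprint or UE API; `distance_gain`, `pick_variation`, the fade distances and the jitter ranges are all hypothetical names and values chosen for illustration.

```python
import random


def _ramp(x, lo, hi):
    """Linear 0..1 ramp between lo and hi, clamped; a step at lo if hi <= lo."""
    if hi <= lo:
        return 1.0 if x >= lo else 0.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def distance_gain(distance, fade_in_start, fade_in_end, fade_out_start, fade_out_end):
    """Trapezoid gain: rises 0 -> 1, holds at 1, then falls 1 -> 0 with distance."""
    fade_in = _ramp(distance, fade_in_start, fade_in_end)
    fade_out = 1.0 - _ramp(distance, fade_out_start, fade_out_end)
    return min(fade_in, fade_out)


def pick_variation(variations, rng, volume_jitter=0.1, pitch_jitter=0.05):
    """Pick one variation and a small random volume/pitch multiplier around 1.0."""
    name = rng.choice(variations)
    volume = 1.0 + rng.uniform(-volume_jitter, volume_jitter)
    pitch = 1.0 + rng.uniform(-pitch_jitter, pitch_jitter)
    return name, volume, pitch
```

For a layer that fades in between 0 and 1000 units and out between 3000 and 4000 units, `distance_gain(500, 0, 1000, 3000, 4000)` gives half volume, and each layer of the shot would call `pick_variation` once per trigger with its own list of 3 variations.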

I’m just wondering if this is possible with UE. I do not want to waste too many voices (although audio processing is cheap these days, so I might as well bump the max voice number higher than 32, as long as the files are small & short) and I need sample accurate playback.

Our dear friend Minuskelvin already said in the new audio engine multiple triggered audio events are rendered sample accurate. Is this the case with ANY NUMBER of triggered sounds?

A bit random ramble, but I hope you tech people get me. :slight_smile: Amazing GDC talk btw, got me back into UE audio. :slight_smile:


Ok. Tested in BP with 100 saw waves (50 normal, 50 phase inverted), all I hear is the vorbis artifacts -> sample accurate playback works. Until it does not. :stuck_out_tongue: Sometimes a saw wave comes through. Do you want the project to try on your end? It’s extreme but I thought breaking the system is a good way to test the system. :slight_smile:
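
Why this test detects timing errors can be shown with plain sample lists. This is a minimal sketch, assuming a naive sawtooth and a simple summing mixer; `saw`, `mix`, `delay` and `peak` are hypothetical helpers, not engine functions, and lossy compression (the vorbis artifacts mentioned above) is not modeled.

```python
def saw(num_samples, period):
    """Naive sawtooth in [-1, 1) with the given period in samples."""
    return [2.0 * ((n % period) / period) - 1.0 for n in range(num_samples)]


def mix(layers):
    """Sum all layers sample by sample."""
    return [sum(samples) for samples in zip(*layers)]


def delay(signal, offset):
    """Start the signal `offset` samples late, keeping the same length."""
    return [0.0] * offset + signal[:len(signal) - offset]


def peak(signal):
    """Largest absolute sample value."""
    return max(abs(s) for s in signal)


wave = saw(256, 32)
inverted = [-s for s in wave]

# 50 normal + 50 inverted, all starting on the same sample: they cancel.
aligned = mix([wave] * 50 + [inverted] * 50)

# One inverted copy starting a single sample late: the saw leaks through.
late = mix([wave, delay(inverted, 1)])
```

In this sketch `peak(aligned)` is effectively zero while `peak(late)` is large, which is why a saw wave occasionally coming through points at a start-time mismatch or at a layer missing its inverted partner.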

A problem I still see is that for every one of my “WAV PLAYER” instances in a SoundCue there is a Sound Source -> voice, so I ended up with 100 voices.

Could there be a way to do certain sound operations BEFORE the sound becomes a voice? This would make composite sounds that are made up of many, many layers work better, because a certain amount of audio can be mixed & built before any DSP/spatialization happens.

Thank you!

There is a very slight edge case in the audio mixer for the “sample accurate” start time that I was just chatting about with some colleagues here. I have a simple solution for it that I’ll check in today so that it’s guaranteed to be sample accurate. The issue is that there’s a very tiny window of time where an audio event can get queued while the queue is getting pumped on the render thread. It’s a VERY low probability though (and I would be surprised if you’re hitting it), but technically it is possible. The solution is simple so I’ll just do it.

What I think is going on for you is that there’s a default limit of 32 sound sources. For you to play 50 normal and 50 inverted, you’d have to make sure you bump up the voice count (currently called ‘max channels’ but I want to rename it) to 100 in the project’s audio settings. The audio engine does a sort of the 100 wave instances based off priority times its volume (a volume-weighted priority sort essentially) and takes the top 32 wave instances and turns those into audible sound sources. So if you didn’t bump the voice count to 100, it’s going to be a “**** shoot” as to which sounds actually get heard. For your sounds that means that there’s a pretty good chance you’ll be playing sources without their paired phase inversion. If the voice limit is 32, a more accurate test would be to play 16 saw waves and 16 phase-inverted saw waves.
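
The voice-culling step described above can be sketched as a sort followed by a cut. This is a minimal sketch under stated assumptions: `WaveInstance` and `select_audible` are hypothetical names, and the tie-breaking (Python's stable sort keeps submission order) is an artifact of this sketch, not something the engine is known to do.

```python
from dataclasses import dataclass


@dataclass
class WaveInstance:
    name: str
    priority: float
    volume: float


def select_audible(instances, max_channels=32):
    """Rank by priority * volume (highest first) and keep the top max_channels."""
    ranked = sorted(instances, key=lambda w: w.priority * w.volume, reverse=True)
    return ranked[:max_channels]


# 50 normal and 50 inverted saws, all with equal priority and volume.
instances = [WaveInstance(f"saw_{i}", 1.0, 1.0) for i in range(50)]
instances += [WaveInstance(f"inv_{i}", 1.0, 1.0) for i in range(50)]

audible = select_audible(instances, max_channels=32)
inverted_kept = [w for w in audible if w.name.startswith("inv_")]
```

With every score tied, which 32 of the 100 survive depends purely on tie-breaking; here none of the inverted saws make the cut, so nothing cancels. Raising `max_channels` to 100 keeps every pair intact.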

The most expensive thing with voice processing is realtime sound file decoding so if you have a bunch of sources that you need to decode anyway, it’s not really saving you much to mix then process.

However, I think what you’re suggesting is something I’m calling a “source bus” – and incidentally is a feature I’m beginning work on today for, hopefully, release in 4.18. A source bus is a concept that is sort of a meld between source processing (which can be 3d spatialized) and submixing (whose output is a multi-channel bed, post-source spatialization). A source bus is going to be a new type of USoundBase that is treated as a source which is infinitely looping (i.e. you’ll need an audio component handle to it to stop it), and sources can opt to “route” their audio to it, which is analogous to a submix send. The source bus will then be treated in the audio engine as any other source and will be spatialized (even HRTF spatialized), attenuated, occluded, etc.

This way you can put your source effects on the source bus and other sources can send their audio to that source bus, get spatialized at the source bus position, and share the source-effect chain state. This allows source effects to persist between sound instantiations (think LFO filtering applied to a series of 1-shots), and a number of other really exciting applications. E.g. poor man’s echo/ricochet effects, geometry-based volumes which automatically send sources to geometric source buses, etc.
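
The idea of mixing sends first and then running one shared, persistent effect can be sketched conceptually. This is not the UE source bus implementation or API; `SourceBus`, `OnePoleLowpass`, `send` and `render` are hypothetical names, and a one-pole lowpass stands in for whatever source effect chain would sit on the bus.

```python
class OnePoleLowpass:
    """Tiny stateful filter; its state survives between render calls."""

    def __init__(self, coeff):
        self.coeff = coeff
        self.state = 0.0

    def process(self, buffer):
        out = []
        for x in buffer:
            self.state += self.coeff * (x - self.state)
            out.append(self.state)
        return out


class SourceBus:
    """Sums whatever was sent this block, then runs one shared effect on the mix."""

    def __init__(self, effect):
        self.effect = effect
        self.sends = []

    def send(self, buffer, level=1.0):
        self.sends.append((buffer, level))

    def render(self, num_samples):
        mixed = [0.0] * num_samples
        for buffer, level in self.sends:
            for i, sample in enumerate(buffer[:num_samples]):
                mixed[i] += level * sample
        self.sends.clear()
        # Only this single mixed buffer would go on to spatialization.
        return self.effect.process(mixed)


bus = SourceBus(OnePoleLowpass(0.5))
bus.send([1.0] * 4)
bus.send([1.0] * 4, level=0.5)
first_block = bus.render(4)
second_block = bus.render(4)  # no sends, but the filter state carries over
```

Two layers go in, one processed buffer comes out, and `second_block` still carries the filter's tail even though nothing was sent, which is the "effects persist between sound instantiations" behavior in miniature.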

Thank you very much for replying. :slight_smile: Your work is awesome.

I am known for breaking audio things and reporting. :slight_smile: So yeah, I guess I hit it. Everything is set up to use the new audio engine (-audiomixer, windows setting file) and ofc I also raised the max voices of the project to 128, so I’m playing 100 sources at the same time. These are the audio stats just when I start playing:

After 1-2 seconds the stats normalize.

When I release my button the AudioComponent is stopped. The bugs only happen very rarely. Here is a video.

Just in case you might find time to test, this is the project, uploaded to weTransfer, 3rd person game template with some custom player character blueprint action going on, press play, enable stat audio, press & release LMB.

Source bus sounds sexy! But I wonder, since I am no audio programmer, decoding audio sounds like decoding COMPRESSED audio, in UE I don’t see any way to use PCM, it always uses vorbis for all audio. That correct? What about simply reading out PCM?

This doesn’t look like an issue with sample accuracy but rather like some edge case being triggered that results in a deadlock. Also the performance of realtime decoding a hundred ogg vorbis sources is going to be problematic. I’m actually a bit impressed this works at all, let alone that it only rarely has issues :stuck_out_tongue:

Does the issue ever manifest with lower voice counts?

Yeah I was unable to reproduce the issue and record it at the same time. It would always deadlock for a few seconds. I guess my 4.3 GHz 2500K quadcore is not good enough anymore. :rolleyes:

The actual issue happens without the deadlock too. I lowered the number of triggered sounds to 40 and it still happened with approx. the same frequency as with 100 triggered sounds. I say again, feel free to download this test project. All you have to do is open the editor, load it, play it, press LMB a few times with stat audio enabled and listen for yourself. It really seems to be an edge case. :frowning: