Soundfield Submix Endpoint: What is the clock basis for the incoming data streams?

Hi everyone,

I am working on a soundfield submix endpoint that plays higher-order (N=4) Ambisonics through ASIO sound cards.

I think I have understood most of the functionality involved. One aspect is still unclear to me, though: the audio data enters the soundfield submix endpoint via the encoder, the transcoder, or the mixer. Which clock drives the incoming data? I assumed I could control the clock via the callbacks, but in neither polling nor callback mode does the engine seem to wait for any of the functions I implemented. Is it the default audio device used by Unreal that triggers new buffers to arrive in my endpoint? If so, it may be necessary to synchronize the default audio device and the ASIO device :(

Thank you for any assistance in advance and best regards

Hi there, I am doing something similar but I output audio using VBAP instead of HOA.
I write my samples to a RingBuffer in the SoundfieldEndpoint’s OnAudioCallback and then read from that RingBuffer in an RtAudio callback (RtAudio is already available in the UE source code). This means I have to keep these two callbacks in sync, which is not ideal, but it works.
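The hand-off between the two callbacks described above can be sketched as follows. This is a language-neutral illustration in Python rather than UE code: the `RingBuffer` class and its `write`/`read` methods are hypothetical names, and the lock is only there to keep the sketch short. A real-time implementation in C++ would normally use a lock-free single-producer/single-consumer buffer instead.

```python
from collections import deque
import threading


class RingBuffer:
    """Bounded FIFO of audio frames shared by a producer and a consumer callback.

    Hypothetical sketch: one frame is a list with one sample per channel.
    """

    def __init__(self, capacity_frames):
        # With maxlen set, the oldest frames are discarded when the writer runs ahead.
        self._frames = deque(maxlen=capacity_frames)
        self._lock = threading.Lock()

    def write(self, frames):
        """Called from the producer side (e.g. the endpoint's audio callback)."""
        with self._lock:
            self._frames.extend(frames)

    def read(self, count, channels):
        """Called from the consumer side (e.g. the device callback).

        Always returns exactly `count` frames; missing frames are filled with silence.
        """
        with self._lock:
            available = min(count, len(self._frames))
            out = [self._frames.popleft() for _ in range(available)]
        out.extend([0.0] * channels for _ in range(count - len(out)))
        return out


# Usage: the producer writes two stereo frames, the consumer asks for three.
rb = RingBuffer(capacity_frames=8)
rb.write([[0.1, 0.2], [0.3, 0.4]])
block = rb.read(3, channels=2)  # two real frames followed by one frame of silence
```

Padding with silence on underrun and dropping the oldest frames on overrun are the two points where the callbacks falling out of sync becomes audible, which is why the buffer capacity matters.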

Hi GrobiThee,

In the very last stage, I am using VBAP followed by some delay compensation and speaker equalization. The actual rendering is part of another audio engine that is connected out-of-process, which is why the submix endpoint was the preferred choice. In your case, does RtAudio wrap ASIO as well, so that the clock is driven by your ASIO device? Is the OnAudioCallback not in sync with your RtAudio callback, given that it is the only audio device in the system? If not, what actually drives the OnAudioCallback in your case? There must be another clock source…

Hi everyone,

My higher-order Ambisonics playback works now. The encoder and the mixer are implemented as part of Unreal. My Unreal plugin then transfers the order-4 Ambisonics signal (25 channels) to another rendering engine via interprocess communication (a socket), where it is prepared for playback through an ASIO device with many channels. My audio engine then does the VBAP, the speaker equalization, and the delay alignment. It really works very well :-).
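The 25 channels follow from the channel count of a full 3D Ambisonics signal of order N, which is (N+1)². A small sketch, with hypothetical function names, that also shows the common ACN channel index (the thread does not say which channel ordering is actually used, so ACN is an assumption here):

```python
def hoa_channel_count(order):
    """Number of channels of a full 3D Ambisonics signal of the given order."""
    return (order + 1) ** 2


def acn_index(degree, index):
    """ACN channel number for spherical-harmonic degree l and index m (-l <= m <= l)."""
    if abs(index) > degree:
        raise ValueError("index must satisfy -degree <= index <= degree")
    return degree * (degree + 1) + index


print(hoa_channel_count(1))  # first order
print(hoa_channel_count(4))  # the order used in this thread
```

With ACN numbering, the last channel of an order-4 signal is `acn_index(4, 4)`, which is index 24, i.e. the 25th channel.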

While finishing up, I found the following:

  1. The clock for audio output in the submix endpoint is derived from the default audio device. Since both my ASIO device and the default system audio device are of rather high quality, clock drift is not really a problem. To account for buffering jitter on both sides, a jitter buffer with a size of 4 buffers (1024 samples per channel) absorbs all inaccuracies.
  2. If a source is routed to my soundfield submix endpoint (encoder) in Unreal, the relative position is not reported with real position data until an attenuation instance is attached. With attenuation “off”, a stereo source reports azimuth angles of 90 and 270 (in degrees), and a mono source reports an azimuth of 0. Once an attenuation is attached, all angles follow the current scene, but all values are in radians. Reporting the azimuth in degrees when attenuation is off seems to be a bug in Unreal.
  3. The elevation and azimuth angles are reported in a really strange way. I am sure there are good reasons for this kind of directivity computation, but I had to derive new values of “real” azimuth and elevation to match my spatial audio rendering engine.

If anyone is interested in details about the soundfield submix endpoint, you are welcome to contact me.

Best regards


I would love to pick your brain on this. I’ve been messing around with Spat 5 and Max/MSP, communicating over OSC in some basic tests, but would love to learn about spatializing to speakers from the engine using Ambisonics. Please let me know the most convenient way to contact you.


As gschian0 says, I would also like to contact you. After all my searching online, you seem to be the only user who actually understands how this works. We are halfway through 2022 now and the documentation is still non-existent on Epic’s side :(

This is very interesting.
1. Would it be possible to extract the HOA signals and decode them in MetaSounds? Working out the delay compensation would then be quite easy given the distances and some form of input from Blueprints. Perhaps an Ambisonic weighting formula could also be implemented directly in MetaSounds?
2. Are you using some form of convolution for speaker equalization? I ask because modifying the spectral characteristics would make it possible to import data from existing speakers for accurate simulation and auralization…

Hey … am very interested in how you achieved this. Working on a live AV project with Unreal and need to be able to decode to a large speaker array!

If you send an already encoded Ambisonic soundfield, all you need on the other side is the decoder. As stated before, this is an easy solution if you use Max/MSP with something like the ICST Ambisonics objects. I suggest not using irregular speaker setups with Ambisonics; for irregular layouts you are better off with DBAP than with VBAP or Ambisonics. WFS is another world entirely.

Thanks Jabbinuz. Yes, for sure. My problem is I don’t know how to get the B-Format out of Unreal into Max, Reaper etc… do you have any idea how to do this?

I am not an expert in multichannel output from UE (which I believe has to do with the device output component), but consider this: Ambisonic signals (of whichever order) are encoded audio signals, so they can be treated like normal audio signals coming out of any playback engine. They can be 16-, 24-, or 32-bit at any sample rate. This means first-order material can be treated as 4 discrete audio files. Several questions should be answered first to devise a strategy. For example: how are you encoding the Ambisonic file? Are you using spatialization/attenuation? How, why, and when do you plan to play your Ambisonic files?
I do not have UE in front of me right now, but I believe a possible solution is to set up a Quartz-based system for phase-accurate playback in a Blueprint: queue your 4 files (for first order; third order needs 16), trigger them when needed, send the signals to a dedicated mix, and output that mix directly to a submix endpoint. I need to test this. I would stay away from MetaSounds for this. See: Overview of submixes in Unreal Engine | Unreal Engine 5.0 Documentation

Hi Jabbinuz… thanks for this. I am fairly experienced with Ambisonics… the rub seems to be getting the output from Unreal, whatever the order… I have searched fairly extensively, and the poster above seems to be the only person I can find who has made it work. Someone else tried to use Wwise, but even then UE kept switching to stereo rather than a multichannel output, so even that wasn’t possible.

From @hkhauke's post it seems to be possible using submix endpoints, but there is no documentation on this that I can find as of now…

The journey continues…


Hi everyone,

Oh, sorry, I was very busy working on other stuff and missed some of your discussions.

For our solution, we have two applications running:

  1. UE with a soundfield submix endpoint and
  2. a separate application that controls an ASIO interface with all the required channels (we use 15 loudspeakers at variable locations).

On the UE side, we collect the isolated source signals in a soundfield mixer buffer. That mixer buffer is just a set of 25 channels, since we use higher-order Ambisonics of order 4. It is actually our own mixer in C/C++ code, since UE supports only order 1 (4 channels).

Given the single-channel source and its direction, we render an HOA signal for just this one source and add it as a contribution to the overall mixer signal. With multiple sources, each signal makes its own contribution, and together they form the overall HOA mix.
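To illustrate the encode-and-accumulate idea, here is a minimal sketch. It is not the mixer from this thread: to stay short it uses first order (4 channels) instead of order 4, and it assumes ACN channel order with SN3D normalization, azimuth measured counterclockwise from the front, and elevation measured upwards, all in radians. The function names are hypothetical.

```python
import math


def encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order Ambisonics [W, Y, Z, X] (ACN/SN3D assumed)."""
    cos_el = math.cos(elevation)
    return [
        sample,                                # W: omnidirectional component
        sample * math.sin(azimuth) * cos_el,   # Y: left/right
        sample * math.sin(elevation),          # Z: up/down
        sample * math.cos(azimuth) * cos_el,   # X: front/back
    ]


def mix_sources(sources, num_frames):
    """Sum the encoded contribution of every source into one 4-channel bus.

    sources: list of (mono_samples, azimuth, elevation) tuples.
    Returns num_frames frames of 4 channels each.
    """
    bus = [[0.0] * 4 for _ in range(num_frames)]
    for samples, azimuth, elevation in sources:
        for n in range(num_frames):
            contribution = encode_first_order(samples[n], azimuth, elevation)
            for ch in range(4):
                bus[n][ch] += contribution[ch]
    return bus


# Usage: one source straight ahead and one to the left, one frame each.
front = ([1.0], 0.0, 0.0)
left = ([1.0], math.pi / 2, 0.0)
bus = mix_sources([front, left], num_frames=1)
```

For order 4, the same structure applies with 25 spherical-harmonic gains per source instead of 4; only the encoding function changes, the accumulation into the shared bus stays the same.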

Then, once all signals have been added to the mixer and we are sure that this is everything for the current time frame, we take the 25 channels and write them to a UNIX socket we created on our UI PC. All of this happens on the UE side. UE provides a callback for each source and one for “all sources handled”.

Now, the other side (2): here we run an audio playback thread that reads from a (simple) jitter buffer. The data taken from the jitter buffer are 25 channels of HOA signals. Once we have the HOA signal, we run an HOA decoder involving VBAP, followed by delay equalization and then loudspeaker equalization. We run some measurement routines in the playback environment and store all the required information in a set of matrices so that decoding is as efficient as possible.
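The playback chain can be pictured as a matrix multiplication followed by per-speaker alignment. The sketch below is an assumption-laden illustration, not the actual engine: the decode matrix is taken as given (its coefficients would come from the VBAP-based design and the measurements), the delay is a whole number of samples per speaker, and the equalization is reduced to a single gain per speaker as a stand-in for a real filter. All names are hypothetical.

```python
from collections import deque


def decode_block(hoa_block, decode_matrix):
    """Apply a (speakers x HOA channels) matrix to every frame of an HOA block."""
    return [
        [sum(g * x for g, x in zip(row, frame)) for row in decode_matrix]
        for frame in hoa_block
    ]


class DelayAndGain:
    """Per-speaker integer sample delay followed by a per-speaker gain."""

    def __init__(self, delays_in_samples, gains):
        # Each delay line starts filled with zeros, one entry per sample of delay.
        self._lines = [deque([0.0] * d) for d in delays_in_samples]
        self._gains = list(gains)

    def process(self, speaker_block):
        out = []
        for frame in speaker_block:
            row = []
            for line, gain, sample in zip(self._lines, self._gains, frame):
                line.append(sample)                  # newest sample enters the line
                row.append(gain * line.popleft())    # oldest sample leaves it
            out.append(row)
        return out


# Usage with a toy 2-speaker, 2-channel setup (real use: 15 speakers x 25 channels).
matrix = [[1.0, 0.0], [0.5, 0.5]]
speakers = decode_block([[1.0, 3.0], [2.0, 2.0]], matrix)
aligned = DelayAndGain([0, 1], [1.0, 2.0]).process(speakers)
```

Because the decode step is linear, precomputing one matrix per setup keeps the per-frame cost to a fixed number of multiply-adds.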

The link between the UE output towards the UNIX socket and the other side audio playback is simple: a specific thread observes the UNIX socket and reads the HOA signal when available. Then, the received audio data is added to the mentioned (simple) jitter buffer.

I was always unsure how the interaction via the jitter buffer would work in real time: the input is clocked by UE, and therefore by the WASAPI sound card involved, while the output is clocked by ASIO. Well, it is absolutely uncritical: I do not see any clock drift, even over long operating times. And if there is any, I have a clever jitter buffer recovery rule :-).
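The recovery rule itself is not described in this thread. One simple possibility, shown purely as an assumption, is to pre-fill the buffer to a target depth before playback starts, return silence and re-prime after an underrun, and drop the oldest blocks when the writer runs ahead. The class and parameter names are hypothetical, and thread synchronization between the socket reader and the playback thread is omitted for brevity.

```python
from collections import deque


class JitterBuffer:
    """Block-based jitter buffer with a simple prime/underrun/overrun policy."""

    def __init__(self, target_blocks, max_blocks, silence_block):
        self._blocks = deque()
        self._target = target_blocks    # depth to reach before playback (re)starts
        self._max = max_blocks          # upper bound on buffered blocks (latency cap)
        self._silence = silence_block
        self._priming = True

    def push(self, block):
        """Called by the thread that reads HOA blocks from the socket."""
        self._blocks.append(block)
        while len(self._blocks) > self._max:
            self._blocks.popleft()      # writer is ahead: drop the oldest block
        if self._priming and len(self._blocks) >= self._target:
            self._priming = False

    def pull(self):
        """Called by the audio playback thread once per output block."""
        if self._priming:
            return self._silence
        if not self._blocks:
            self._priming = True        # underrun: refill to target before resuming
            return self._silence
        return self._blocks.popleft()


# Usage: prime with two blocks, then play them back in order.
jb = JitterBuffer(target_blocks=2, max_blocks=4, silence_block="silence")
jb.push("block-1")
jb.push("block-2")
first = jb.pull()
```

With two clocks that drift slowly, the fill level wanders over time; the underrun and overrun branches are what bring it back, at the cost of a short gap or a skipped block.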

I hope that helps out. Let me know if I can help you any further.

Best regards


Thanks so much for this - very helpful. I don’t suppose you’d be willing to share your UE HOA mixer?