Enabling the Opus audio format on iOS with UE 4.27

Simon36457 · June 15, 2023, 6:29pm

Hi.

I’m working on an iOS project that has a large amount of audio data (currently about 600MB after ADPCM compression), and would like to reduce the package size. It looks like UE 4.27 only supports ADPCM on iOS by default, but it would be nice to use Opus (preferably) or maybe Ogg Vorbis (if Opus isn’t possible).

I’ve tried to enable Opus on iOS, but don’t yet have it working (details below). I’m hoping someone reading this knows the ‘trick’ to get it enabled and working.

I see UE 4.27 includes various Opus related code, and there are promising signs that Opus may be usable on iOS:

libOpus.build.cs has an iOS target, and there appear to be built Opus v1.1 libs
There’s iOS support for the cross-platform Audio Mixer in AudioMixerAudioUnit (iOS and Mac)
FOpusAudioInfo implements IStreamedCompressedInfo, which seems to be used by the Audio Mixer.
There’s a comment in Engine/Config/IOS/IOSEngine.ini showing how to enable the audio mixer on iOS (described as the ‘new audio backend’, whereas IOSAudio is the ‘old audio backend’)
The IOSAudio module is hard-coded to support ADPCM only - so we seem to need to use the Audio Mixer.

What I’ve done so far to enable Opus on iOS:

Enabled the audio mixer in IOSEngine.ini, as per the comments in that file
Added the following to AudioFormatOpus.Build.cs, IOSTargetPlatform.Build.cs and AudioMixerAudioUnit.Build.cs (maybe more places than is necessary):
PublicDefinitions.Add("WITH_OPUS=1");
In AudioMixerAudioUnit.Build.cs added “VorbisFile” and “libOpus” to the AddEngineThirdPartyPrivateStaticDependencies() call
Updated FIOSTargetPlatform::GetWaveFormat() to return “OPUS”, unless IsSeekableStreaming() is true, where it continues to return “ADPCM” (Used code lifted from the similar Android target platform code - where that one returns “OGG”)
Updated FIOSTargetPlatform::GetAllWaveFormats() to also return “OPUS” (with “OPUS” added prior to “ADPCM”)
Similar changes to the above two within FMixerPlatformAudioUnit::GetRuntimeFormat() and FMixerPlatformAudioUnit::CreateCompressedAudioInfo(), the latter returning a new FOpusAudioInfo() of course.
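
As a rough illustration of the format-selection rule described above, here is a minimal standalone sketch. It is not engine code: FSoundWaveInfoSketch, ChooseIOSWaveFormat and AllIOSWaveFormats are hypothetical stand-ins for the logic that would sit in FIOSTargetPlatform::GetWaveFormat() and GetAllWaveFormats().

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the sound wave state the target platform inspects.
struct FSoundWaveInfoSketch
{
    bool bSeekableStreaming = false;
};

// Opus by default; seekable streaming waves keep using ADPCM.
std::string ChooseIOSWaveFormat(const FSoundWaveInfoSketch& Wave)
{
    return Wave.bSeekableStreaming ? "ADPCM" : "OPUS";
}

// Every format the platform may cook to, with "OPUS" listed before "ADPCM".
std::vector<std::string> AllIOSWaveFormats()
{
    return {"OPUS", "ADPCM"};
}
```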

It appears this isn’t working yet because FAsyncDecodeWorker::DoWork() is calling through to ReadCompressedData() rather than StreamCompressedData()

ReadCompressedData() calls FOpusAudioInfo::Decode() but that method doesn’t populate all fields of the returned FDecodeResult struct - but ReadCompressedData() requires those fields to be set up. As a result, it immediately decides the ‘stream’ has finished.
However, StreamCompressedData() calls DecompressToPCMBuffer() which only requires FDecodeResult::NumAudioFramesProduced. Clearly this is where it was intended that FOpusAudioInfo::Decode() would be called.

So it seems like the implementation expects Opus audio assets to be considered ‘streaming’ (EBufferType::Streaming and EDecompressionType::DTYPE_Streaming perhaps). I’m yet to investigate that.

I could note that there’s no technical reason that we need to stream in the source Opus data in chunks (as per StreamCompressedData()), at least for this use scenario - i.e. with the data within the iOS package. However, I’m mainly interested in getting Opus working quickly, with minimal engine changes, and as robustly as possible. If it’s slightly sub-optimal, that is less of an issue.

If anyone is aware of the step(s) I’m missing to get Opus working on iOS, please do share. Thanks in advance.

Simon36457 · August 11, 2023, 2:47pm

I managed to get this all working a while back (Opus on iOS, with audio seeking support) and thought it might be worth posting an update.

I ended up making various changes to allow Opus audio to work via the ReadCompressedData() code path, rather than the StreamCompressedData() code path. The former is referred to in the source as ‘real-time’, and the latter as ‘streaming’. Conceptually, it seemed to me that the former ought to be just fine for Opus playback (including seeking), since in my case all the data is available locally in the app package.

(The ‘streaming’ code path seems to have a separate manager that loads ‘chunks’ of the audio file/stream, where a ‘chunk’ may contain e.g. the header plus a number of compressed audio packets. This may be intended to allow streaming where not all the data is yet available e.g. when streaming over the network)

Following are the changes I made, going into a little detail, in case that’s of interest.
Disclaimer: These may not represent the tidiest / most correct approach, to someone who’s more familiar with the UE audio code. My aim was to get this working for the project I’m working on, for the platform I’m working with (iOS) - though I did try to avoid making any changes that would cause issues on other platforms.

Firstly, made changes as per my previous posting so that the cooked audio is output in Opus format, and so that the app links with the Opus decoding library.

Updated FAudioFormatOpus::GetBitRateFromQuality() to return a sensible bit rate (32000 seems to produce decent enough results)

Updated ReadCompressedData() to pass out the number of samples actually produced, and up to the calling layers.

In FDecodeAudioTaskResults added NumSamplesWritten member, as is already present in FProceduralAudioTaskResults
In FMixerSourceBuffer::ProcessRealTimeSource() updated the EAudioTaskType::Decode case to call AudioData.SetNum() (at least, if NumSamplesWritten has been set - I decided to consider that optional) as is already done in the EAudioTaskType::Procedural case
In FAsyncDecodeWorker::DoWork() under the EAudioTaskType::Decode case, updated the second ReadCompressedData() call so that the actual number of decoded samples is passed out, setting DecodeResult.NumSamplesWritten
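
A minimal sketch of the "trim the buffer to what was actually decoded" idea, assuming a sentinel value of -1 means the decoder did not report a count. The struct and function names here are hypothetical and only mirror the NumSamplesWritten member mentioned above; std::vector::resize stands in for AudioData.SetNum().

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the decode task results; -1 means "not set".
struct FDecodeTaskResultsSketch
{
    int32_t NumSamplesWritten = -1;
};

// Shrinks the buffer to the reported sample count, leaving it alone when no count was set.
void TrimToSamplesWritten(std::vector<float>& AudioData, const FDecodeTaskResultsSketch& Results)
{
    if (Results.NumSamplesWritten >= 0 &&
        static_cast<size_t>(Results.NumSamplesWritten) < AudioData.size())
    {
        AudioData.resize(static_cast<size_t>(Results.NumSamplesWritten));
    }
}
```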

Updated IStreamedCompressedInfo::ReadCompressedData() so that it only decodes whole Opus packets at a time (a ‘packet’ being a compressed frame of e.g. 60ms worth of samples). Once there’s not enough room left in the destination buffer for the result of decoding the next packet, it will stop (won’t fill that buffer) and pass out the actual number of samples output.

Added a virtual method to IStreamedCompressedInfo to tell us if this decoder needs to decode whole (compressed) packets at a time i.e. enable the behaviour mentioned just above. The implementation in FOpusAudioInfo just returns true.
Call that in ReadCompressedData() - if the result is false, fall back to the previous behaviour within this method. However, if it returns true…
Keep (existing) RemainingEncodedSrcSize up-to-date during the loop
If the above has hit 0 (at the top of the loop), we’ve finished playback of this sound. Check for this case in the existing ‘if’ before where it sets bFinished to true and does the ‘if looping’ check.
Get the maximum number of bytes that decoding an Opus packet will result in (another additional IStreamedCompressedInfo virtual method, implemented in FOpusAudioInfo) - it already has available the number of channels, sample rate and Opus frame size in ms. Beware, OPUS_MAX_FRAME_SIZE_MS was set to 120ms even though elsewhere the Opus frame size is actually configured as 60ms - we certainly want the smaller size.
If there isn’t enough room in the destination buffer, we’re done for now, so pass out the actual number of decoded samples.

Call GetFrameSize() (that already includes updating the SrcBufferOffset member)
Recalculate the source data pointer i.e. SrcBufferData + SrcBufferOffset, and call Decode() passing that the result from GetFrameSize() i.e. the actual size of the compressed Opus ‘packet’.

An update/fix in the ‘if looping’ check mentioned earlier: If we are looping, set DecodeResult.NumCompressedBytesConsumed to 0 to ensure SrcBufferOffset isn’t incorrectly modified before the next loop iteration.
Have that also call a new FOpusAudioInfo method that prepares for looping by calling opus_multistream_decoder_ctl(Decoder, OPUS_RESET_STATE);
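
The whole-packet loop described above can be sketched roughly as follows. This is a standalone illustration, not the engine's ReadCompressedData(): FPacketReaderSketch, FDecodePacketFn and ReadWholePackets are hypothetical names, the decoder is a callback, 16-bit PCM output is assumed, and each packet is assumed to be preceded by a native-endian uint16 size. The looping branch (resetting the decoder and rewinding) is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Worst-case PCM bytes that one packet can decode to (16-bit samples assumed).
inline size_t MaxDecodedPacketBytes(int NumChannels, int SampleRate, int FrameSizeMs)
{
    const size_t FramesPerPacket = static_cast<size_t>(SampleRate) * FrameSizeMs / 1000;
    return FramesPerPacket * NumChannels * sizeof(int16_t);
}

// Hypothetical decoder hook: decodes one packet into Dst, returns PCM bytes written (<= 0 on error).
using FDecodePacketFn =
    std::function<int(const uint8_t* Packet, size_t PacketSize, uint8_t* Dst, size_t DstCapacity)>;

struct FPacketReaderSketch
{
    const uint8_t* SrcBufferData = nullptr;
    size_t SrcBufferSize = 0;
    size_t SrcBufferOffset = 0;

    size_t Remaining() const { return SrcBufferSize - SrcBufferOffset; }

    // Reads the uint16 size prefix of the next packet and steps over it.
    size_t GetFrameSize()
    {
        uint16_t Size = 0;
        std::memcpy(&Size, SrcBufferData + SrcBufferOffset, sizeof(Size));
        SrcBufferOffset += sizeof(Size);
        return Size;
    }
};

// Decodes whole packets until the source runs out or the next packet might not fit in Dst.
// Returns the PCM bytes actually written; bFinished is set once the source is exhausted.
size_t ReadWholePackets(FPacketReaderSketch& Reader, const FDecodePacketFn& DecodePacket,
                        uint8_t* Dst, size_t DstSize, size_t MaxPacketBytes, bool& bFinished)
{
    size_t Written = 0;
    bFinished = false;
    while (true)
    {
        if (Reader.Remaining() < sizeof(uint16_t))
        {
            bFinished = true; // no further packet available
            break;
        }
        if (DstSize - Written < MaxPacketBytes)
        {
            break; // not enough room for a worst-case packet; stop without filling the buffer
        }
        const size_t PacketSize = Reader.GetFrameSize();
        if (PacketSize > Reader.Remaining())
        {
            bFinished = true; // truncated data; treat as end of sound
            break;
        }
        // Recalculate the source pointer after GetFrameSize() moved the offset.
        const uint8_t* Packet = Reader.SrcBufferData + Reader.SrcBufferOffset;
        const int Produced = DecodePacket(Packet, PacketSize, Dst + Written, DstSize - Written);
        Reader.SrcBufferOffset += PacketSize;
        if (Produced > 0)
        {
            Written += static_cast<size_t>(Produced);
        }
    }
    return Written;
}
```

With stereo 48 kHz audio and 60ms frames, MaxDecodedPacketBytes(2, 48000, 60) gives 11520 bytes; using 120ms instead would double that and leave more of each destination buffer unused.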

Updated FOpusAudioInfo::Decode() to always populate all three values in the FDecodeResult struct. Note that Result.NumAudioFramesProduced means the number of samples per channel.

I always set NumCompressedBytesConsumed to the input CompressedDataSize value
…but set NumPcmBytesProduced to 0 if OpusDecoderWrapper->Decode() returns <= 0
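
A small sketch of filling in all three result fields, assuming 16-bit PCM output. FDecodeResultSketch only mirrors the three FDecodeResult field names mentioned above, and MakeDecodeResult is a hypothetical helper; zeroing NumAudioFramesProduced on failure is my own assumption rather than something stated in the post.

```cpp
#include <cstdint>

// Hypothetical mirror of the three FDecodeResult fields named in the post.
struct FDecodeResultSketch
{
    int32_t NumCompressedBytesConsumed = 0;
    int32_t NumPcmBytesProduced = 0;
    int32_t NumAudioFramesProduced = 0; // samples per channel
};

// DecodedFrames is what the decoder wrapper returned: frames per channel, or <= 0 on failure.
FDecodeResultSketch MakeDecodeResult(int32_t CompressedDataSize, int32_t DecodedFrames, int32_t NumChannels)
{
    FDecodeResultSketch Result;
    // The whole input packet is always reported as consumed.
    Result.NumCompressedBytesConsumed = CompressedDataSize;
    if (DecodedFrames > 0)
    {
        Result.NumAudioFramesProduced = DecodedFrames;
        Result.NumPcmBytesProduced =
            DecodedFrames * NumChannels * static_cast<int32_t>(sizeof(int16_t));
    }
    return Result;
}
```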

To add support for seeking Opus audio, I added a table to the Opus audio format’s file header, storing the size of each ‘packet’ (uint16 array). On load, I convert that to a uint32 file offsets array (relative to the end of the header, not that that’s too important). Given each compressed frame/packet is a fixed length in terms of time, e.g. 60ms, it means for a given playback time we can easily find the compressed data offset for the packet, and the sample offset within that packet’s resulting decoded audio samples.

For debug purposes, a binary search in that table allows you to convert a compressed data offset back to a packet index and playback time.
Obviously needed to update UE_AUDIO_OPUS_VER (which conveniently results in all audio assets being re-cooked), update FAudioFormatOpus::Cook() to populate this new table, plus update SerializeHeaderData()
Obviously needed to update FOpusAudioInfo::ParseHeader() to deal with the new header, and convert the packet sizes to an array of offsets.
Probably worth updating GetMinimumSizeForInitialChunk(), to include the size of this table in the returned offset (reading the packet count from the header along the way, using the provided SrcBuffer)
Probably worth updating SplitDataForStreaming()
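
To make the seek table concrete, here is a standalone sketch of the three conversions: packet sizes to offsets, playback time to packet index plus sample offset, and compressed offset back to packet index. All names are hypothetical. It assumes the stored sizes cover everything between one packet start and the next (including any per-packet size prefix), and it clamps a seek past the end to the start of the last packet.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Running sum of packet sizes: Offsets[i] is the start of packet i, relative to the end of the header.
std::vector<uint32_t> BuildPacketOffsets(const std::vector<uint16_t>& PacketSizes)
{
    std::vector<uint32_t> Offsets;
    Offsets.reserve(PacketSizes.size());
    uint32_t Running = 0;
    for (uint16_t Size : PacketSizes)
    {
        Offsets.push_back(Running);
        Running += Size;
    }
    return Offsets;
}

struct FSeekTargetSketch
{
    uint32_t PacketIndex = 0;
    uint32_t SampleOffsetInPacket = 0; // frames (samples per channel) to skip after decoding
};

// Every packet covers the same duration, so the index is a simple division.
FSeekTargetSketch TimeToPacket(double Seconds, uint32_t SampleRate, uint32_t FrameSizeMs, uint32_t NumPackets)
{
    FSeekTargetSketch Target;
    const uint64_t FramesPerPacket = static_cast<uint64_t>(SampleRate) * FrameSizeMs / 1000;
    if (NumPackets == 0 || FramesPerPacket == 0)
    {
        return Target;
    }
    const uint64_t TargetFrame = static_cast<uint64_t>(std::max(0.0, Seconds) * SampleRate);
    const uint64_t Index = TargetFrame / FramesPerPacket;
    if (Index >= NumPackets)
    {
        Target.PacketIndex = NumPackets - 1; // assumption: clamp to the start of the last packet
        return Target;
    }
    Target.PacketIndex = static_cast<uint32_t>(Index);
    Target.SampleOffsetInPacket = static_cast<uint32_t>(TargetFrame % FramesPerPacket);
    return Target;
}

// Debug helper: binary search for the packet that contains a given compressed byte offset.
uint32_t OffsetToPacketIndex(const std::vector<uint32_t>& Offsets, uint32_t ByteOffset)
{
    if (Offsets.empty())
    {
        return 0;
    }
    const auto It = std::upper_bound(Offsets.begin(), Offsets.end(), ByteOffset);
    return static_cast<uint32_t>(It - Offsets.begin()) - 1;
}
```

The playback time for a packet index then follows as PacketIndex * FrameSizeMs / 1000.0 seconds.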

The main trick to seeking with Opus is that, when doing so, you should reset the state of the decoder (call opus_multistream_decoder_ctl with OPUS_RESET_STATE) and then decode the whole of the preceding Opus packet (if there is one) so as to prime the state of the decoder. We are then ready to decode the actual target packet/frame.

Added FOpusAudioInfo::SeekToTime override that converts the desired playback time to the Opus packet index and a sample offset, relative to the start of the samples that’ll result when decoding that packet. Store that in some member vars i.e. a seek request.
Updated FOpusAudioInfo::GetFrameSize() to check for that seek request, and update SrcBufferOffset to refer to the packet’s offset in the file (or rather, the offset of the uint16 that was already stored before each packet). I also added some member vars to track/move this from being a seek request to being an in-progress seek (in part to confirm that GetFrameSize() is called before Decode()). As mentioned before, note that the packet we’re referring to here is the one preceding (if there is one) the packet that contains the actual seek target audio.
Updated FOpusAudioInfo::Decode() to check for the in-progress seek. If so, make the opus_multistream_decoder_ctl call with OPUS_RESET_STATE. If there’s a preceding packet, decode it and discard, and update SrcBufferData and the pointer to the next source compressed data/packet. After decoding the actual packet of interest, we may then need to skip over some samples - which could be done by shuffling up the generated data in its buffer, or passing out a pointer to the start of the usable data. Obviously NumAudioFramesProduced needs to be updated too, based on the skipped sample count.
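
The reset / prime / skip sequence can be sketched in isolation like this. The decoder is abstracted behind two hypothetical callbacks (FDecoderHooksSketch), so this does not call libopus itself; in the engine, ResetState would correspond to the opus_multistream_decoder_ctl(Decoder, OPUS_RESET_STATE) call mentioned above. Samples are assumed to be interleaved 16-bit PCM, and the skip is done by erasing from the front of the buffer (the "shuffle up" option).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical decoder hooks, standing in for the real Opus decoder wrapper.
struct FDecoderHooksSketch
{
    std::function<void()> ResetState;
    // Decodes the packet with the given index and returns its interleaved samples.
    std::function<std::vector<int16_t>(uint32_t PacketIndex)> DecodePacket;
};

// Returns the usable samples of the target packet after a seek.
std::vector<int16_t> DecodeAfterSeek(const FDecoderHooksSketch& Decoder, uint32_t TargetPacket,
                                     uint32_t SampleOffsetInPacket, uint32_t NumChannels)
{
    // 1. Drop any state left over from wherever playback was before the seek.
    Decoder.ResetState();

    // 2. Decode and discard the preceding packet (if any) to prime the decoder state.
    if (TargetPacket > 0)
    {
        (void)Decoder.DecodePacket(TargetPacket - 1);
    }

    // 3. Decode the packet that contains the seek target.
    std::vector<int16_t> Pcm = Decoder.DecodePacket(TargetPacket);

    // 4. Skip the samples that precede the requested time within that packet.
    const size_t Skip = std::min<size_t>(
        static_cast<size_t>(SampleOffsetInPacket) * NumChannels, Pcm.size());
    Pcm.erase(Pcm.begin(), Pcm.begin() + static_cast<std::ptrdiff_t>(Skip));
    return Pcm;
}
```

The number of frames reported to the caller would then be Pcm.size() / NumChannels, which is the NumAudioFramesProduced adjustment described above.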