Current state of UE and reading precise playback time of a song

Hi, I’d like to ask about the current state of UE as far as synchronization with music goes, and about the various methods that can be used as of UE 5.4. My primary goal is to get a “precise” time for the currently playing song.

First, thanks for the thorough replies concerning this topic:
FActiveSound PlaybackTime is wrong - #5 by kaitorched by @Minus_Kelvin
Time Lagging in Quartz by @MaxHayes

While it seems like Quartz might be suitable as far as this topic goes, I’m not completely sure how necessary it is. As explained in one of the posts above, there are two parts to Quartz - being able to play sounds so that they are synchronized to the beat (by potentially offsetting them), and being able to get regular in-sync events on the beat. I’m only really interested in the second (though I’d still have to figure out how to interpolate the time between beats on the game thread, so that solution isn’t quite complete).
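One way to interpolate between beats is to remember when the last beat event arrived and extrapolate from it with a monotonic clock. The sketch below is standalone C++ and does not use the Quartz API. `BeatInterpolator`, `OnBeat` and `SongTimeAt` are hypothetical names, and it assumes the beat event can report the song time of the beat it belongs to.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: extrapolates song time between beat events.
// All times are in seconds; `now` comes from any monotonic clock.
struct BeatInterpolator {
    double lastBeatSongTime = 0.0;  // song time reported by the last beat
    double lastBeatWallTime = 0.0;  // monotonic clock value when it arrived
    double secondsPerBeat   = 0.5;  // 120 BPM by default

    // Call from the beat event handler.
    void OnBeat(double beatSongTime, double now) {
        lastBeatSongTime = beatSongTime;
        lastBeatWallTime = now;
    }

    // Call once per tick. The elapsed time is clamped to [0, one beat] so a
    // late or missed beat event cannot push the estimate past the next beat.
    double SongTimeAt(double now) const {
        assert(secondsPerBeat > 0.0);
        double elapsed = now - lastBeatWallTime;
        elapsed = std::min(std::max(elapsed, 0.0), secondsPerBeat);
        return lastBeatSongTime + elapsed;
    }
};
```

The clamp is a design choice: it trades a brief stall of the estimate for never running ahead of the audio when a beat event is delayed.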

//

Here is what I’m really looking for - an accessible variable that reflects the actual current playback time of a song. More specifically, I think I need the following requirements to be satisfied:

R1) This playback variable is actually reasonably well “in-sync” with the actual samples that are soon to come from my speakers.

From looking at the code, it seems like e.g. GetPlaybackPercent is not reliable for this, as is also mentioned in the linked post (though perhaps it has changed since 2016). From my inspection, it doesn’t use any sort of audio-thread side time. Instead it uses the non-dilated game time: it relies on the AudioDevice::DeviceDeltaTime variable, which, despite the name, seems to be generally updated using platform/app time of some sort. So for e.g. a long song, or after a few hitches on the game thread, the playback percentage could go out of sync with the actual song time and stay that way.
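To illustrate why an error of this kind would persist, here is a small standalone simulation. It is not UE code and does not reproduce how GetPlaybackPercent works internally. `DriftAfterStall` is a hypothetical name, and the model (fixed 10 ms ticks, 48 kHz output, an audio side that renders nothing for the first few ticks) is an assumption chosen only to show that a wall-clock accumulator never recovers the time the audio side lost.

```cpp
#include <cassert>
#include <cstdint>

// Returns how far (in seconds) a wall-clock accumulator ends up ahead of a
// sample-count clock after `ticks` ticks, when the audio side rendered no
// frames during the first `stalledTicks` ticks.
double DriftAfterStall(int ticks, int stalledTicks) {
    assert(stalledTicks >= 0 && ticks >= stalledTicks);
    const double tickSeconds   = 0.01;   // assumed 10 ms per tick
    const int    sampleRate    = 48000;  // assumed output rate
    const int    framesPerTick = 480;    // 10 ms of audio at 48 kHz

    double wallClockEstimate = 0.0;      // sums deltas regardless of audio
    std::int64_t framesRendered = 0;     // counts what was really rendered

    for (int i = 0; i < ticks; ++i) {
        wallClockEstimate += tickSeconds;
        if (i >= stalledTicks) {
            framesRendered += framesPerTick;
        }
    }
    const double sampleClock = static_cast<double>(framesRendered) / sampleRate;
    return wallClockEstimate - sampleClock;
}
```

With 1000 ticks and a 5-tick stall, the accumulator ends up about 50 ms ahead, and the gap stays the same no matter how many more ticks follow.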

R2) I’d like the latency between when this variable is set and when the corresponding samples actually play to be consistent. In other words, I want to be able to assume that the time between this audio time variable being set and the corresponding samples actually coming out of a speaker does not vary too much. It can be reasonably large (e.g. always 30ms), but it should be stable (e.g. sometimes being 2ms and then 20ms is problematic).

Note - I’m aware that these requirements are not necessarily sufficient for good, precise behavior if my plan is to then use this variable from the game thread. But I see that as something that cannot be solved - it’s simply up to me to ensure a consistent and high enough frame rate, so that there’s minimal latency. The only non-obvious thing I can think of is to e.g. read the audio variable as late as possible in each tick. But honestly, since I assume there will be some sort of latency between the actual audio time and the sound going out anyway, I’m not sure whether this is something to prioritize, assuming I have 90+fps.

//

Some questions I have about this which I’d really love an answer to, and please correct me in any part where I’m wrong:

1.
As far as the first requirement R1 goes, it looks to me like the sequencer, if used with the clock source set to “audio”, should do exactly what I’m looking for (at least on platforms where the AudioDevice::AudioClock variable that it uses internally is available). It looks like the variable is updated pretty much every time we send a game audio buffer to the actual platform API call.

So from my understanding, if I don’t really need Quartz’s ability to play sounds in sync with the beat, but just need a very precise estimate of the time for the single song playing in the background, then the sequencer might be good enough for me? Or is there something that Quartz still does better here?
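If an audio-side clock like this is available, turning it into a song position is a subtraction plus a latency correction. The sketch below is standalone C++ with hypothetical names: `audioClockNow` stands in for whatever the engine exposes, and `assumedOutputLatency` is a value I would have to measure or guess per platform. It does not show how to read the clock in UE.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: estimates which song time is currently audible.
// All values are in seconds.
double EstimateAudibleSongTime(double audioClockNow,
                               double audioClockAtSongStart,
                               double assumedOutputLatency) {
    assert(assumedOutputLatency >= 0.0);
    // Song time that has been handed to the platform so far.
    const double submitted = audioClockNow - audioClockAtSongStart;
    // What is audible lags the submitted position by the output latency;
    // never report a negative position right after the song starts.
    return std::max(0.0, submitted - assumedOutputLatency);
}
```

This only works well if the latency is consistent, which is exactly what R2 asks for.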

2.
One possible issue I can think of with the above approach using the sequencer is that it’s not clear to me how often the game audio buffers are sent to the platform API. I assume that the platform API has its own audio buffer, and perhaps to avoid any sort of silence, it might be a good idea to send the current game buffer a bit before we expect the previous one to actually be consumed, so the platform code doesn’t have to wait for it.

If the actual sending to the platform is fairly regular, happening at roughly the rate at which the platform consumes samples, and we keep some data queued in advance, this shouldn’t be a problem. Alternatively we could have e.g. 3 buffers of 1024 samples, and always make sure that the platform has actually consumed the 1st one before we send the 3rd one, etc.
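The 3-buffer idea amounts to back-pressure: the mixer may only submit while fewer than a fixed number of buffers are waiting on the platform side. Here is a minimal standalone sketch of that bookkeeping. `OutputQueue` and its members are hypothetical names, and I don’t know whether UE’s mixer is structured this way.

```cpp
#include <cassert>

// Hypothetical bookkeeping for a bounded number of in-flight buffers.
struct OutputQueue {
    int capacity        = 3;      // max buffers queued on the platform side
    int framesPerBuffer = 1024;
    int sampleRate      = 44100;
    int queued          = 0;      // buffers submitted but not yet consumed

    // Called by the mixer; refuses to submit when the queue is full.
    bool TrySubmit() {
        if (queued >= capacity) {
            return false;
        }
        ++queued;
        return true;
    }

    // Called when the platform reports that it finished a buffer.
    void OnBufferConsumed() {
        assert(queued > 0);
        --queued;
    }

    // Upper bound on how far submission can run ahead of playback.
    double MaxLeadSeconds() const {
        return static_cast<double>(capacity) * framesPerBuffer / sampleRate;
    }
};
```

With this scheme the gap between "submitted" and "played" is bounded (about 70 ms for 3 buffers of 1024 samples at 44.1kHz), but it can still vary anywhere within that bound.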

But if this isn’t handled, we might get some irregularity. If for some reason we end up sending 3 buffers of 1024 samples to the platform within a span of 1ms, then the audio clock time for each of those will be pretty much the same, because it is set based on the time the buffer was sent (e.g. T+0.0ms, T+0.5ms, T+1.0ms). In reality it will take a while before the 3rd buffer is actually played (e.g. assuming a sample rate of 44.1kHz: T+0.0s, T+1024/44100s ≈ T+23.2ms, T+2048/44100s ≈ T+46.4ms).
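The size of that error is easy to compute. The function below is a standalone sketch with a hypothetical name, `TimestampError`. It assumes buffers play back to back with no gaps and that they were sent at a fixed spacing.

```cpp
#include <cassert>

// Returns, in seconds, how much earlier than its real playback start the
// buffer at `index` (0-based) would be timestamped if the timestamp is taken
// at send time and buffers were sent `sendSpacingSeconds` apart.
double TimestampError(int index, int framesPerBuffer, int sampleRate,
                      double sendSpacingSeconds) {
    assert(index >= 0 && framesPerBuffer > 0 && sampleRate > 0);
    // When the buffer really starts playing, relative to the first buffer.
    const double playOffset =
        static_cast<double>(index) * framesPerBuffer / sampleRate;
    // When it was sent, relative to the first buffer.
    const double sendOffset = index * sendSpacingSeconds;
    return playOffset - sendOffset;
}
```

For the example above, `TimestampError(2, 1024, 44100, 0.0005)` comes out at roughly 45 ms for the 3rd buffer, which is far more jitter than I’d like.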

3.
The other thing I’m not clear about is whether it’s guaranteed that when a game audio buffer of 1024 samples is sent to the platform, it contains all of the next 1024 samples of the song that is playing (the only exception being when the song is about to end and there are fewer than 1024 samples remaining). That is, do we actually wait until the next 1024 samples of the song are decompressed and mixed into our final buffer, and only then send the fully filled 1024-length buffer to the platform API call?

The alternative seems unlikely. Not waiting, filling in only the first, say, 600 of the 1024 total samples of the game-side audio buffer and then hastily sending it, seems like a bad idea, at least if we can afford to wait. But perhaps there’s a scenario where it could be useful for some reason.

For my purposes, however, there would be a big synchronization issue with this approach - the total sample count that has been sent to the platform since we started playing a song (which AudioDevice::AudioClock corresponds to, I think) would now be out of sync with the samples of the song that have actually been processed (in the above example, if this were to happen right at the start, our total played sample count after the first buffer is sent would be 1024, while our song sample count would only be at 600).
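To make the concern concrete, here is a standalone sketch that tracks both counters side by side. `SongCursor` and its members are hypothetical names, and whether UE ever submits partially filled buffers is exactly what I’m asking, not something this code establishes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping: frames sent to the platform vs. frames that
// actually came from the song.
struct SongCursor {
    std::int64_t framesSubmitted = 0;  // what an output-side clock would count
    std::int64_t songFramesMixed = 0;  // real progress through the song

    void OnBufferSubmitted(int bufferFrames, int songFramesInBuffer) {
        assert(songFramesInBuffer >= 0 && songFramesInBuffer <= bufferFrames);
        framesSubmitted += bufferFrames;
        songFramesMixed += songFramesInBuffer;
    }

    // Number of frames by which an output-side clock overstates the song
    // position. Once non-zero it never shrinks on its own.
    std::int64_t DesyncFrames() const {
        return framesSubmitted - songFramesMixed;
    }
};
```

After one buffer with only 600 song samples, the desync is 424 frames (about 9.6 ms at 44.1kHz), and every fully filled buffer afterwards keeps it at 424.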

4.
As far as the R2 requirement goes, it seems to me like this is not something that can be affected by UE. It’s purely the domain of the platform/hardware, and whether they handle incoming audio buffers with a low, or at least consistent, latency. So at best you can adjust some settings in the OS or something, but this isn’t really something that can be controlled from the engine.

//

I’m sorry that this is such a long post, but hopefully it will be useful and clarify some things for other UE users who are interested in getting reasonably precise, low latency audio time in a reliable way, as this is often very important for certain types of games.