This stuff is rather over my head… But looking at it from a different angle: is there a way we could schedule/queue precisely timed audio events from the game thread? For example, I used Unity’s AudioSource.PlayScheduled to queue precision-timed piano tones ahead of time so they coincided exactly with the delays of previously played tones in this unfinished game:
I’d read somewhere (sorry, I can’t find references atm) that the brain will forgive/overlook slight mistimings of visual information, while audio mistimings are much more jarring. So even though player movement in the above example would be unpredictable, and render frames would often not fall precisely on a musical beat (especially on mobile), I could time the movement animation to “land” on its final step roughly on a beat. More importantly, I could queue the “landing” tone to play exactly on the beat closest to when the upcoming landing would occur.
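Roughly, it looked something like this (a simplified Unity C# sketch, not the actual project code; the bpm value, the landingSource field, and the 50 ms safety margin are just placeholders):

```csharp
using UnityEngine;

public class BeatScheduler : MonoBehaviour
{
    public AudioSource landingSource;   // plays the "landing" tone
    public double bpm = 120.0;          // sequencer tempo (placeholder)

    private double songStartDsp;        // DSP time when the loop/song started

    void Start()
    {
        // AudioSettings.dspTime is the audio system's own clock, independent
        // of frame rate, and it's the time base PlayScheduled expects.
        songStartDsp = AudioSettings.dspTime;
    }

    // Call from the game thread as soon as you know a landing is coming,
    // with an estimate of how far away it is (in seconds from now).
    public void ScheduleLandingTone(double estimatedDelaySeconds)
    {
        double secondsPerBeat = 60.0 / bpm;
        double target = AudioSettings.dspTime + estimatedDelaySeconds;

        // Snap the estimated landing time to the nearest beat.
        double beatsElapsed = (target - songStartDsp) / secondsPerBeat;
        double snappedBeat = System.Math.Round(beatsElapsed);
        double scheduledDsp = songStartDsp + snappedBeat * secondsPerBeat;

        // If snapping pushed us into the past (or too close to "now"),
        // bump forward one beat so we still hand PlayScheduled a future time.
        if (scheduledDsp <= AudioSettings.dspTime + 0.05)
            scheduledDsp += secondsPerBeat;

        // Queue the tone ahead of time; it fires sample-accurately even if
        // the game thread hiccups between now and the scheduled time.
        landingSource.PlayScheduled(scheduledDsp);
    }
}
```

The key point is that PlayScheduled works off AudioSettings.dspTime (the audio clock) rather than Time.time, so the tone lands on the beat regardless of what the render/game thread is doing in the meantime.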
Sorry, I don’t know if this would solve OP’s problem, but it might be a way to work around game-thread latency while still keeping the audio precisely timed. The tradeoff is reduced responsiveness to player input, since you’d be scheduling however many frames/milliseconds ahead you need to avoid “dropping” a beat (haha) if the game thread lags by more than, say, a 1/16th note or whatever granularity your sequencer uses.
I’d love to see something like the PlayScheduled function from Unity mentioned above; it worked great for my case.