Stop using private variables in virtual functions please!

StrangerGwenn · September 1, 2016, 5:55am

Minus_Kelvin, thank you so much for taking the time to explain all of this to us.

I am using PIE, indeed. I just switched to your method, I can confirm it works now. Thnak you !

I’m pretty sure my previous implementation worked in PIE on 4.12, btw, but I may be wrong.

kamrann · September 1, 2016, 9:30am

Minus_Kelvin;586603:

The amount to wait until feeding audio out is configurable by NumSamplesToGeneratePerCallback. This also determines the general cost of the procedural sound wave. Larger NumSamplesToGeneratePerCallback will reduce CPU cost but increase latency (for real-time synthesis that gets param data from the game thread, this will mean that your synthesizer will respond more slowly to parameters, etc). Also, the larger the NumSamplesToGeneratePerCallback, the fewer OnSoundWaveProceduralUnderflow delegate callbacks will be made per GeneratePCMData callback.

Also, the amount of silent audio to write out in the case of real buffer underrun is also configurable with NumBufferUnderrunSamples. This is to decouple the amount of silent audio written out from the number of samples we normally generate per callback (i.e. you may want to reduce the size of the buffer underrun and ensure that the xaudio2 voice performs an OnBufferEnd callback faster or shorter for silent buffers than for audio-filled buffers).

I’d appreciate it if you could clarify this behavior a little. The last sentence of the first paragraph I’ve quoted doesn’t seem to match with the code on master branch, which only ever calls the delegate once at most, and will simply never return a full buffer (as defined by the SamplesNeeded parameter). Incidentally, you’ve ended up with a redundant Min call on line 67 (USoundWaveProcedural::GeneratePCMData) - due to the above comparison, the min will always be SamplesToGenerate.

The reason I ask is because the procedural sound system I’ve been working on would ideally provide samples with a lower latency, but I was convinced that there was no way around the engine requesting 8192 frames at a time when calling GeneratePCMData. As I understood it, returning any less would result in silence being padded, but looking at your new code I’m not so sure anymore.

Minus_Kelvin · September 1, 2016, 6:13pm

Not sure what you mean that it will only ever call the delegate once? As long as the source has not been stopped (due to garbage collection or whatever), the platform voice (e.g. IXAudio2SourceVoice) will continue to generate OnBufferEnd callbacks, which will call the procedural sound wave delegate function when it needs more audio. The internal buffer of 8k samples is not really the required number of samples that you need to generate. Also you can call QueueAudio with whatever sized audio as you want. The SamplesToGenerate value is simply the amount of audio that the GeneratePCMData will attempt to return from the amount of audio available in the AudioBuffer.

It will submit the generated source samples to the source voice though. If the number of samples you generated is small, then the OnBufferEnd callback will again occur more quickly, which will result in your delegate function getting called more frequency. Thus you get the fundamental paradigm in audio programming: decrease buffer size for decrease latency (i.e. ability to respond to input/control params, etc) but increased CPU cost.

Trace through the code that creates and sets up procedural sound wave’s and you’ll see what I mean. Note the existance of the FXAudio2SoundSourceCallback object and how we use FXAudio2SoundSourceCallback::OnBufferEnd() to actually handle both feeding more decoded audio to the voice (in the case of real-time decoded buffers) or more user-created procedural buffers.

Note that “SamplesNeeded” argument is in USoundWaveProcedural::GeneratePCMData not properly labeled. I need to change it to “MaxSamplesToGenerate” or something.

No it’s not redundant. Keep in mind that it’s possible for other systems to call QueueAudio not from the delegate callback. The delegate underflow callback mechanism is entirely optional (which is why we checked if its bound before calling!).

For example, a VOIP-type system (or other streamed audio system) might enqueue audio via separate notification mechanism or separate thread. Those types of systems will probably queue audio to the procedural voice as soon as it is available. You don’t want to wait for the source audio to make callbacks before you do that. You could do that but then you’d have to have a separate mechanism to store audio then feed that to your delegate on underflow. The point though is that in those cases the SamplesAvailable may be way larger than the SamplesToGenerate value. Thus, we need to set SamplesToCopy as the min of whats available and what we want to generate per-callback.

That said, it could be argued that in that use-case why even have a separate value than SamplesNeeded (which is max output buffer size). I felt like it was a valuable thing to allow the user to specify allow them to specify a buffer size smaller than the max size to have better control over their latency. 8k frames is actually a noticeable amount of time. Which brings up the fact that the SamplesNeeded value (in GeneratePCMData function) is computed as:

const uint32 BufferSize = MONO_PCM_BUFFER_SIZE * Buffer->NumChannels;

MONO_PCM_BUFFER_SIZE is 8192 (and should really be called PCM_FRAME_BUFFER_SIZE).

And the reason why there’s a max size at all is that this raw buffer in a triple buffer type system that can’t be reallocated once created since XAudio2 uses the buffer directly. We could probably redesign the system to allow for the user to give whatever audio but it would require even more copying of audio buffers than we currently are.

Yes, this is exactly why I decoupled “SamplesNeeded” from “SamplesToGeneratePerCallback”. The USoundWaveProcedural::GeneratePCMData function returns the amount of bytes the callback generated! This is used to tell the XAudio2 voice how big the audio buffer we’re submitting to it is.

HOWEVER, I do return zero’d buffers in the case where the number of samples available is less than the number of samples specified in the NumSamplesToGeneratePerCallback value (or SamplesNeeded if that is smaller). This is specifically useful for VOIP-type systems which may have started playing the procedural sound wave but there isn’t much audio yet available to play. If we didn’t return any audio, then no further GeneratePCMData callbacks would be made and the procedural sound wave would mysteriously fall silent. This actually happened in 4.12 for VOIP systems and is why I refactored this code a bit.

Note that I debated about only doing this zero-buffer return for the beginning of a procedural sound wave instead of whenever the samples available is less than the SamplesToGenerate function. For example, maybe you want to only return zero-buffers in the beginning while the audio is filling up (due to VOIP decoding, etc). Then once it BEGINS playing audio, it always returns audio it has enqueued even if its not what we wanted to generated per callback. The reason I decided against this design is that if this is going on (essentially the amount of audio we’re enqueuing is slower than we are dequeuing), it’ll result in a viscious feedback and likely cause performance issues. For example, lets say we want to generate 512 frames per callback but one frame we only have 256 frames of audio (maybe the VOIP system slowed down or there was a slow-down somewhere in that system). If we submitted that 256 frames of audio, then the on-buffer-end callback would be made EVEN faster (2x faster in this example). Whatever was slowing down buffer enqueuing would probably still not be resolved and maybe there’s even less audio available, but we enqueue that anyway. You can see that quickly the audio callbacks get made faster and faster with less and less audio until it’s just single-samples… Each time causing more and more overhead with the audio device, etc.

So instead if not enough audio is available (less than SamplesToGenerate), it will write out a zero-d buffer. Note that I DID decide to decouple the zero-d buffer size from the NumSamplesToGeneratePerCallback. This is because you might want to have a pretty large number for NumSamplesToGeneratePerCallback but have a much smaller buffer size for the case when you ran out of samples. This would allow a VOIP type or streaming audio system to control the length of silent buffers to generate and allow them to recover faster.

In general, this code is still WIP and I have more ideas on how to make it more robust, user-friendly, and optimal (there’s a bunch of ways I can think of to reduce buffer copying/allocation). But the way it currently is is working as designed as far as I am aware.

kamrann · September 1, 2016, 8:16pm

Just that there is no looping/multiple calling of the delegate within the GeneratePCMData function itself, which is how it sounded to me from the way you phrased it, but anyway just a misunderstanding I think.

Okay this is great. Not sure to what degree this is new behaviour, or if I somehow just convinced myself it didn’t work this way when in fact it always did. Anyway, doesn’t really matter now.

Yep, the guy I’m writing the plugin for made this point, which is why we were surprised that (so far as we thought, apparently wrongly) it couldn’t be reduced. I agree some renaming of a few parameters could help to improve understanding of the code.

As for the redundancy comment, I was referring just to these lines. Unless I’m losing the plot, line 67 will always just evaluate to SamplesToGenerate; it’s not possible for SamplesAvailable to be smaller otherwise the if clause wouldn’t have been entered. I’m guessing you added the if statement later, rendering the FMath::Min redundant when previously it was needed.

Finally - any rough ETA on the new backend?

anonymous_user_b93b52b7 · September 1, 2016, 10:20pm

@Minus_Kelvin This is a minor issue, but I did what you suggested above and put in a MaxSamples count for generation. When it stops playing, there is an audible ‘click’ at the end of the stream. My code (borrowing your sine wave :P) below:



void UExploreSoundWaveData::GenerateAudioData(USoundWaveProcedural * InWave, int32 SamplesNeeded) {
if (SamplesRemaining > 0) {
		const int32 QueuedSamples = GetAvailableAudioByteCount() / sizeof(int16);
		int32 SamplesRequired = SamplesNeeded - QueuedSamples;
		if (SamplesRequired > SamplesRemaining) {
			SamplesRequired = SamplesRemaining;
			SamplesRemaining = 0;
		}

		SampleData.Reset(SamplesRequired);

		for (int32 i = 0; i < SamplesRequired; ++i)
		{
			float SampleValueFloat = SineOsc.NextSample();
			int16 SampleValue = (int16)(32767.0f * SampleValueFloat);
			SampleData.Add(SampleValue);
		}
		SamplesRemaining -= SamplesRequired;
		// Now call the audio queue to queue up some random data
		InWave->QueueAudio((uint8*)SampleData.GetData(), SampleData.Num() * sizeof(int16));
	}
}

I’ll be playing with the stream a lot more to see what I can do.

Minus_Kelvin · September 1, 2016, 11:24pm

Well yeah, you’ll have to handle enveloping fade-outs, etc. If that’s what you want. This is RAW PCM data generation, man.

You can literally do anything with audio. This is the entry point for real-time synthesis, physical modelling, granulation, literally anything. The only thing left is a similar mechanism for hand-coded DSP/effects processing.

Minus_Kelvin · September 1, 2016, 11:28pm

Yeah, pretty sure you missed my point. You may be correct that it is redundant for the case that a delegate is specified. However, other systems may be queuing audio totally independently so the available audio at that point in the code (for the second min) may be more than you want to give to the voice.

It’s probably never going to be “done”, but the first version (off by default and only implemented on a few platforms) is hopefully making it out in 4.14. I’m trying to wrap up some tasks so I can get back to integrating it up from a dev-stream to a stream which will make it to main. It’s currently tested and backwards compat for all non-DSP features in the current audio engine. I’ve got most of the DSP stuff worked out but haven’t yet finished it. September is slated for me to tackle the DSP stuff with new a reverb (actually probably multiple reverbs) and a EQ effect (written by yours truly) to replace our current platform-dependent effects.

anonymous_user_b93b52b7 · September 2, 2016, 1:27am

Do you have any recommended reading for learning how to do all that stuff? Scholarly papers, books, etc?

kamrann · September 2, 2016, 7:53pm

Great, looking forward to checking it out.

Now, this is getting confusing but just in case it’s a bug, and to preserve my sanity, one more go. I haven’t looked at the code in depth and don’t know enough about this stuff to follow everything you’ve said, but you have:



if(A >= B)
{
	X = Min(A, B);
}

Surely Min(A, B) is simply guaranteed to be B, since it’s already ascertained that B <= A?

Minus_Kelvin · September 2, 2016, 8:00pm

Ha… yeah. Thanks, sanity preserved. You are correct. It is redundant.

kamrann · September 7, 2016, 12:58pm

@Minus_Kelvin: Good to know
I have one quick semi-related question if you have a moment - is it guaranteed that whenever GeneratePCMData is called, the UAudioComponent that is playing the sound wave in question is still alive and active? Or is it possible that a call might still come through after the audio component has been unregistered/marked for destroy?

Reason I ask is I’m creating an async task within my sound wave object, which relies on some state the lifetime of which is managed by my custom audio component. I just want to be sure I’m not introducing any race conditions, but finding it hard to confirm from looking through the code.

Minus_Kelvin · September 7, 2016, 5:34pm

When an audio component is destroyed, it should stop its sound. This was actually triggering the stopping of procedural sounds for people earlier in the thread who were creating their audio component in such a way as to cause premature GC, and were confused when their sound was stopping.

anonymous_user_0fee4b2f1 · December 9, 2016, 9:35pm

First of all, great example and it almost works! Many thanks Minus_Kelvin, that example really helped me to integrate my audio needs… at least partially. I had this done previously using GeneratePCMData but latency was huge (>300ms), and the approach with QueueAudio is much more elegant and latency is acceptable (23ms).

However, there are some problems (UE 4.14):

When PlaySineWaveFrequency is started in BeginPlay event it works correctly in PIE, but in a standalone game GenerateData is called only few times and then calling stops. Buffers are still being played, but garbage/uninitialised data samples are heard. The MySoundWaveProcedural is still valid (checked using IsValidLowLevel method) and attached and plays, just the GenerateData is not called thus giving garbage sound output.

A crude hack is to fire PlaySineWaveFrequency slightly later after BeginPlay (~2 seconds was enough) using timers or TickComponent and then it is OK. It seems something is being GC’ed or cleaned up a while after BeginPlay event that stops GenerateData calls. And that happens only during standalone play. Is this correct behaviour, maybe I’m just doing something wrong?

I’m not able to make it working with stereo data buffers. Simple code changes for stereo gives strange clicks and it seems the GenerateData is not called often enough and buffer underruns.



// in constructor, two channels
NumChannels = 2;

// two oscilators
SineOsc1.SetFrequency(440.0f);
SineOsc2.SetFrequency(880.0f);

// then in GenerateData:

	const int32 QueuedSamples = GetAvailableAudioByteCount() / (sizeof(int16)*2);      // *2 for stereo

	const int32 SamplesNeeded = SamplesRequested - QueuedSamples;
	
	SampleData.Reset(SamplesNeeded);
	
	for (int32 i = 0; i < SamplesNeeded; ++i)
	{
		// left channel
		float SampleValueFloat = SineOsc1.NextSample();
		int16 SampleValue = (int16)(32767.0f * SampleValueFloat);
		SampleData.Add(SampleValue);
		
		// right channel
		SampleValueFloat = SineOsc2.NextSample();
		SampleValue = (int16)(32767.0f * SampleValueFloat);
		SampleData.Add(SampleValue);

	}
	
	// Now call the audio queue to queue up some data
	InProceduralWave->QueueAudio((uint8*)SampleData.GetData(), SamplesNeeded * (sizeof(int16)*2));    // note change, *2 for stereo

What I’m doing wrong here?

And finally, I have one serious comment. Why in the hell this topic has completely mismatched topic? I mean seriously, I spent long time to dig this out and I’ve found this thread just by luck! First of all this example with SineOsc should be included in the documentation…

Thanks

anonymous_user_0fee4b2f1 · December 10, 2016, 12:30pm

First of all, great example and it almost works! Many thanks Minus_Kelvin, that example really helped me to integrate my audio needs… at least partially. I had this done previously using GeneratePCMData but latency was huge (>300ms), and the approach with QueueAudio is much more elegant and latency is acceptable (23ms).

However, there are some problems (UE 4.14):

When PlaySineWaveFrequency is started in BeginPlay event it works correctly in PIE, but in a standalone game GenerateData is called only few times and then calling stops. Buffers are still being played, but garbage/uninitialised data samples are heard. The MySoundWaveProcedural is still valid (checked using IsValidLowLevel method) and attached and plays, just the GenerateData is not called thus giving garbage sound output.

A crude hack is to fire PlaySineWaveFrequency slightly later after BeginPlay (~2 seconds was enough) using timers or TickComponent and then it is OK. It seems something is being GC’ed or cleaned up a while after BeginPlay event that stops GenerateData calls. And that happens only during standalone play.

I’m not able to make it working with stereo data buffers. Simple code changes for stereo gives strange clicks and it seems the GenerateData is not called often enough and buffer underruns.



// in constructor, two channels
NumChannels = 2;

// two oscilators
SineOsc1.SetFrequency(440.0f);
SineOsc2.SetFrequency(880.0f);

// then in GenerateData:

	const int32 QueuedSamples = GetAvailableAudioByteCount() / (sizeof(int16)*2);      // *2 for stereo

	const int32 SamplesNeeded = SamplesRequested - QueuedSamples;
	
	SampleData.Reset(SamplesNeeded);
	
	for (int32 i = 0; i < SamplesNeeded; ++i)
	{
		// left channel
		float SampleValueFloat = SineOsc1.NextSample();
		int16 SampleValue = (int16)(32767.0f * SampleValueFloat);
		SampleData.Add(SampleValue);
		
		// right channel
		SampleValueFloat = SineOsc2.NextSample();
		SampleValue = (int16)(32767.0f * SampleValueFloat);
		SampleData.Add(SampleValue);

	}
	
	// Now call the audio queue to queue up some random data
	InProceduralWave->QueueAudio((uint8*)SampleData.GetData(), SamplesNeeded * (sizeof(int16)*2));   // *2 for stereo

What I’m doing wrong here?

And finally, I have one serious comment. Why in the hell this topic has completely mismatched topic? I mean seriously, I spent long time to dig this out and I’ve found this just by luck! First of all this example with SineOsc should be included in the documentation!

Thanks