Hello, we are working on a student VR project which is due in 2 months.
The main mechanic of the project is voice manipulation and visualisation.
The new Audio Engine seems great, but with the lack of documentation and our zero experience in sound programming it’s unclear how to achieve what we need,
so if you could point us in the right direction it would be great!
The first thing we would like to do is control real-time parameters of a particle system based on the player’s voice coming from the VR mic (its color is based on musical notes and its shape is based on the amplitudes of different frequencies).
As far as I can see, Audio Capture provides an interpolated envelope value, which does not seem to be enough for what we are trying to achieve.
The question is: how do we get the frequencies from Audio Capture, and how do we calculate which note it was?
We want to pre-record the voice, apply sound fx to it and then visualise it using the logic from step (1).
As I can see from this demo: GDC2018 - Unreal Audio Engine - Audio Capture and Submix Recording - YouTube
the recording and FX part should be possible, but I could not find documentation or a demo project on how to do it.
Could anyone please provide a link or something that shows how to do it? Or tell us which component to use to record the sound and how to apply FX to it.
Another question is similar to (1): how can we get the frequencies and calculate which notes were played from the processed sound?
So to summarise, we are after some kind of GetNote and GetFrequenciesByRange nodes, regardless of whether the sound comes from a live or pre-recorded source.
We would also appreciate documentation on how to record and manipulate sound using the new Audio Engine.
Hello. It took a while for moderators to approve this post, so we’ve managed to make some progress since.
By playing with the new audio engine we’ve figured out that you can use Submix recording to record your voice and save it as a sound wave asset.
And we can do it pretty much every tick, getting a real-time audio feed in sound wave format.
Now, using the Sound Visualization plugin, we are able to calculate the frequency spectrum from both live and pre-recorded voice to drive the particles’ shape.
So the only thing left to figure out is how to get the musical note from the spectrum or the sound wave.
We’ve also noticed that the new Audio Engine has a GetMIDIPitchFromFrequency function which gives us a MIDI note from a frequency,
which means that what we actually need to find is the frequency in Hz.
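For reference, the usual equal-temperament conversion from frequency to MIDI note (which is presumably what GetMIDIPitchFromFrequency implements internally) is a one-liner. A Python sketch, assuming standard A4 = 440 Hz tuning:

```python
import math

def midi_note_from_frequency(freq_hz, a4_hz=440.0):
    """Equal-temperament mapping: A4 (a4_hz) is MIDI note 69,
    and each semitone multiplies the frequency by 2**(1/12)."""
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))
```

So once we have the dominant frequency in Hz, e.g. 261.63 Hz, this gives MIDI note 60 (middle C).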
As far as we understand the sound wave gives us multiple frequencies combined in a single wave.
The questions are:
How do we know which one we should use to get the pitch from?
Is it the loudest one? Or is there some averaging formula?
How do they recognise pitch in games like Guitar Hero or similar karaoke apps?
And how do we get the frequencies in Hz, rather than their amplitudes in dB, in UE4?
As nobody replied, I’ll just post an update on what we’ve ended up doing for now, in case it helps anybody.
We’ve found a simple algorithm for finding the fundamental pitch, described here: http://www.kaappine.fi/tutorials/fundamental-frequencies-and-detecting-notes/
But to get this working you need an FFT output of at least 8192 samples to get a rough approximation of the pitch.
There are a couple of issues with this. The array is quite big to iterate through in real time and will affect the frame rate significantly, so the FFT should be able to work within a provided frequency range so that only the bins of interest are searched. The second is that the Sound Visualization plugin’s CalculateFrequencySpectrum (not sure which FFT they are using there) is limited to 512 samples, which is not nearly enough to find the pitch.
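The range-restricted search we were after can at least be sketched outside UE4. A hypothetical Python version, assuming the FFT magnitudes are already in an array and using a bin width of samplerate / fftSize:

```python
def strongest_bin_in_range(spectrum, sample_rate, fft_size, f_lo, f_hi):
    """Find the loudest FFT bin whose centre frequency lies in
    [f_lo, f_hi] Hz, instead of scanning the whole array every frame.
    Bin width is sample_rate / fft_size."""
    bin_width = sample_rate / fft_size
    lo = max(1, int(f_lo / bin_width))         # skip the DC bin at index 0
    hi = min(len(spectrum) - 1, int(f_hi / bin_width))
    best_bin, best_mag = lo, spectrum[lo]
    for j in range(lo + 1, hi + 1):
        if spectrum[j] > best_mag:
            best_bin, best_mag = j, spectrum[j]
    return best_bin, best_bin * bin_width

# Example: a fake spectrum with a voice peak in bin 327 (~440 Hz)
# and a louder spike outside the 80-1000 Hz search range.
spectrum = [0.0] * 8192
spectrum[327] = 1.0
spectrum[2000] = 5.0   # ~2692 Hz, ignored by the restricted search
peak_bin, peak_hz = strongest_bin_in_range(spectrum, 11025, 8192, 80, 1000)
```

For a typical human voice, restricting the search to roughly 80–1000 Hz cuts the loop down to a few hundred bins.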
We could not find anything in Unreal that would let us implement it, so we ended up using Pure Data (PD) and an OSC plugin for live pitch recognition.
It would be awesome if the new Audio Engine provided more data for audio visualisation in the future! Best regards!
After the new year festivities it’s time to get back to my series of Unity3D tutorials. This time, I’ll show you how to extract the fundamental, or strongest, frequency from a mixed-signal input, such as the one coming from a microphone, in Unity3D. Then we’ll look into how you can compare it to notes from a bass or any other instrument.
Do the Fast Fourier Transform
As mentioned in a previous tutorial, we can utilize the Fast Fourier Transform (FFT) to get the frequency data out of a signal. When using Unity3D we don’t have to implement our own FFT function, since Unity3D provides us with the GetSpectrumData function. To use this function, you pass it a float array with a size that’s a power of two (i.e. 128, 256, 512), with a minimum of 64 and a maximum of 8192, along with a channel to extract data from and an optional window function to increase precision. Now, if we take the MicrophoneInput script from my previous tutorial and start to build on that, we’ll add a new function called GetFundamentalFrequency, where we first grab the spectrum data into an array. I’ve also defined a variable for the fundamental frequency we are going to calculate later on.
float GetFundamentalFrequency() {
    float fundamentalFrequency = 0.0f;
    float[] data = new float[8192];
    audio.GetSpectrumData(data, 0, FFTWindow.BlackmanHarris);
    return fundamentalFrequency;
}
Find the bin
Now, we are not really calculating the exact frequency that is strongest in the signal; instead we are going to find the FFT bin that has the strongest signal. We do that by iterating through the data and keeping track of the signal level in the loudest bin, using a simple loop and a couple of temporary variables: s will keep the strength of the strongest signal and i will keep the index of the bin where that signal was found.
float s = 0.0f;
int i = 0;
for (int j = 1; j < 8192; j++) {
    if (s < data[j]) {
        s = data[j];
        i = j;
    }
}
Calculate the frequency
In order to get the frequency, we have to do some maths. Since the precision of the FFT also depends on our sample rate, we must take this into account. Earlier, I wrote a post about the FFT and its precision, so you might want to check that out too in order to get the details. The formula we use to calculate the frequency of the strongest bin we found is as follows:
frequency = binIndex * samplerate / bins
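To sanity-check the formula with the numbers used later in this tutorial (an 11025 Hz sample rate and 8192 bins), a quick back-of-the-envelope calculation in Python:

```python
# Numbers from later in this tutorial: 11025 Hz sample rate, 8192 bins.
sample_rate = 11025
bins = 8192

bin_width = sample_rate / bins                  # ~1.35 Hz per bin
bin_for_440 = round(440 * bins / sample_rate)   # which bin a 440 Hz tone lands in
freq_of_bin = bin_for_440 * sample_rate / bins  # the formula above, in reverse
```

A 440 Hz tone lands in bin 327, which maps back to roughly 440.1 Hz, so the error stays within one bin width.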
As you can see, the precision is dependent on the sample rate and the number of bins (size of array) used in the FFT. After adding that equation to the function, it looks like this.
float GetFundamentalFrequency() {
    float fundamentalFrequency = 0.0f;
    float[] data = new float[8192];
    audio.GetSpectrumData(data, 0, FFTWindow.BlackmanHarris);
    float s = 0.0f;
    int i = 0;
    for (int j = 1; j < 8192; j++) {
        if (s < data[j]) {
            s = data[j];
            i = j;
        }
    }
    fundamentalFrequency = i * samplerate / 8192.0f; // binIndex * samplerate / bins
    return fundamentalFrequency;
}
Putting it together
Now we have a function that provides us with the strongest frequency in the signal fed in by our microphone. To combine this properly into the script, we should add global variables for the sample rate and for the frequency we found, so we can access them from other scripts. With these changes, the full MicrophoneInput script should look something like this:
using UnityEngine;
using System.Collections;

[RequireComponent(typeof(AudioSource))]
public class MicrophoneInput : MonoBehaviour {
    public float sensitivity = 100.0f;
    public float loudness = 0.0f;
    public float frequency = 0.0f;
    public int samplerate = 11025;

    void Start() {
        audio.clip = Microphone.Start(null, true, 10, samplerate); // null = default microphone
        audio.loop = true;  // Set the AudioClip to loop
        audio.mute = true;  // Mute the sound, we don't want the player to hear it
        while (!(Microphone.GetPosition(null) > 0)) {} // Wait until the recording has started
        audio.Play(); // Play the audio source!
    }

    void Update() {
        loudness = GetAveragedVolume() * sensitivity;
        frequency = GetFundamentalFrequency();
    }

    float GetAveragedVolume() {
        float[] data = new float[256];
        float a = 0;
        audio.GetOutputData(data, 0);
        foreach (float s in data) {
            a += Mathf.Abs(s);
        }
        return a / 256;
    }

    float GetFundamentalFrequency() {
        float fundamentalFrequency = 0.0f;
        float[] data = new float[8192];
        audio.GetSpectrumData(data, 0, FFTWindow.BlackmanHarris);
        float s = 0.0f;
        int i = 0;
        for (int j = 1; j < 8192; j++) {
            if (s < data[j]) {
                s = data[j];
                i = j;
            }
        }
        fundamentalFrequency = i * samplerate / 8192.0f;
        return fundamentalFrequency;
    }
}
Now let’s figure out what note that is…
Ok, we have the strongest frequency now. If you want to convert it to a note, you need to know the fundamental frequency of that note and compare it to the frequency given by our function. Let’s say we want to know if the note being played is C4, or “middle C”. If we assume that A4 is 440 Hz, as it usually is with standard tuning, C4 is 261.63 Hz. Now all you need to do is make a simple comparison between that and the frequency you get from the script above. Let’s make that into a script; I’ll call it NoteFinder for now and make it display the note in a GUIText component when it is found. The beginning of the script is pretty much the same as the SpawnByLoudness script from the previous post, except for the added requirement for a GUIText component.
using UnityEngine;
using System.Collections;

[RequireComponent(typeof(GUIText))] // Require a GUIText component so we can display text
public class NoteFinder : MonoBehaviour {
    public GameObject audioInputObject;
    public float threshold = 1.0f;
    MicrophoneInput micIn;

    // Use this for initialization
    void Start() {
        if (audioInputObject == null)
            audioInputObject = GameObject.Find("MicMonitor");
        micIn = (MicrophoneInput)audioInputObject.GetComponent("MicrophoneInput");
    }

    // Update is called once per frame
    void Update() {
        int f = (int)micIn.frequency; // Get the frequency from our MicrophoneInput script
        if (f >= 261 && f <= 262) {   // Compare to the known value, allowing for rounding error
            this.guiText.text = "Middle-C played!";
        } else {
            this.guiText.text = "Play another note…";
        }
    }
}
That’s all folks, or is it?
You should now have a system that can detect frequencies and notes for you. You can go ahead and implement different versions of the frequency detection, like dividing the spectrum into frequency bands and using their combined loudness values to trigger events in your game. If you want to detect more notes, you can refer to a table like http://www.phy.mtu.edu/~suits/notefreqs.html for more frequency values.
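Instead of hard-coding each value from such a table, you can also compute the nearest note directly from equal temperament. A Python sketch (assuming A4 = 440 Hz, as in the comparison above):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (name, cents_off) for the equal-temperament note closest
    to freq_hz. cents_off tells how sharp (+) or flat (-) the input is."""
    semitones = 12 * math.log2(freq_hz / a4_hz)  # signed distance from A4
    nearest = round(semitones)
    cents_off = 100 * (semitones - nearest)      # tuning error in cents
    midi = 69 + nearest                          # A4 is MIDI note 69
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents_off
```

This replaces the hard-coded 261–262 Hz window in NoteFinder with a check that works for any note, and the cents offset doubles as a tuner display.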
There are a couple of considerations, though. Make sure you select an appropriate sample rate for your implementation. For example, since I want good resolution in the low frequencies, I use a sample rate of 11025 and an FFT size of 8192. This gives me a bit over 1 Hz resolution up to around 5000 Hz. There is also a way to speed up the frequency calculation. Since the FFT, by nature, does not give any meaningful information about a real-world signal above the Nyquist frequency, we can ignore the upper half of the bins. So when using 8192 bins, we only need to iterate up to bin 4096, which speeds up the GetFundamentalFrequency loop quite a bit.
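The “ignore the upper half” point can be demonstrated with a tiny pure-Python DFT (a naive O(n²) sketch for illustration only, not something you would run per frame):

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform; fine for a small demonstration."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

# A real-valued test signal: 5 cycles of a sine over 64 samples.
sig = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spec = [abs(x) for x in dft(sig)]
# For a real input, bin k mirrors bin n-k, so everything above n/2
# just repeats what the lower half already told us.
```

Here the energy shows up in bin 5 and, mirrored, in bin 59; the loop in GetFundamentalFrequency only needs the first half.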