Hi! This is cool stuff i’m going to write here a bit about. I’ve been managed to put together a source effect DSP that is being used to change the pitch of an audio that is played back. This live DSP effect is based on the new engine feature of the unreal engine, called the Audio Mixer. You can read about in the topic below in order to set up some cool stuff with it:
The Pitch Shift DSP will be a new stuff and you can’t find it in the engine just yet, however i’m planning to make a pull request of it to the unreal engine master github repository so you can all have this extension in your game engine to use it for whatever reasons you just wish!
Here is a short video i put together where you can see this stuff in action! Hope you will enjoy this demo as much as i did making the extension
But of course, we can already change the pitch of the playback, so what this DSP is actually good for?! This is the right question and i’m going to answer it in a moment. Generally speaking when you change the pitch of an audio sample played back, there is the speed of the playback which is also going change with it, so the higher the pitch the faster the playback speed will be, and if you lower the pitch it all gets slower. That’s not always work out very well, especially if you wish to keep the playback in sync with your other materials (eg. video media, subtitles, rhythm of the music etc). Alternatively you can double the audio frames to keep the audio length intact, but the sampling will be invalid so it will cause distortions and unwanted side effects.
Here comes in the equation the Pitch Shifting DSP, which will directly change the pitch of a Fourier transformed signal in the frequency domain while retaining the original audio duration. Yes! It is a very similar approach to the Phase Vocoders, and it allows you to preserve the duration the audio clip while changing it’s pitch, so this makes it suitable to be placed in the DSP chain and it won’t cause desyncing with you other media contents. Not much desync, actually! The tiny problem with this is, that it require a small buffer under the hood in order to gather and process the audio data, therefore the output will delay a bit. But it’s a constant delay, determined by the size of the FFT window you will set up, so you can adjust your audio settings against this small inconvinience.
I’m sure those who had previous experiences with digital audio softwares (eg fruity loops, cubase, studio one, etc) and VST’s are very well familiar with audio latency which sometimes comes as the result of the many choosen effects on the insert chains of the project, which they always require some time to produce the audio output. This you can calculate and adjust your DSP chain timings against to keep the sync in your final mix. Usually DAWs Digital audio workstation - Wikipedia does this a way, it will delay the entire master mix to allow the individual effects to produce the sound in their own little time domains, so the end result will be in sync at the output.
Here in unreal engine what you can perhaps do is to start the playback of the pitch shifted sample a tiny bit earlier (earlier than your other media contents), and this will help you to keep the sync with everything else very well for any period of time, since the playback speed will won’t actually change ever.
Pitch Shift DSP
In the following i’ll show you the actual usage of the Pitch Shift DSP as well. It is fortunate that the authors of the new Audio Mixer, @dan.reynolds and @Minus_Kelvin have put together a very nice article that will help you to set up DSP effects in unreal engine, so i’m just gonna link their article and you can learn more about the general setup of source effects straight from them!
Once you have finished with their brilliant tutorials, you can go to the effect’s panel where the parameters of pitch shift can be adjusted. There will be three parameters available and they change the frame size, oversampling and pitch respectively.
The pitch is very obvious what it does, but it’s characteristic is important to mention here. The value range of the pitch parameter will be between 0.5 (50%) and 2.0 (200%) that is a 2 octaves range to adjust the pitch. 0.5 is the low (eg C0) whereas 2.0 will be the high (C2) when your normal playback is (1.0 / C1) pitch.
The frame size will determine the FFT window’s size the algorithm will use. It is limited to be between 128 and 8192. The smaller the window, less the audio latency will be on the output, BUT it will have a huge impact (degradation) on the quality of the end result as well so it becomes worse. In my experiences the 1024 is a rather good window for the in-between with quality and latency, but you can go much higher for the best quality as well. The value must be a power of 2, but the algorithm will keep you on the safe side and any value you put in there the closest power of 2 will be actually used. So don’t worry you won’t cause glitches nor any crazy noises with your uneven values, it’s all handled internally.
The oversample is the STFT Short-time Fourier transform - Wikipedia oversampling factor where a number of 4 should give you a rather natural voice without any quality loss. You can go higher with this value, but i clamped the value to 32 that’s the maximum you can use. It’s possible to change this to allow higher values as well, tho i don’t really see the point of that.
You can use this DPS with any sound source in the game engine, this is not tied to a VOIP solution by any way! It just for the fun i set up our custom networked VOIP solution here to show you it actually works with a live voice input as well without any troubles. The VOIP solution being used here is a project we are working on, in and out for the wast majority of this year.
This prototype voip solution will capture and transmit the OPUS encoded voice packets over the network (using simple value replication on the actor channel) to the receiving end, where they are gathered and after some network stability adjustments (will cause some latency) it finally gets played back. Usually the latency of the VOIP in this prototype will add up in a continous manner so the longer you talk more the lantecy will be, however this helps actually rather well to keep the voice continous and stable! This and many cool features of a VOIP will be packed into a plugin (something we plan to name as Pro Audio Capture) which my Colleague and I are going to share with you all guys, most likely as a marketplace item at some point in the future, so you can all have it for your games and other uses as well!
While i gave the name Pitch Shift DSP to this effect, it actually does not cover many other uses you can possibly will find. For example, if you can set up and use a MIDI input in unreal engine, you can have a very cool choir effect by using it in monotone or polyphony by applying multiple instances of it. Whether its a human voice or any other synthesized sound, the smoothing characteristic of the output can also be used as a rather unusual filter for your fine atmospheric sounds and melodies. But i’ll let you find your use cases, and we’re very interested to hear about your cool ideas!
Here i put you some links so you can sail on the webs and learn more about pitch shifting, vocoders and DSPs in general. These are cool stuff and many things to learn about!
Once again, this new DSP effect is planned to be available at some point in the github repository of unreal engine on the master branch and hopefully will be pulled to the actual engine code, so if you find this stuff useful you can maybe help us to make this happen and just put your votes on the PR page. Thanks a lot for your support!
I’ll keep this post updated, and will give you links to the forthcoming pull request and the audio capture + voip stuff so you can keep yourself informed regarding these matters.