Pitch-shift source effect (DSP) over the network (VOIP)

Hi! I’d like to write a bit about some cool stuff here. I’ve managed to put together a source effect DSP that changes the pitch of audio as it is played back. This live DSP effect is built on the new audio engine feature of Unreal Engine, called the Audio Mixer. You can read about it in the topic below to set up some cool stuff with it:

https://forums.unrealengine.com/development-discussion/audio/116874-new-audio-engine-early-access-quick-start-guide

Availability

The Pitch Shift DSP is new and you can’t find it in the engine just yet. However, I’m planning to submit it as a pull request to the Unreal Engine master GitHub repository, so you can all have this extension in your engine and use it for whatever you wish!

Video

Here is a short video I put together where you can see this stuff in action! I hope you enjoy this demo as much as I enjoyed making the extension :slight_smile:

About

But of course, we can already change the pitch of the playback, so what is this DSP actually good for?! That’s the right question, and I’m going to answer it in a moment. Generally speaking, when you change the pitch of an audio sample as it is played back, the playback speed changes with it: the higher the pitch, the faster the playback, and if you lower the pitch it all gets slower. That doesn’t always work out well, especially if you want to keep the playback in sync with your other material (e.g. video, subtitles, the rhythm of the music, etc.). Alternatively, you can duplicate audio frames to keep the audio length intact, but then the sampling is no longer valid, so it causes distortion and unwanted side effects.
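To put numbers on the speed/pitch coupling described above, here is a minimal sketch. The function name is made up for illustration, and it assumes pitch is changed purely by changing the playback rate:

```python
def resampled_duration(original_seconds: float, pitch_ratio: float) -> float:
    """Length of a clip when pitch is changed by playback rate alone.

    pitch_ratio 2.0 means one octave up, 0.5 means one octave down.
    (Hypothetical helper, not part of any engine API.)
    """
    if pitch_ratio <= 0.0:
        raise ValueError("pitch_ratio must be positive")
    # Playing faster raises the pitch and shortens the clip by the same factor.
    return original_seconds / pitch_ratio


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 2.0):
        print(ratio, resampled_duration(10.0, ratio))
```

So a 10 second clip played an octave up lasts 5 seconds, and an octave down it lasts 20 seconds, which is exactly the drift that breaks sync with video or subtitles.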

This is where the Pitch Shift DSP comes into the equation: it changes the pitch of a Fourier-transformed signal directly in the frequency domain while retaining the original audio duration. Yes! It is a very similar approach to a phase vocoder, and it lets you preserve the duration of the audio clip while changing its pitch. This makes it suitable for placing in the DSP chain, and it won’t cause desync with your other media content. Not much desync, anyway! The one small problem is that it requires a small buffer under the hood to gather and process the audio data, so the output is delayed a bit. But it’s a constant delay, determined by the size of the FFT window you set up, so you can adjust your audio settings to compensate for this small inconvenience.
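For a rough feel of how big that constant delay is, the sketch below assumes the delay is about one FFT window of samples. That is an assumption for illustration; the real figure depends on the implementation and on the rest of the audio pipeline:

```python
def window_latency_ms(frame_size: int, sample_rate: int = 48000) -> float:
    """Approximate buffering delay, assuming one full FFT window must be
    collected before output is available (an assumption, not a measurement)."""
    return 1000.0 * frame_size / sample_rate


if __name__ == "__main__":
    for frame_size in (128, 1024, 8192):
        print(f"{frame_size:5d} samples -> {window_latency_ms(frame_size):7.2f} ms")
```

At 48 kHz this works out to roughly 2.7 ms for a 128 window, 21.3 ms for 1024, and 170.7 ms for 8192.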

I’m sure those with previous experience of digital audio software (e.g. Fruity Loops, Cubase, Studio One, etc.) and VSTs are very familiar with audio latency, which often comes from the many effects chosen on the insert chains of a project, each of which needs some time to produce its output. You can calculate this latency and adjust your DSP chain timings against it to keep your final mix in sync. DAWs (digital audio workstations) usually handle this by delaying the entire master mix, which gives the individual effects time to produce their sound, so the end result is in sync at the output.
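The general idea behind that kind of delay compensation can be sketched like this. The chain names and latency figures are invented for the example, and real DAWs do this in their own ways:

```python
def compensation_delays(latencies: dict) -> dict:
    """Given each chain's own latency in samples, return the extra delay
    each chain needs so that all of them line up with the slowest one."""
    longest = max(latencies.values())
    return {name: longest - latency for name, latency in latencies.items()}


if __name__ == "__main__":
    # Hypothetical chains: a dry signal, a pitch shifter with a 1024-sample
    # window, and some other effect that reports 256 samples of latency.
    chains = {"dry": 0, "pitch_shift": 1024, "other_fx": 256}
    print(compensation_delays(chains))
```

The slowest chain gets no extra delay and everything else is held back to match it.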

Here in Unreal Engine, what you can perhaps do is start the playback of the pitch-shifted sample a tiny bit earlier than your other media content. This will keep it in sync with everything else for any period of time, since the playback speed never actually changes.

Pitch Shift DSP

Next, I’ll show you how the Pitch Shift DSP is actually used. Fortunately, the authors of the new Audio Mixer, @dan.reynolds and @Minus_Kelvin, have put together a very nice article that will help you set up DSP effects in Unreal Engine, so I’m just going to link their article and you can learn about the general setup of source effects straight from them!

https://forums.unrealengine.com/development-discussion/audio/116874-new-audio-engine-early-access-quick-start-guide?p=982368#post982368

Once you have finished with their brilliant tutorials, you can go to the effect’s panel, where the pitch shift parameters can be adjusted. There are three parameters available: frame size, oversampling, and pitch.

What the pitch parameter does is fairly obvious, but its characteristic is worth mentioning here. Its value ranges from 0.5 (50%) to 2.0 (200%), which gives you a two-octave range for adjusting the pitch: 0.5 is the low end (e.g. C0) and 2.0 is the high end (C2), when your normal playback pitch is 1.0 (C1).
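If you think in semitones rather than ratios, the standard equal-temperament conversion is 12 semitones per doubling. A small sketch (the helper names are made up; only the 0.5 to 2.0 limits come from the post):

```python
import math

MIN_PITCH = 0.5  # one octave down, per the parameter range above
MAX_PITCH = 2.0  # one octave up


def ratio_to_semitones(ratio: float) -> float:
    """Convert a pitch ratio to semitones (12 per octave)."""
    return 12.0 * math.log2(ratio)


def semitones_to_ratio(semitones: float) -> float:
    """Convert semitones to a pitch ratio, clamped to the effect's range."""
    ratio = 2.0 ** (semitones / 12.0)
    return min(max(ratio, MIN_PITCH), MAX_PITCH)


if __name__ == "__main__":
    print(ratio_to_semitones(0.5), ratio_to_semitones(2.0))
    print(semitones_to_ratio(7))  # a perfect fifth up
```

So the parameter covers -12 to +12 semitones, and a perfect fifth up is a ratio of about 1.498.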

The frame size determines the size of the FFT window the algorithm uses. It is limited to between 128 and 8192. The smaller the window, the lower the audio latency at the output, BUT it also has a big negative impact on the quality of the end result. In my experience, 1024 is a rather good compromise between quality and latency, but you can go much higher for the best quality. The value must be a power of 2, but the algorithm keeps you on the safe side: whatever value you put in there, the closest power of 2 is what actually gets used. So don’t worry, you won’t cause glitches or crazy noises with odd values; it’s all handled internally.
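One way such snapping could work is sketched below. This is not the plugin’s code: "closest" is taken here to mean closest in plain sample count, with ties rounding up, and both of those choices are assumptions:

```python
def snap_frame_size(value: int, lo: int = 128, hi: int = 8192) -> int:
    """Clamp to [lo, hi] and snap to the nearest power of two.

    Assumes linear distance and ties going to the larger size;
    the real effect may round differently.
    """
    value = min(max(int(value), lo), hi)
    lower = 1 << (value.bit_length() - 1)  # largest power of two <= value
    upper = lower << 1                     # next power of two above it
    snapped = lower if (value - lower) < (upper - value) else upper
    return min(snapped, hi)


if __name__ == "__main__":
    for requested in (100, 700, 1000, 1024, 6000, 100000):
        print(requested, "->", snap_frame_size(requested))
```

With these rules 1000 becomes 1024, 700 becomes 512, and anything below 128 or above 8192 lands on the nearest limit.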

The oversample parameter is the STFT (short-time Fourier transform) oversampling factor, where a value of 4 should give you a rather natural voice without any quality loss. You can go higher with this value, but I clamped it to a maximum of 32. It’s possible to change this to allow higher values as well, though I don’t really see the point.
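In the usual STFT sense, the oversampling factor is the window size divided by the hop between consecutive windows. Assuming the effect follows that convention (an assumption; the lower clamp of 1 is also assumed), the relationship looks like this:

```python
MAX_OVERSAMPLE = 32  # upper clamp mentioned in the post


def clamp_oversample(oversample: int) -> int:
    # Lower bound of 1 (no overlap) is an assumption for this sketch.
    return min(max(int(oversample), 1), MAX_OVERSAMPLE)


def hop_size(frame_size: int, oversample: int) -> int:
    """Samples between the starts of consecutive analysis windows."""
    return frame_size // clamp_oversample(oversample)


def overlap_fraction(oversample: int) -> float:
    """Fraction of each window shared with the next one."""
    return 1.0 - 1.0 / clamp_oversample(oversample)


if __name__ == "__main__":
    print(hop_size(1024, 4), overlap_fraction(4))
```

With a 1024 window and a factor of 4, a new window starts every 256 samples and neighbouring windows overlap by 75%. Each step up in the factor means proportionally more FFTs per second, which is why very high values cost a lot for little audible gain.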

VOIP Prototype

You can use this DSP with any sound source in the engine; it is not tied to a VOIP solution in any way! Just for fun, I hooked it up to our custom networked VOIP solution to show you that it also works with live voice input without any trouble. The VOIP solution used here is a project we have been working on, on and off, for the vast majority of this year.

This prototype VOIP solution captures the voice and transmits the Opus-encoded packets over the network (using simple value replication on the actor channel) to the receiving end, where they are gathered and, after some network stability adjustments (which cause some latency), finally played back. In this prototype the VOIP latency usually adds up continuously, so the longer you talk, the greater the latency becomes; however, this actually helps rather well to keep the voice continuous and stable! This and many other cool VOIP features will be packed into a plugin (which we plan to name Pro Audio Capture) that my colleague and I are going to share with you all, most likely as a Marketplace item at some point in the future, so you can have it for your games and other uses as well!
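The post doesn’t say what the "network stability adjustments" are. A common technique for this job is a jitter buffer, so purely as an illustration of the general idea (not the prototype’s actual method), here is a minimal one that waits for a few packets before starting playback and hands them out in sequence order. The class name and the prebuffer size are made up:

```python
import heapq


class JitterBuffer:
    """Tiny reorder-and-prebuffer queue for sequenced voice packets."""

    def __init__(self, prebuffer_packets: int = 3):
        self.prebuffer_packets = prebuffer_packets
        self._heap = []        # (sequence, payload), smallest sequence first
        self._started = False  # playback begins once the prebuffer is full

    def push(self, sequence: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (sequence, payload))

    def pop(self):
        """Return the next payload in order, or None if nothing is ready."""
        if not self._started:
            if len(self._heap) < self.prebuffer_packets:
                return None
            self._started = True
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]


if __name__ == "__main__":
    jb = JitterBuffer(prebuffer_packets=2)
    jb.push(2, b"second")
    print(jb.pop())   # still prebuffering
    jb.push(1, b"first")
    print(jb.pop(), jb.pop())
```

The prebuffer is where the added latency comes from: the larger it is, the more network hiccups it can absorb, at the cost of a longer delay before you hear anything.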

Other uses

While I named this effect Pitch Shift DSP, the name doesn’t really cover the many other uses you may find for it. For example, if you can set up and use a MIDI input in Unreal Engine, you can get a very cool choir effect by using it in monotone, or in polyphony by applying multiple instances of it. Whether it’s a human voice or any other synthesized sound, the smoothing characteristic of the output can also be used as a rather unusual filter for your fine atmospheric sounds and melodies. But I’ll let you find your own use cases, and we’re very interested to hear about your cool ideas! :slight_smile:

Links

Here are some links so you can sail the web and learn more about pitch shifting, vocoders, and DSP in general. This is cool stuff, and there are many things to learn about!

Final words

Once again, this new DSP effect is planned to be available at some point as a pull request against the master branch of the Unreal Engine GitHub repository, and hopefully it will be pulled into the actual engine code. So if you find this stuff useful, you can help us make this happen by voting on the PR page. Thanks a lot for your support!

I’ll keep this post updated and will give you links to the forthcoming pull request and the audio capture + VOIP stuff, so you can keep yourself informed.

Cheers!
@Konflict and @spaceharry

This is awesome! Please do send us the code… FFT-based effects are on our list of things to get to. You beat us to it!

Hey guys, do you have a website or Twitter feed? I’d like to tweet a link to this forum post :smiley:

This is SO AWESOME! :smiley:


Thanks, that’s very nice of you!

Yes, I intend to send the code as soon as I feel it’s ready. I’m trying to improve performance without consuming too much memory; that’s my priority for now! It’s FFT-based and very performance-intensive, as you are well aware, but it could be worse, I guess! I ran some tests on an i7-4790 and found that with default project settings at 48 kHz I can run 16 pitch shift instances in parallel with a 1024 window and an oversample of 4, without actually hitting the limits of the audio engine. So far so good, but it’s not the final result. I just hope I can keep this level up.

No, I didn’t mean to beat anybody; it’s just fun and I enjoy working on the bits and pieces of this code. I wish I could spend more time on it, but I’ll find ways to stay focused on this project. If you have better, less performance-hungry FFT solutions, I’m sure you will be the ones who beat me on this :slight_smile:

Yes sir, I absolutely agree :slight_smile: It’s fun to work on, and there are many possible uses ahead!

I just hope the implementation will meet the requirements of an acceptable pull request. It’s not an easy task, and it really gives me a scare :slight_smile:

One more thing worth mentioning: it works the other way around as well. If I slow down the playback of the audio and use the Pitch Shift DSP to compensate for the pitch, I get a time-stretch effect as a result. :slight_smile:


Simple as that! It is fun :slight_smile:
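The arithmetic behind that time-stretch trick can be sketched as follows. The function name is hypothetical; the only thing taken from the thread is the 0.5 to 2.0 pitch range, which limits how far you can stretch this way:

```python
def time_stretch_settings(stretch_factor: float):
    """Return (playback_rate, pitch_correction) for a time stretch.

    stretch_factor 2.0 makes the clip twice as long at the original pitch.
    Slowing playback lowers the pitch by the same factor, so the pitch
    shifter has to raise it back by exactly that factor.
    """
    playback_rate = 1.0 / stretch_factor
    pitch_correction = stretch_factor
    if not 0.5 <= pitch_correction <= 2.0:
        raise ValueError("correction is outside the 0.5 to 2.0 pitch range")
    return playback_rate, pitch_correction


if __name__ == "__main__":
    print(time_stretch_settings(2.0))  # half speed, pitch doubled back up
    print(time_stretch_settings(0.5))  # double speed, pitch halved back down
```

The two values always multiply to 1.0, which is what keeps the perceived pitch unchanged.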

MUSIC. NON-STOP.
MUSIC. NON-STOP. (This is amazing. Good job!!!)
MUSIC. NON-STOP.
MUSIC. NON-STOP.
MUSIC. NON-STOP. (help)
MUSIC. NON-STOP.

This is one of the nuttiest demos I’ve ever seen… In a good way.

I am so glad I stumbled across this. Awesome work Konflict. I’ll be keeping a watchful eye on this!

Sorry about that :slight_smile: Since then I’ve managed to bring in KissFFT from the third-party components to replace the built-in FFT. I can’t say for sure, but the results are rather similar, and it works flawlessly. Once again, thanks for the tip; I would have missed this option if you hadn’t told me it was available!

Cool! I’ve had a lot on my plate recently, hence the release of the plugin has been postponed a bit. Nevertheless, development has continued in a certain direction, as I have extended the equations to provide some interesting options for adjusting the behavior.

This incarnation of the pitch shifting DSP (which I named AlienSpitch FX) provides a great variety of features for changing the characteristics of a human voice (not exclusively, but it is mainly designed for that).

It currently uses float curves to map the 0-22 kHz frequency range (the value range on t is 1.0 to 10.0), so it is possible to adjust the phase and pitch behavior of the processing individually for certain frequency ranges. Along with that, I also added a “frequency caching” method (I’m having difficulty finding a better name for this feature), a temporal effect that extracts the frequency deltas between frames and so lets you control how much the individual frequencies change over time. It’s easy to produce monotone speech, or, by overdriving the pitch deltas, more articulated changes.
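The details of the "frequency caching" feature aren’t given, so the following is only a guess at the general idea: take the per-bin frequency estimates of consecutive frames, scale the change between them, and accumulate. The function, its data layout, and the `amount` parameter are all invented for this sketch:

```python
def scale_frequency_deltas(frames, amount):
    """Scale frame-to-frame frequency changes per bin.

    frames: list of frames, each a list of per-bin frequencies in Hz.
    amount: 0.0 freezes each bin at its first value (monotone),
            1.0 reproduces the input, values above 1.0 exaggerate changes.
    """
    if not frames:
        return []
    out = [list(frames[0])]
    for prev_in, cur_in in zip(frames, frames[1:]):
        prev_out = out[-1]
        out.append([
            po + amount * (ci - pi)  # carry forward the scaled delta
            for po, pi, ci in zip(prev_out, prev_in, cur_in)
        ])
    return out


if __name__ == "__main__":
    contour = [[100.0], [110.0], [105.0]]  # one bin over three frames
    print(scale_frequency_deltas(contour, 0.0))
    print(scale_frequency_deltas(contour, 2.0))
```

With an amount of 0 the contour flattens to a constant 100 Hz, and with 2 the same contour becomes 100, 120, 110 Hz, which matches the monotone and overdriven behavior described above in spirit, if not in implementation.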

Now that I try to explain it, it may sound a bit complicated, but it actually isn’t. :slight_smile: It’s very easy to use, although I have yet to find a way to adjust the float curves from Blueprints. It is possible from C++ and by using the asset, which I demonstrated in the video.

Veeeery cool! Seems so fun to play with. Looking forward to the plugin.

P.S. Let us know if you find a neat way to edit curves in BPs!

Hey Konflict, how goes the progress? Haven’t heard anything in a few months. I’m working on a custom build of the engine, and while working on the todo/roadmap and getting into the audio section, this popped up in my memory! Hope all is well.

Awesome work, @Konflict!

Hi,
@konflikt … this is AWESOME!

I just posted a topic a few days ago about a project I’m trying to realize at our company, Concrete Games, here: https://forums.unrealengine.com/deve…k-in-real-time

The “Pitch-shift source effect (DSP)” is exactly what I need! This is what I’ve been searching for, for days and days!

The last messages are from 2018, we are now in 2021: is this Pitch Shift DSP available somewhere? Is it somewhere in the current version 4.26 in a place I couldn’t find?

I’ve already been running tests for several days now (soon several weeks) with Adobe Audition and UE4, in order to anticipate all the strange sound problems related to pitch, but also with music mathematics in order to find the best frequency tuning for creating harmonious musical chords from different types of pitched human voices, etc. (thanks again @ArthurBarthur for your advice)
All this to say that my project is serious and could potentially become an important part of a new game that we could develop at our company (if, of course, I manage to get my hands on this effect and the live tests are convincing).

I reached a point where I was hesitating between giving up and buying a Mac to try to use Max/MSP with UE4, without knowing whether the bridges between these two programs are complete (by the way, I don’t know Max/MSP, but when I have an idea in my head it’s hard to stop me, and learning new software is the kind of crazy idea I get into… the proof is that I learned UE4 well enough to make all our games by myself without any coding knowledge)…

Hoping to get my hands on this Pitch Shift DSP, I wish you a very pleasant day!

Mat

Default Windows audio is just not good enough for many realtime audio tasks.
Processing sound takes time: from the microphone to the DSP code in Unreal there is already some delay (on some default PC hardware it can be quite noticeable). The processing in Unreal adds another delay because of the FFT window we use, and then there is yet another delay while the sound card outputs the analog signal. It’s not ideal for realtime voice processing. Instead, look into discrete hardware specifically designed for this kind of task, which can ensure low latency for the I/O. One possible optimization is to go with a small window and implement both ASIO and kernel streaming support in Unreal to give users low-latency audio. ASIO requires special hardware on the customer’s side, but that shouldn’t be a huge problem.
The rest of the users can simply turn the volume down at home so they can’t hear the delayed sound. Nobody likes that echo; it causes actual stuttering in speech for many people (like me). :slight_smile:

The pitch shift DSP has not been updated to 4.26 just yet. I’ll keep it in mind, thanks for asking!

@dan.reynolds 's granular synth setup was a really great demo!

You could maybe try injecting the audio capture component straight into the granular synth’s buffer. It’ll probably click/pop, though you may be able to apply anti-aliasing. By default these options are available to C++ coders only.

Thank you for all these answers and ideas for solutions! It helps me a lot to see more clearly what I can do with Unreal’s current tools, and what problems I might encounter.

My priority is the feeling of live interaction, so to have the shortest possible delay (no delay, or at least no sensation of any delay would be the best).

So I have to turn either to hardware, or to Max/MSP or Pure Data or SuperCollider, or maybe to C++…
In the meantime, to minimize the risk, I will start perfecting tests with the Audio Analyser plugin I recently found on the Marketplace, and pre-recorded human voice samples. ^^
Thanks again! I’ll keep you posted if I have any questions or additional requests :stuck_out_tongue_winking_eye:

By the way, I hope that all the beautiful things you show here will one day be in the engine!
Long live UE! :smiley:

Hello, I am very interested in the method you applied to the voice chat (even with some latency…). Is there a chance to resurrect the old links for the DSP effect above (outdated, as UE5 now uses MetaSounds)? Unfortunately, my project is on 4.27 for various reasons, and I would love not to change the version now…

Many thanks!

Any links to a download we can look at? I’m looking for a way to pitch-shift realtime VOIP audio and currently can’t use MetaSounds because I use a C++ plugin for VOIP that makes use of the “old” AudioComponents.

In the meantime… in the MetaSounds days, I guess we can use a setup like this:
Use the AudioCapture and pipe that data into an AudioBus: AB_Input. Then create a MetaSoundSource that takes AudioBusReader → DelayPitchShift → AudioBusWriter: AB_Output. Finally, have a SourceBus that plays from AB_Output.