Probably not.
Unfortunately this doesn’t work for me.
Inside the SubmitRemoteVoiceData function of my VoiceEngineSteam.cpp I inserted the code snippet after this:
if (QueuedData->AudioComponent != NULL)
{
    USoundWaveProcedural* SoundStreaming = CastChecked<USoundWaveProcedural>(QueuedData->AudioComponent->Sound);
    if (SoundStreaming->GetAvailableAudioByteCount() == 0)
    {
        UE_LOG(LogVoiceDecode, Log, TEXT("VOIP audio component was starved!"));
    }
    SoundStreaming->QueueAudio(DecompressedVoiceBuffer.GetData(), BytesWritten);
}
at the end of the function (before “return S_OK;”)
Then I extended PlayerState.h:
//Get the audio component for the voice chat
virtual UAudioComponent* GetOverrideVoiceAudioComponent();
and PlayerState.cpp:
UAudioComponent* APlayerState::GetOverrideVoiceAudioComponent()
{
    return NULL;
}
I extended my custom PlayerState, SomPlayerState.h, with the following:
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = Sounds)
UAudioComponent* VoiceAudioComponent;
virtual UAudioComponent* GetOverrideVoiceAudioComponent() override;
and my SomPlayerState.cpp:
UAudioComponent* ASomPlayerState::GetOverrideVoiceAudioComponent()
{
    return VoiceAudioComponent;
}
For my SomCharacter.h I used:
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = Sounds)
UAudioComponent* VoiceAudioComponent;
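(Not shown above: the VoiceAudioComponent on the PlayerState still has to be assigned from somewhere. A minimal sketch of one way to wire it up from the character, assuming it is done once the PlayerState exists, e.g. in BeginPlay - the exact place and timing are an assumption:)
// SomCharacter.cpp (sketch - where and when this assignment happens is an assumption)
void ASomCharacter::BeginPlay()
{
    Super::BeginPlay();

    // Hand the character's audio component to the PlayerState so the voice
    // engine can pick it up via GetOverrideVoiceAudioComponent().
    // Note: on clients the PlayerState may not have replicated yet at this point.
    if (ASomPlayerState* SomPS = Cast<ASomPlayerState>(PlayerState))
    {
        SomPS->VoiceAudioComponent = VoiceAudioComponent;
    }
}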
So line 318 in VoiceEngineSteam.cpp is:
QueuedData->AudioComponent = CreateVoiceAudioComponent(SteamUserPtr->GetVoiceOptimalSampleRate());
That is where I changed it to this:
QueuedData->AudioComponent = [&]()
{
    UWorld* World = GetWorldForOnline(SteamSubsystem->GetInstanceName());
    if (World && World->GameState)
    {
        for (const auto& PlayerState : World->GameState->PlayerArray)
        {
            if (*PlayerState->UniqueId == RemoteTalkerId)
            {
                if (UAudioComponent* OverrideAudioComponent = PlayerState->GetOverrideVoiceAudioComponent())
                {
                    if (USoundWaveProcedural* VoIPSound = Cast<USoundWaveProcedural>(OverrideAudioComponent->Sound))
                    {
                        VoIPSound->SampleRate = SteamUserPtr->GetVoiceOptimalSampleRate();
                    }
                    return OverrideAudioComponent;
                }
                else
                {
                    break;
                }
            }
        }
    }
    return CreateVoiceAudioComponent(SteamUserPtr->GetVoiceOptimalSampleRate());
}();
That's it! Awesome! It works now, though I made a slight change because the VOIP could otherwise be heard across the entire map when players spawn at different locations:
bool bShouldPlayVOIP = false;
(.....)
                if (UAudioComponent* OverrideAudioComponent = PlayerState->GetOverrideVoiceAudioComponent())
                {
                    if (USoundWaveProcedural* VoIPSound = Cast<USoundWaveProcedural>(OverrideAudioComponent->Sound))
                    {
                        VoIPSound->SampleRate = SteamUserPtr->GetVoiceOptimalSampleRate();
                    }
                    bShouldPlayVOIP = true;
                    return OverrideAudioComponent;
                }
                else
                {
                    break;
                }
            }
        }
    }
    bShouldPlayVOIP = false;
    return CreateVoiceAudioComponent(SteamUserPtr->GetVoiceOptimalSampleRate());
}();

if (QueuedData->AudioComponent && bShouldPlayVOIP)
{
    QueuedData->AudioComponent->OnAudioFinishedNative.AddRaw(this, &FVoiceEngineSteam::OnAudioFinished);
    QueuedData->AudioComponent->Play();
}
}

if (QueuedData->AudioComponent != NULL && bShouldPlayVOIP)
{
    USoundWaveProcedural* SoundStreaming = CastChecked<USoundWaveProcedural>(QueuedData->AudioComponent->Sound);
    if (SoundStreaming->GetAvailableAudioByteCount() == 0)
    {
        UE_LOG(LogVoiceDecode, Log, TEXT("VOIP audio component was starved!"));
    }
    SoundStreaming->QueueAudio(DecompressedVoiceBuffer.GetData(), BytesWritten);
}

return S_OK;
}
Thank you very much for clarifying! Have a great day!
Glad you got it working!
I cheered too soon. After extensive testing with one server (not dedicated) and one client, it turns out that this is not a solution.
With your original solution:
At game start the server and the client are half a kilometre apart. The server behaves as it should: as soon as it gets near the client, voice spatialization kicks in, so its voice is silent at first and gets louder as it approaches. The client, on the other hand, can be heard loud and clear even at that distance and keeps a constant high volume no matter how far away it is, which is not the desired behaviour.
With my alteration:
The server behaves as it should: it is silent when far away and gets louder as it moves closer to the client. The client, however, is always silent/muted and can never be heard.
It seems like the AudioComponent can never be overridden for the client's character (because of the PlayerState, the AudioComponent, or whatever…). Therefore a default AudioComponent is always created (CreateVoiceAudioComponent), which makes the client speak at a constant, loud volume.
For those interested, I created a branch of 4.15 with the modifications I made for GB to feed the VOIP audio component to the player controller:
https://github.com/KrisRedbeard/UnrealEngine/tree/ModifyVoiceAudioComponent
While we haven't set up our attenuation fantastically - it drops off a bit too much after a certain distance - you can modify the audio component like any other sound.
I pass the audio to the player state and modify it there based on radio usage & team.
I've included a copy of the function to give you an idea of how to update it:
void AGBPlayerState::UpdateVOIPAudioComponent()
{
    if (VOIPAudioComponent.IsValid())
    {
        UAudioComponent* AudioComp = VOIPAudioComponent.Get();
        const bool bWasActive = AudioComp->IsActive();

        if (bWasActive)
        {
            AudioComp->Stop();
        }

        if (bIsSpectator || (bUsingRadio && bVOIPSameTeam))
        {
            AudioComp->SetUISound(true);
            AudioComp->bAllowSpatialization = false;
            AudioComp->bOverrideAttenuation = false;
        }
        else
        {
            if (AGBCharacter* GBCharacter = GetCharacter())
            {
                static FName NAME_HeadMask(TEXT("HEAD_Mask"));
                AudioComp->AttachToComponent(GBCharacter->GetMesh(), FAttachmentTransformRules::SnapToTargetIncludingScale, NAME_HeadMask);
                AudioComp->SetUISound(false);
                AudioComp->bAllowSpatialization = true;
                AudioComp->AdjustAttenuation(VoIPAttenuationSettings);

                // FIXME - The CreateVoiceAudioComponent() sets this to 1.5f by default. Keep that?
                // AudioComp->SetVolumeMultiplier(1.0f);
            }
        }

        if (bWasActive)
        {
            AudioComp->Play();
        }
    }
}
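(A rough sketch of where a function like this would get called from - the call sites and OnRep names below are hypothetical, not the actual GB code:)
// Called when the voice engine hands the audio component to the player state (sketch).
void AGBPlayerState::SetVOIPAudioComponent(UAudioComponent* InAudioComponent)
{
    VOIPAudioComponent = InAudioComponent;
    UpdateVOIPAudioComponent();
}

// Re-apply the settings whenever the relevant state changes (hypothetical OnRep).
void AGBPlayerState::OnRep_UsingRadio()
{
    UpdateVOIPAudioComponent();
}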
Excellent, thanks Kris!
This sort of thing should be much, much easier to accomplish with the new synth component in the new audio engine. But in the meantime, it looks like you're doing great!
Thanks for sharing your approach, Kris!
The Synth Component sounds interesting, Minus_Kelvin. Will it be available in 4.16?
You should watch my GDC presentation from a few weeks ago.
The new audio engine will have it in 4.16, but it won't be on by default - it's still in experimental mode. Synth components basically make creating procedural audio trivial: they wrap a procedural sound wave and an audio component, and manage the format conversion, lifetime, and threading issues in an easy way.
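To give a rough idea of the pattern (a sketch only - the class name and tone are made up, and the exact virtual signatures of the 4.16 experimental API may differ slightly from the later-engine-version signatures shown here):
// MySineSynth.h - minimal synth component that renders a 440 Hz tone.
#include "CoreMinimal.h"
#include "Components/SynthComponent.h"
#include "MySineSynth.generated.h"

UCLASS(ClassGroup = Synth, meta = (BlueprintSpawnableComponent))
class UMySineSynth : public USynthComponent
{
    GENERATED_BODY()

protected:
    virtual bool Init(int32& SampleRate) override
    {
        CachedSampleRate = SampleRate;
        return true;
    }

    virtual int32 OnGenerateAudio(float* OutAudio, int32 NumSamples) override
    {
        // The component owns the procedural sound wave, audio component,
        // format conversion and threading - we just fill the float buffer.
        for (int32 i = 0; i < NumSamples; ++i)
        {
            OutAudio[i] = 0.5f * FMath::Sin(Phase);
            Phase += 2.0f * PI * 440.0f / CachedSampleRate;
        }
        return NumSamples;
    }

private:
    float Phase = 0.0f;
    int32 CachedSampleRate = 48000;
};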
Look forward to seeing it, thank you.
I have downloaded the GitHub 4.16 version and it has synths. Can you please help me understand how to get audio from an audio input and emit it inside the game?
A screen capture will do.
This can’t be explained in a screen capture since the feature is not yet directly supported.
You’ll have to write C++ code to get mic capture buffers (e.g. using DirectX), then feed that to a synth component. Should be fairly straightforward if you know what you’re doing, but if you’re new to audio programming, might be a bit tricky.
There is a mic manager plugin which is being used for recording audio to Sequencer that might help you get started if you know C++. Instead of feeding captured audio to a USoundWave for serialization, you'd feed it to a synth component for in-game playback. Note that you won't be able to get latency low enough for real-time effects, since the accumulated latency will definitely be larger than is tolerable for real-time processing (i.e. listening to your own voice in reality versus hearing it played back in-game with effects will not work). This is a common problem with any software recording of audio for real-time effects processing.
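To make the playback half concrete, here is a sketch assuming you already have a capture callback handing you 16-bit mono PCM buffers (the capture side, the class and the member names are hypothetical; the playback path is the same USoundWaveProcedural/QueueAudio path used earlier in this thread, which a synth component essentially wraps for you):
// Sketch: push captured PCM into a procedural sound wave for in-game playback.
#include "Sound/SoundWaveProcedural.h"
#include "Components/AudioComponent.h"

void UMyMicPlayback::OnMicAudio(const uint8* PcmData, int32 NumBytes)
{
    if (VoiceWave == nullptr)
    {
        // Roughly the same setup the engine's CreateVoiceAudioComponent() does for VOIP.
        VoiceWave = NewObject<USoundWaveProcedural>();
        VoiceWave->SampleRate = CaptureSampleRate; // e.g. 16000 or 48000
        VoiceWave->NumChannels = 1;
        VoiceWave->Duration = INDEFINITELY_LOOPING_DURATION;
        VoiceWave->bLooping = false;

        AudioComponent->SetSound(VoiceWave);
        AudioComponent->Play();
    }

    // 16-bit mono PCM straight into the procedural wave's FIFO.
    VoiceWave->QueueAudio(PcmData, NumBytes);
}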
Update - This breaks in UE4.16
You now have to add #include "Engine/LocalPlayer.h"
to VoiceEngineSteam.cpp.
It also causes a PackagingResults error: "Error Unknown Error".
BTW - on unknown packaging errors, you should look in the UE4 engine folder; in my case the UAT log is at
K:\UE4_16_2\UnrealEngine-release\Engine\Programs\AutomationTool\Saved\Logs\UAT_Log.txt
Hey Kris,
I don’t know if you’re still reading this. I tried your approach, but the players can still be heard across the entire map.
I modified the VoiceEngine classes as well as the basic player controller class as you submitted them. Additionally, I added the function to my player controller class as follows:
void ASomPlayerController::ModifyVoiceAudioComponent(const FUniqueNetId& RemoteTalkerId, class UAudioComponent* AudioComponent)
{
    // Get controlled pawn
    ASomCharacter* SomChar = Cast<ASomCharacter>(GetPawn());

    // Pass on audio component to voice engine (Steam)
    AudioComponent = SomChar->VoiceAudioComponent;
}
Still, no spatialization is happening. Am I doing something wrong?
Regarding the latency - what do you mean by "real-time"? Are we talking about a couple of ms? Seconds?
Unity has a number of third-party solutions for this - DFVoice or Photon Voice. There is no perceptible latency, but there are limits on the maximum number of players and on the amount of data/sec needed to keep latency to a minimum.
Still, unless you're creating an MMO, you shouldn't have any issues. Even then, I think region-of-interest based solutions can be developed - i.e. if you're too far away from another player, the system shouldn't even try to sync the output voices between the two of you.
BTW, everyone's assuming we're using the mic for voice input or recordings - but is it possible that in UE 4.16 onwards, using synths, we'd be able to use the line input from the sound card? Like for a live virtual party, where you play the music at home from your turntables, as a DJ, and route it from the mixer to the PC which runs the "game"?
I am not quite sure what you’re asking. Also, if you’d like to open a new thread, that would be good. This is a HUGE thread that keeps getting necro’d.
Any mic capture via software will have latency. I don’t know what magic third party Unity plugins are doing to avoid this, but any DAW has this issue and is the primary motivator for the development of ASIO drivers. Even with ASIO, there is still going to be latency between the physical audio production and the playback on speakers. And I’m of course using the word “latency” to mean delay from event onset to render/output. We’re not talking about network latency so not sure why you mentioned MMOs.
For any audio engine rendering pipeline for a game, unless there's special code to bypass the audio engine render and go direct to hardware (which is possible), there's going to be the inherent latency of the audio rendering - i.e. the "block size" of the render device divided by the device sample rate. For example, if you're rendering 1024 frames of audio every "block" and your device is rendering at 44,100 frames per second, the latency from "event onset" to hardware output is going to be at least 1024/44,100 ≈ 23 ms. It will be greater than this due to inherent latency in other systems (the DAC, the OS, threading, etc.).
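Spelled out, the back-of-the-envelope math is just block size over sample rate:
// Per-block render latency (sketch of the arithmetic, not engine code):
const int32 BlockSizeFrames = 1024;   // frames rendered per audio callback
const int32 SampleRateHz = 44100;     // device render rate
const float BlockLatencyMs = 1000.0f * BlockSizeFrames / SampleRateHz;
// 1024 / 44100 ~= 23.2 ms, before DAC/OS/threading overhead is added on top.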
So if you use the synth component to capture audio, render it out as a source, and let it go through the normal audio engine pipeline (which is good), you will have inherent latency. But you’ll also have the benefit of being able to treat the mic capture stream as any other source (and get spatialization, occlusion, effects, feedback/analysis into your game, etc). If you bypass the audio renderer altogether and feed mic capture directly to output, you’ll have to create your own independent DSP/processing system, or simply not get any benefit of using the audio renderer.
However, if you’re using voice-capture for VoIP rendering, you ARE going to introduce network latency in addition to audio device latency. Also, most VOIP systems actually introduce intentional latency to optimize for stability. Network transmission can randomly delay individual packets at any point so to reduce random VoIP stream starvation due to minor network jitter, VOIP packets are buffered up and sent with an intentional delay. From my testing it’s around 50 ms to 250 ms of buffered audio (~ quarter second).
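A crude way to approximate that buffering on the receiving side, reusing the USoundWaveProcedural path from earlier in this thread (the class, the members and the 200 ms threshold are just illustrative):
// Sketch: queue decoded voice, but only (re)start playback once ~200 ms has accumulated.
void FMyVoicePlayback::OnDecodedVoice(const uint8* Pcm, int32 NumBytes)
{
    SoundStreaming->QueueAudio(Pcm, NumBytes);

    const int32 BytesPerSecond = SampleRate * sizeof(int16); // 16-bit mono PCM
    const int32 TargetBufferedBytes = BytesPerSecond / 5;    // ~200 ms jitter buffer

    if (!AudioComponent->IsPlaying() &&
        SoundStreaming->GetAvailableAudioByteCount() >= TargetBufferedBytes)
    {
        AudioComponent->Play();
    }
}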
You can notice this network jitter delay if you use a service like Discord with two PCs next to each other on the same network using a Gigabit internet connection while also playing a networked MP game. In addition to the delay due to communicating with the voice chat server (and back), there will be a much bigger latency for the VOIP streams than there will be for game-events. I.e. to test the delta, have 2 characters perform the same action, see the network latency for the game event vs the latency of the audio from the VOIP.
To synchronize VOIP streams with other locally transmitted data which needs to sync with VoIP, you’ll want to time-stamp that local data (e.g. mo-cap data or gesture data or something) to the local-mic capture audio stream, and use the same jitter-delay that the VoIP stream uses. Then on the receiving client, make sure to unwrap the side-band data stream packets (e.g. trigger animations etc) using the same time-stamp as the VoIP stream (i.e. sync to the “audio render clock”).
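As a sketch of that idea (all names here are hypothetical, and PendingEvents is assumed to be a TQueue<FSideBandEvent> - the point is just that side-band events are held until the audio carrying the same timestamp has actually been rendered):
// Time-stamped side-band events released against the VOIP playback clock.
struct FSideBandEvent
{
    double AudioTimeSeconds; // timestamp on the sender's capture clock
    FName EventName;         // e.g. a gesture or mo-cap key to trigger
};

void FMyVoiceSync::Tick(double PlaybackAudioTimeSeconds)
{
    // PlaybackAudioTimeSeconds = how much of the remote VOIP stream has been rendered locally.
    FSideBandEvent Next;
    while (PendingEvents.Peek(Next))
    {
        if (Next.AudioTimeSeconds > PlaybackAudioTimeSeconds)
        {
            break; // too early - the matching audio hasn't played yet
        }
        PendingEvents.Dequeue(Next);
        FireEvent(Next);
    }
}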
One more thing, since this thread keeps getting necro’d
We're checking in a native VOIP rework so that it works correctly and uses the new synth component. It should make it to main next week.
We also checked in a local “mic-capture” component in a new “audio capture” plugin (and moved the existing sequencer recorder mic capture code to the new plugin).