Very low performance on AMD CPUs caused by IXAudio2SourceVoice::DestroyVoice

Hi,

The current implementation of FXAudio2SoundSource::FreeResources() performs very badly on AMD CPUs because of this call:

Source->DestroyVoice();

which completely freezes the main thread.

For example, I timed a simple sound with this piece of code:

const uint32 BroadcastBeginTime = FPlatformTime::Cycles();
Source->DestroyVoice();
const uint32 BroadcastEndTime = FPlatformTime::Cycles();
// Elapsed FPlatformTime cycles; the difference is a uint32, so log it with %u
UE_LOG(LogXAudio2, Warning, TEXT("FreeResources: %u"), BroadcastEndTime - BroadcastBeginTime);
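The log above reports raw FPlatformTime::Cycles() deltas. As far as I know these are platform timer ticks rather than CPU clock cycles (on Windows, QueryPerformanceCounter ticks, whose frequency can differ from machine to machine), so numbers from different PCs are easier to compare after converting to wall-clock time; inside UE4, FPlatformTime::ToMilliseconds() does that conversion. For reproducing the measurement outside the engine, a minimal standalone sketch using a monotonic std::chrono clock could look like this (TimeMicroseconds is a hypothetical helper, not UE4 code):

```cpp
#include <chrono>
#include <utility>

// Standalone sketch (not UE4 code): run F once and return the elapsed
// wall-clock time in microseconds, measured with a monotonic clock so the
// unit is the same on every machine.
template <typename Fn>
double TimeMicroseconds(Fn&& F)
{
    const auto Begin = std::chrono::steady_clock::now();
    std::forward<Fn>(F)();
    const auto End = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(End - Begin).count();
}
```

Usage would be along the lines of `const double Us = TimeMicroseconds([&] { Source->DestroyVoice(); });`.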

Results:

Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (8 CPUs): DestroyVoice takes ~50 cycles to finish

vs

AMD FX-8320 processor (X8, socket AM3, 64-bit, 3.5 GHz): DestroyVoice takes ~300 cycles to finish.

In extreme cases the main thread can freeze for more than 50 ms. Imagine a game full of sounds on AMD :slight_smile:

To avoid title thread interruptions from a blocking DestroyVoice call, the application can destroy voices on a separate non-critical thread, or the application can use voice pooling strategies to reuse voices rather than destroying them. Note that voices can only be reused with audio that has the same data format and the same number of channels the voice was created with.

As you can see from the official documentation, DestroyVoice blocks the thread that calls it, which in this case is the main thread.
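The first option from the documentation, destroying voices on a separate non-critical thread, can be sketched as a small worker that runs queued destruction jobs. This is a standalone sketch using std::thread rather than the UE4 task API, and FDeferredDestroyer is a hypothetical name. In real code each job would be something like `[Voice] { Voice->DestroyVoice(); }`; the voice must not be touched again once it has been queued, and the IXAudio2 engine must outlive the queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: runs potentially slow destruction jobs on one
// background thread so the caller (e.g. the main thread) never waits on them.
class FDeferredDestroyer
{
public:
    FDeferredDestroyer() : Worker([this] { Run(); }) {}

    ~FDeferredDestroyer()
    {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            bStopping = true;
        }
        Wake.notify_one();
        Worker.join(); // the worker drains any remaining jobs before exiting
    }

    // Main-thread side: holds the lock only long enough to push the job.
    void Enqueue(std::function<void()> Job)
    {
        assert(Job);
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Jobs.push_back(std::move(Job));
        }
        Wake.notify_one();
    }

private:
    void Run()
    {
        std::unique_lock<std::mutex> Lock(Mutex);
        for (;;)
        {
            Wake.wait(Lock, [this] { return bStopping || !Jobs.empty(); });
            if (Jobs.empty())
                return; // stopping and nothing left to do
            std::function<void()> Job = std::move(Jobs.front());
            Jobs.pop_front();
            Lock.unlock();
            Job(); // the blocking call runs here, with the lock released
            Lock.lock();
        }
    }

    std::mutex Mutex;
    std::condition_variable Wake;
    std::deque<std::function<void()>> Jobs;
    bool bStopping = false;
    std::thread Worker; // declared last so every member above exists when it starts
};
```

Because the job runs with the mutex released, a DestroyVoice that takes tens of milliseconds stalls only the worker thread, never the thread calling Enqueue.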

Regards

Pierdek

I can confirm poor performance too, even freezes, in exactly the same part of the code.
When the UE4 editor window loses focus and I try to switch back to it, Source->DestroyVoice() often takes more than 30 seconds to finish. As a result the editor becomes unresponsive and I have to restart it.
I'm using an AMD FX-8120 @ 3.1 GHz.
I'm not sure what is actually causing it.

Hi everyone,

Sorry for the delay in responding to this post. I have managed to get my hands on a machine that has an AMD CPU. Unfortunately, it does not have Visual Studio or UE4 on it, so I am in the process of getting those items installed in order to take a look at this.

Tim

I had a chance to run this test today, and noticed that the cycle times do appear to be higher on the AMD machine than what I was seeing on my own machine. However, the AMD machine that I was testing on is not quite comparable in terms of capabilities so I am going to see if I can get a co-worker to run the same test on his machine which has an AMD processor and is more comparable to my own.

Tim

Hi everyone,

Thank you for reporting this issue to us. While testing this issue, we were not able to see anything conclusive. When I tested on my Intel processor, I was seeing cycle times between 41 and 51 (with an occasional spike). When my co-worker tested on his AMD processor, he saw cycle times that were actually very slightly lower than mine on average. The computer I had used initially was actually a laptop with significantly lower specs than the computers we ended up testing on, and with that the cycle times averaged between 55 and 75 (with occasional spikes that were a little more frequent than with the desktop computers).

I did get in touch with the audio engineer for the Engine, and he mentioned that he did not have any additional insight into this issue. He mentioned that this isn’t something we would be able to fix since the audio engine is currently running in the main thread, and moving it out, or using voice pools, would be a major project. We do have a long-term plan to re-write the audio engine, and this is most likely something that will be taken into consideration whenever that happens.

Tim

Thanks for your time on this. Since you could not reproduce it, it seems to be related to something other than just an AMD processor. Maybe the sound card or its drivers are to blame?
I'm using an ASUS M5A99X EVO with its integrated sound card and the latest drivers (which date from 2012), on Windows 7 64-bit. Pierdek, what's yours?

@Tim: the average cycle times aren't the problem here; the lag spikes are, because DestroyVoice freezes the main thread and the whole game is destabilized. I've implemented a simple IXAudio2Voice pool and the problem disappeared. I don't want to make a pull request on GitHub because my code is hack on hack :slight_smile: I just want to let you know about the problem.
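The point about spikes versus averages can be made concrete: a summary of per-call timings should report the worst case and how many calls blew a frame-sized budget, not only the mean. A minimal sketch; FTimingSummary, Summarize and the 16.6 ms threshold (roughly one frame at 60 fps) are illustrative choices, not anything from the engine:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FTimingSummary
{
    double Mean;            // average time per call (ms)
    double Max;             // worst single call (ms)
    std::size_t NumHitches; // calls slower than the hitch threshold
};

// Rare long stalls barely move the mean, but they show up in Max and
// NumHitches, which is what players actually notice as stutter.
FTimingSummary Summarize(const std::vector<double>& SamplesMs, double HitchThresholdMs)
{
    assert(!SamplesMs.empty());
    FTimingSummary S{0.0, SamplesMs.front(), 0};
    double Sum = 0.0;
    for (double V : SamplesMs)
    {
        Sum += V;
        S.Max = std::max(S.Max, V);
        if (V > HitchThresholdMs)
            ++S.NumHitches;
    }
    S.Mean = Sum / static_cast<double>(SamplesMs.size());
    return S;
}
```

For example, 999 calls at 0.05 ms plus a single 50 ms call average out to about 0.1 ms, yet that one call costs several whole frames.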

@CodingMarmot: Nope, this is definitely not a driver problem. I have gathered some data from users:

As you can see, they have different operating systems, audio devices and drivers; the common denominator is an AMD CPU.

Is there any possibility that you could run your test again using the ShooterGame sample project? That project has a great deal more sounds than in my original test project. I saw quite a few more spikes in ShooterGame, but no noticeable lag when playing. Most of the log entries were in the 50-70 range, but there were some spikes that went up to 4 or 5 digits.

Tim

I'm a little busy fixing regression issues after the 4.8 merge :wink: I'll try to get a repro with a simple sample around September.

That’s not a problem. I’ll mark this as resolved for now, and whenever you have a chance to do some more looking into this issue just add a new comment and it will re-open the post. I appreciate your help with this.

Tim

Hey guys, we just had another licensee report this exact issue, so I went ahead and came up with a pretty simple solution that I think should avoid the hitching. Could you grab the CL and see if it works for you?

Here’s my github CL on the master branch:
https://github.com/EpicGames/UnrealEngine/commit/9fff1159f8c758e0d587b6833215950a9c3a8b7f

I'd prefer not to use pools. I don't think our XAudio2 sources are guaranteed to all share the same source format (e.g. number of channels, sample rate, etc.), which means we'd have to juggle separate pools for different formats. Since that would be a more invasive change, I'd prefer to avoid it if we can. We've recently used the UE4 task API for async on-the-fly Ogg decoding, so there's already precedent in the audio engine for using tasks for small jobs like this.
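The per-format juggling mentioned above can stay fairly small if the free lists are keyed on the voice format. A minimal sketch, assuming a format is identified by channel count and sample rate only (real code would likely key on the full WAVEFORMATEX used to create the voice). TVoicePool and FVoiceFormat are hypothetical names, and the voice type is a template parameter so the sketch compiles without the XAudio2 headers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical pool key: voices are only interchangeable within one format.
struct FVoiceFormat
{
    uint32_t NumChannels;
    uint32_t SampleRate;
    bool operator<(const FVoiceFormat& O) const
    {
        return std::tie(NumChannels, SampleRate) < std::tie(O.NumChannels, O.SampleRate);
    }
};

template <typename VoiceT>
class TVoicePool
{
public:
    using FCreateFn = std::function<VoiceT*(const FVoiceFormat&)>;
    explicit TVoicePool(FCreateFn InCreate) : Create(std::move(InCreate)) {}

    // Reuse a free voice of the same format; create one only if none is free.
    VoiceT* Acquire(const FVoiceFormat& Format)
    {
        std::vector<VoiceT*>& Free = FreeVoices[Format];
        if (!Free.empty())
        {
            VoiceT* Voice = Free.back();
            Free.pop_back();
            return Voice;
        }
        return Create(Format);
    }

    // Park the voice instead of destroying it when its sound finishes.
    void Release(const FVoiceFormat& Format, VoiceT* Voice)
    {
        assert(Voice != nullptr);
        FreeVoices[Format].push_back(Voice);
    }

    std::size_t NumFree(const FVoiceFormat& Format) const
    {
        auto It = FreeVoices.find(Format);
        return It == FreeVoices.end() ? 0 : It->second.size();
    }

    // Shutdown path: the only place voices are actually destroyed.
    template <typename Fn>
    void DrainFree(Fn&& Destroy)
    {
        for (auto& Entry : FreeVoices)
            for (VoiceT* Voice : Entry.second)
                Destroy(Voice);
        FreeVoices.clear();
    }

private:
    FCreateFn Create;
    std::map<FVoiceFormat, std::vector<VoiceT*>> FreeVoices;
};
```

Before parking a real IXAudio2SourceVoice, it would presumably need to be stopped and have its queued buffers flushed (IXAudio2SourceVoice::Stop and FlushSourceBuffers), so the next sound starts from a clean state.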

The performance with your change seems similar to my pools, but it crashes out of the blue :)

It looks like this may help:
https://github.com/EpicGames/UnrealEngine/commit/5a01ef76fa41f1c425a0f2748f01814d216c972f

This is still a very big issue. Many AMD users are reporting huge FPS drops in Squad that make the game unplayable. Even lowering the number of audio channels does not solve the problem.

My rig: AMD FX-8350, 16 GB RAM, GTX 760 SC, Windows 10
I'm getting 20-35 fps, mostly at the lower end, with very frequent micro-stutters.

Upgrade your engine to 4.10, or manually merge the voice pool changes. Here is the GitHub commit where you should start:
https://github.com/EpicGames/UnrealEngine/commit/c85bc88870f907447b2548ac6ae98e822f0b1fb0

The engine is already upgraded to 4.10. Users with AMD CPUs are getting horrible performance regardless of having a high-end GPU.

OK, we are testing 4.10 and I can confirm what Clinch said: voice pooling on Windows is still broken.

Hi guys,

The original post here was specifically about the IXAudio2Source->DestroyVoice call being slow on AMD.

The voice pool tech I implemented is specifically meant to solve that problem. Voices are no longer created and destroyed for every sound. DestroyVoice is only called when the audio device shuts down, in the ~FXAudioDeviceProperties() destructor:

~FXAudioDeviceProperties()
{
	// Destroy all the xaudio2 voices allocated in our pools
	for (int32 i = 0; i < VoicePool.Num(); ++i)
	{
		for (int32 j = 0; j < VoicePool[i]->FreeVoices.Num(); ++j)
		{
			IXAudio2SourceVoice* Voice = VoicePool[i]->FreeVoices[j];
			Voice->DestroyVoice();
		}
	}

}

I did not realize this was posted by the same people having the issues we've heard about with Squad and VOIP streaming. I suspect the performance issue is related to VOIP/procedural buffers. In the videos I've seen, the audio issues only occur when somebody is using VOIP; otherwise the audio sounds like it's working fine. However, people are using VOIP for about 90% of most of those videos, so it's hard to tell.

Have you guys written any special code for VOIP and procedural voices? There were a few people on UDN/AnswerHub doing experimental things with VOIP streams, like supporting 3D spatialization for VOIP (which I believe we don't officially support), and if I recall correctly there was some threading/critical section code that looked like it might not perform well. This would also be consistent with your original post about DestroyVoice: DestroyVoice blocks the main thread until the voice processing thread finishes, so if there's blocking code in the procedural voice callback functions, DestroyVoice would become very slow.

Can you confirm that it's indeed the VOIP code causing the issues by playing your game on AMD without VOIP and checking whether the audio still stutters?

As far as we can tell (and I will let a coder chime in here), we are seeing two issues:

  1. Generally low performance on AMD compared to similarly specced Nvidia/Intel systems (as you can imagine, results vary wildly depending on the specific configuration).

  2. Very specific, massive frame lag/stutter, only when VOIP is active and only on AMD systems.

We did a bunch of targeted profiling and may have narrowed #2 down to EQ being used in a specific filter. In testing, this filter alone seems to cause the lag, while the normal VOIP channels do not. (In fact, some people have simply deleted the uasset containing the EQ-enabled filter and the problem disappeared completely.)

Any leads on why EQ might cause massive frame spikes or stuttering?

We will report back any success with EQ adjustments.

No, as I said, our EQ filter is just the basic built-in XAudio2 EQ effect. There's pretty much nothing we can do to make it more performant than it is, since it's a black box.

Here are the built-in XAudio2 effects:

Here are the parameters:

And if you want to look at how we set it up, check out XAudio2Effects.cpp. Basically, it creates a pre-master submix voice with the EQEffectChain and routes audio to it if enabled.

The issue I found was in:

FXAudio2EffectsManager::SetEQEffectParameters

where the parameters for the 4th band are hard-coded, i.e. the 4th band can never have a center frequency (fc) other than 10 kHz. My "fix" is simply to expose all the parameters of the EQ effect. I haven't checked it in yet because we're doing testing/stabilization on our internal dev branch. I'll send you a CL once I check that fix in.
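Exposing all the parameters mostly means letting every band, including the 4th, take caller-supplied values and clamping them to the effect's valid ranges instead of overriding one with a constant. A minimal sketch; FEQBand, MakeEQBands and ClampTo are hypothetical names, and the limits are assumed to match the FXEQ_MIN_*/FXEQ_MAX_* constants in xapofx.h (check them against your SDK headers). As far as I know the real FXEQ_PARAMETERS struct stores these values as flat fields FrequencyCenter0..3, Gain0..3 and Bandwidth0..3, which a real fix would fill in before calling IXAudio2Voice::SetEffectParameters on the EQ submix voice.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical mirror of one band of the XAudio2 FXEQ effect.
struct FEQBand
{
    float FrequencyCenter; // Hz
    float Gain;            // linear gain, 1.0f = unity
    float Bandwidth;
};

// Assumed to match FXEQ_MIN_*/FXEQ_MAX_* in xapofx.h.
constexpr float MinFrequencyCenter = 20.0f, MaxFrequencyCenter = 20000.0f;
constexpr float MinGain = 0.126f, MaxGain = 7.94f;
constexpr float MinBandwidth = 0.1f, MaxBandwidth = 2.0f;

inline float ClampTo(float Value, float Lo, float Hi)
{
    return std::min(std::max(Value, Lo), Hi);
}

// Every band comes from the caller; values are only clamped into range,
// never replaced with a hard-coded constant.
std::array<FEQBand, 4> MakeEQBands(const std::array<FEQBand, 4>& Requested)
{
    std::array<FEQBand, 4> Bands{};
    for (std::size_t i = 0; i < Bands.size(); ++i)
    {
        Bands[i].FrequencyCenter =
            ClampTo(Requested[i].FrequencyCenter, MinFrequencyCenter, MaxFrequencyCenter);
        Bands[i].Gain = ClampTo(Requested[i].Gain, MinGain, MaxGain);
        Bands[i].Bandwidth = ClampTo(Requested[i].Bandwidth, MinBandwidth, MaxBandwidth);
    }
    return Bands;
}
```

With this shape the 4th band can be set to, say, 5 kHz, which the hard-coded version never allowed.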

As I said, I suspect AMD audio devices are doing something that other devices aren't, which requires more CPU time per audio block.

Their AMD TrueAudio feature page suggests as much:
http://www.amd.com/en-us/innovations/software-technologies/trueaudio

It's likely that the EQ effect is just the "feather" that breaks the CPU camel's back: its extra cost pushes your audio processing over the edge and causes buffer underruns.
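The underrun argument comes down to a fixed per-block deadline: a block of N frames at sample rate R leaves 1000·N/R ms for all voices, effects (such as the EQ) and procedural callbacks in that block. As far as I know XAudio2 processes audio in 10 ms passes, i.e. 480 frames at 48 kHz. A back-of-the-envelope sketch; BlockBudgetMs, LikelyUnderrun and the 0.8 headroom factor are hypothetical:

```cpp
#include <cassert>

// Milliseconds available to process one block of audio.
double BlockBudgetMs(unsigned FramesPerBlock, unsigned SampleRateHz)
{
    assert(SampleRateHz > 0);
    return 1000.0 * FramesPerBlock / SampleRateHz;
}

// True if the measured work for one block no longer fits the budget once
// some headroom is reserved for scheduling jitter, i.e. an underrun is likely.
bool LikelyUnderrun(double WorkMs, double BudgetMs, double Headroom = 0.8)
{
    return WorkMs > BudgetMs * Headroom;
}
```

So if a block already needs 7.5 ms of a 10 ms budget, an effect that adds another millisecond is enough to start causing underruns, even though it is cheap on its own.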

Have you guys written custom audio code to get 3D spatialization for VOIP? Is there any thread synchronization between the main thread and your VOIP procedural voice (on either the capture or the rendering side)? If so, I'd suggest removing any critical sections or other blocking synchronization mechanisms and using a lockless queue or some other lock-free mechanism to pass main-thread data (e.g. position information) to your custom code.
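A single-producer/single-consumer ring buffer is one common lockless mechanism for this: the game thread pushes the latest position each frame and the voice callback drains it without ever waiting on a lock. A minimal sketch, assuming exactly one producer thread and exactly one consumer thread; TSpscQueue and FVoipPosition are hypothetical names, and inside the engine UE4's own TQueue (which has an SPSC mode) may be a better fit.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Lamport-style SPSC ring buffer: only one thread calls Push, only one calls
// Pop. Neither side blocks; Push fails when full and Pop fails when empty.
template <typename T, std::size_t Capacity>
class TSpscQueue
{
    static_assert(Capacity >= 2, "one slot is kept empty to tell full from empty");

public:
    bool Push(const T& Item) // producer (e.g. game thread)
    {
        const std::size_t Tail = TailIndex.load(std::memory_order_relaxed);
        const std::size_t Next = (Tail + 1) % Capacity;
        if (Next == HeadIndex.load(std::memory_order_acquire))
            return false; // full: drop this update, a newer one follows next frame
        Slots[Tail] = Item;
        TailIndex.store(Next, std::memory_order_release); // publish the item
        return true;
    }

    bool Pop(T& Out) // consumer (e.g. procedural voice callback)
    {
        const std::size_t Head = HeadIndex.load(std::memory_order_relaxed);
        if (Head == TailIndex.load(std::memory_order_acquire))
            return false; // empty
        Out = Slots[Head];
        HeadIndex.store((Head + 1) % Capacity, std::memory_order_release); // free the slot
        return true;
    }

private:
    std::array<T, Capacity> Slots{};
    std::atomic<std::size_t> HeadIndex{0}; // next slot to read, written by consumer
    std::atomic<std::size_t> TailIndex{0}; // next slot to write, written by producer
};

// Hypothetical payload: the latest position of a VOIP talker.
struct FVoipPosition
{
    float X, Y, Z;
};
```

In the callback, the consumer would typically drain the queue and keep only the newest entry, e.g. `while (Queue.Pop(P)) { Latest = P; }`, so a slow frame on either side never blocks the other.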