Let's talk physically-based audio

With audio being so important to a good VR experience, I figured there should be a discussion on this very topic. I’ve done some research on the subject; however, I am by no means an audio engineer or a physicist. Here is some info I’ve gathered on the phenomena that would need to be processed within the engine:

  • Reverb/Echo: Reflection of the sound wave off surfaces
  • Diffraction: How the sound bends around obstacles and spreads through openings in the geometry
  • Transmission: “Dimming” (attenuation) of the sound as it passes through solid geometry (often loosely called refraction)

Now, these three phenomena are highly dependent on the material, so PBR materials will definitely play a part in this entire thing. What I propose is to include the necessary variables/properties in the material, so that the audio processing engine can then use them to calculate the above acoustic phenomena. A good website that lists a whole lot of these known properties is located here. The list was compiled by the Onda Corporation, so props to them. They have listings for solids, liquids, gases, rubbers, and plastics, to name a few.
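To make this concrete, here’s a minimal sketch (Python, just to illustrate the data, not engine code) of what per-material acoustic properties might look like. The class name, the per-band absorption fractions, and the transmission-loss numbers are all made up for illustration; real values would come from tables like Onda’s.

```python
from dataclasses import dataclass

@dataclass
class AcousticMaterial:
    name: str
    absorption: dict          # octave band (Hz) -> fraction of energy absorbed per bounce
    transmission_loss_db: float  # attenuation through the material, in dB

# Illustrative values only, not measured data.
CONCRETE = AcousticMaterial("concrete", {125: 0.01, 1000: 0.02, 4000: 0.03}, 45.0)
CARPET   = AcousticMaterial("carpet",   {125: 0.08, 1000: 0.30, 4000: 0.60}, 20.0)

def reflected_energy(energy: float, material: AcousticMaterial, band: int) -> float:
    """Energy remaining after one bounce off the material, in a given band."""
    return energy * (1.0 - material.absorption[band])
```

Carpet soaks up far more high-frequency energy per bounce than concrete, which is exactly the difference between a live hallway and a dead, carpeted room.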

This entire thing looks pretty familiar, or it should. This is what GI does. So along with baking lightmaps, we should now bake, umm, audiomaps (?).

There will be some cases for different audio projectors as well. As with lights, we would need both an omni sound source and a directional sound source. The directional source still transmits sound in an omni fashion; however, it would have an… FOV? setting, which by default should be ~180 degrees, with a falloff over another 180 degrees. Use cases for each type of sound transmitter:

Omni-directional sound: An explosion
Directional sound: Speech
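A rough sketch of how that FOV-plus-falloff setting could behave: full gain inside the cone, fading linearly to zero across the falloff region. The function names and the linear falloff curve are assumptions on my part, not any engine’s actual API.

```python
import math

def _angle_deg(a, b):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (na * nb)))))

def directional_gain(forward, to_listener, fov_deg=180.0, falloff_deg=180.0):
    """Gain of a directional source toward the listener: full gain inside
    the FOV cone, linear falloff to zero over the falloff region."""
    angle = _angle_deg(forward, to_listener)
    inner = fov_deg / 2.0             # half-angle of the full-gain cone
    outer = inner + falloff_deg / 2.0
    if angle <= inner:
        return 1.0
    if angle >= outer:
        return 0.0
    return 1.0 - (angle - inner) / (outer - inner)
```

With the defaults, speech is at full volume anywhere in front of the speaker and fades out smoothly as you walk around behind them.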

So, I guess I’ll end it here and open the floor to you guys. What are your thoughts / ideas? How can we implement this? What is the overhead for something of this caliber? I know that audio is usually processed on the CPU, and from what I’ve seen of today’s games, the CPU is hardly used at all, so we should be good on that front.

A lot of these things generally get baked into .wavs and then linked to events during a process called Frame 1 animations, so it’s less important that there be very heavy audio-physics processing that recognizes cement as cement and bounces sound off of it correctly. It’s more that when you enter an alley and an NPC asks “what brings you here?”, her voice doesn’t echo much, because cement doesn’t carry sound like brass does and the small space you’re in has terrible acoustics. But your composer knew this, so when he processed the effects on that VA’s track he engineered it to sound right, and during that audio call it meshes correctly with the surroundings. Now every time that NPC says that line in that alley it will sound right, because her audio is essentially baked to the animation of her shifting her body weight to her hip and moving her lips.

Likewise, most engines don’t have in-engine audio effects processing, so you HAVE to bake the audio to something. You CAN bake it to an event like “play this sound when someone gets hurt”, but if the sound is a woman screaming and a man gets hurt, you sort of shoot yourself in the foot (no pun… ok, pun intended). So instead I’ve in the past used a program called FMOD to do what I mentioned above and baked a .wav into animations, so that when a woman was hurt she screamed. When her animation played, so did her sound. It never played at the wrong time because it couldn’t. The downside being, if and when your animators changed the animations (and they did), you would have to rebake the sounds and reimport them.

So it would be the same thing with your situation, or at least I think it would be if I’m understanding correctly. Because essentially you can say I have this sound effect and it’s the wind blowing through a tree and so I’m going to bake it into an animation of a tree sort of bending in the wind. And then any time there’s any tree that uses this animation it will use this sound. Then you can get creative and say “but if there are 400 trees only play the sound for 5”. This keeps everything really cheap for audio demand as well. I view games as a budget for the system to spend and spending your budget on audio when it could be spent on something like framerate or rendering seems…wasteful? But that could be extreme bias.

I can’t tell if this is helpful, but this is just the experience I have had with working with engines and audio processing. Perhaps it is helpful to you here. If not, I apologize for how painfully long this post was.

This is a good workaround; however, it is impossible to pre-bake every circumstance of the way a sound would, well, sound. Take your example: a woman who is being stabbed and is screaming.

Let’s say the room & hallway is set up like this:

Plyr —>…|

The plyr is obviously the player, the V is the enemy, and the x is the woman screaming; the dots represent carpet (actually, the forums hate ASCII art), the | is a vertical wall, the = is a thick horizontal wall, and the ~ is a thin horizontal wall. At this point, the sound is 1) being muffled by the walls it passes through, and 2) reverberating: reflections from the room bounce down the hallway to where the player is. As the player walks down the hallway, hangs a right, and then enters the room, the sound will change. If the player is, say, stealthed and walks behind the enemy, the sound of the woman screaming changes still further. Not to mention if there were, say, a bookcase, or a couch, or anything else in the room that the player/enemy/woman could be behind or around.
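The wall-muffling part of that scenario is the easiest piece to approximate: sum the transmission loss (in dB) of each wall between source and listener, and convert the total to an amplitude gain. The wall types and dB figures below are invented for illustration; real numbers depend on material and thickness.

```python
# Hypothetical transmission losses per wall type (dB); illustrative only.
WALL_LOSS_DB = {"thin": 15.0, "thick": 40.0}

def occluded_gain(walls_between):
    """Amplitude gain after passing through a list of walls.
    dB losses simply add along the path; gain = 10^(-dB/20)."""
    total_db = sum(WALL_LOSS_DB[w] for w in walls_between)
    return 10.0 ** (-total_db / 20.0)
```

A scream heard through one thin wall comes out at roughly 18% of its direct amplitude; add the thick wall and it almost disappears, which matches the intuition of the diagram above.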

Could an audio engineer account for all of these sounds, with proper planning? Sure. However, you’re looking at gigabytes of nothing but audio files to cover every possible sound wave that could happen or be heard, not to mention the sound cue events. Say a door opens elsewhere in the area; now you also have to account for that door audio. You would literally be placing an audio file, for every sound, in every location in the level (or close to it).

Of course, you would only need to create sound files for differences that are audible to the average person (a person might not hear the change in a sound passing through a 2.5" wall versus a 3" wall, for example), but that is still a lot of sound files that would need to be processed. Having a physically-based audio system shouldn’t be that hard to implement, and it wouldn’t necessarily tax the user’s system, as most of the processing in games is done on the graphics card. At most, I think I have a game that uses 5% of my CPU power, and I’m not running a super-high-end CPU either (i7-950).

One thing to consider as well is that this entire system would be used to accurately represent the world (I’m a VR guy)… since while in VR, you can only bring 2 of your senses into the game world… sight and hearing (sure, you have some people doing touch and smell, but this won’t be consumer level / non-hackery-ish until we get neural interfaces)… both of these need to be as accurate as possible for the game type. Sound, much more so, especially for, say, tracking. With very good positional sound, I can track an enemy/opponent with just audio (close your eyes and listen)… and as a non-related topic, this is also a useful skill for survival; take a knee, relax, close your eyes, and listen to your surroundings :stuck_out_tongue:

Look at E.A.R. (Evaluation of Acoustics using Ray-tracing): GitHub - aothms/ear

It is just as demanding as unbiased ray tracing and GI in the graphics world: real time is not possible on current hardware. To make it into games you would have to do heavy optimization (read: extreme simplification of real-world behaviour), like simple first-order reflections only for each sound source, combined with a simple generic reverb algorithm based on either delays or short convolution.
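First-order reflections are usually done with the image-source method: mirror the source across the wall plane, and the mirrored source’s distance to the listener gives the reflected path length and delay. A minimal sketch for a single axis-aligned wall (all geometry and names illustrative):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 °C

def first_order_reflection(source, listener, wall_x):
    """Image-source method for one reflection off the plane x = wall_x:
    mirror the source across the wall; the image's distance to the
    listener is the reflected path length."""
    image = (2.0 * wall_x - source[0], source[1], source[2])
    distance = math.dist(image, listener)
    return distance / SPEED_OF_SOUND, distance
```

For example, `first_order_reflection((0, 0, 0), (1, 0, 0), 2.0)` gives a 3 m reflected path, i.e. roughly 8.7 ms of delay; feed that delayed, attenuated copy into the mix and you have your cheap first-order bounce.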

Don’t get me wrong… I don’t think we’ll come to the point of “real” GI, lighting, or audio any time soon. However, the goal should be to be as realistic as possible within the limits of current technology.

I absolutely agree.

We can get very far by faking most of these behaviours, but unfortunately it requires a lot of tweaking/tuning to get great results. In the pro audio segment, a ‘simple’ reverb is a 3-7 year design project, and it will still fake most of the above-mentioned phenomena to achieve real-time performance. Even offline renderers (www.odeon.dk) are far from able to emulate real environments.
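For reference, the classic cheap building block of those delay-based reverbs is the feedback comb filter (as used in Schroeder-style reverberators): a delayed copy of the output fed back into itself, turning an impulse into a decaying echo train. A toy offline version, purely illustrative:

```python
def comb_filter(samples, delay, feedback):
    """Feedback comb filter: out[i] = in[i] + feedback * out[i - delay].
    `delay` is in samples; `feedback` < 1 controls the decay rate."""
    out = list(samples)
    for i in range(delay, len(out)):
        out[i] += feedback * out[i - delay]
    return out
```

A real Schroeder reverb runs several of these in parallel (with mutually prime delays) followed by allpass filters; the point here is just how little arithmetic one echo tail costs.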

But if we could use 10% of an i7-2600 for the simulation, that would be a big step in the right direction.

Oh goodness, I think it would be great. You just mentioned you didn’t do a lot of audio engineering, so I figured I would offer insight into how it’s currently being handled. It isn’t super elegant, but that’s essentially how it’s done.

And yeah, so even still, in your situation (and mine) I could bake it. I could bake the whole thing, because never in that demonstration would I not know A) where the player was, B) what the player’s current animation state was, and C) what the room layout was. It may require hundreds of files, and I could probably reuse a few because humans aren’t super attuned to a lot of audio nuances, but yeah, I can bake those effects as long as they are happening in relation to my player, because I know everything about the player at all times.

That being said, having a system that can just dynamically adapt for me, especially with regard to pinpoint sound sources, would be pretty incredible for VR (I KNEW YOU WERE A VR GUY). If I say “hey, the sound is coming from 10 m to my right, but now move the player along a simple vector to the left at 2 km/h”, then the sound should slowly fade into the distance around me. I could bake it, but it would be pretty incredible for the experience to happen dynamically. I know it’s incredible when graphics render dynamically, so this is likely the same feeling.

This is called Sound Propagation. It has been possible for years by baking/pre-computing.

I started a thread about this as well in another area of the forums.

There have been many different approaches to this, some better than others.
Parametric wave field coding for precomputed sound propagation (supplemental video) - YouTube - This one was done in UDK.

This one must be old, because it’s using the Source engine: Wave-Ray Coupling for Interactive Sound Propagation in Large Complex Scenes - YouTube

I hate how dismissive some people get about improving audio, and how they downplay its importance.
I think we need to all band together and form an open-source community project to get this going.

Baking audio effects is great only for a linear game. The minute that audio can be played back in different locations, the effects will no longer work.

Reverb zones only handle reverb, and they are not that great at transitioning either.

Diffraction and refraction, as mentioned above, would be either impossible or a waste of time to try to fake with trigger zones outside of a very linear or small situation.
It is a much better use of time, with a better result, to implement middleware/a plugin that will pre-compute these things.

If it could be demonstrated 10 years ago on Source engine, I’m pretty sure it can be done now.
Remember EAX? It’s dead because modern CPUs became fast enough to process those effects.

EDIT: Using something like OpenAL will also make the rendering cheaper.

As another audio and VR guy, I also think that true 3D sound - as well as physics - will be even more important when a user is truly encompassed in a virtual environment. It will no longer be good enough to simply hear an animal ‘somewhere to my right’ as stereo or surround sound would render it. The sound needs to be realistically matched to the proper azimuth and elevation (say, 30° to the right and 50° down).
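For what it’s worth, those two angles fall straight out of the listener’s basis vectors, and they are exactly what an HRTF lookup needs. A sketch (the names and sign conventions are my own: azimuth positive to the listener’s right, elevation positive upward):

```python
import math

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return tuple(x / n for x in v)

def azimuth_elevation(listener_pos, forward, up, source_pos):
    """Source direction relative to the listener, as (azimuth, elevation)
    in degrees. Assumes `forward` and `up` are unit-length and orthogonal."""
    right = _cross(up, forward)
    d = _normalize(tuple(s - l for s, l in zip(source_pos, listener_pos)))
    az = math.degrees(math.atan2(_dot(d, right), _dot(d, forward)))
    el = math.degrees(math.asin(max(-1.0, min(1.0, _dot(d, up)))))
    return az, el
```

From there, the HRTF database (measured per azimuth/elevation pair) supplies the filter that makes the source actually sound like it is 30° right and 50° down, rather than just panned right.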

Hopefully we can get to a point soon where we can nail the direct sound with the proper HRTF profile; then do a few of the first early reflections; take a stab at the late reflections with more and more general approximations; and finish with a room-appropriate reverb tail.

At least one can dream…:slight_smile: