Audio files download size hardly make sense

Just to expand on this (and please let me know if this all checks out):

The key difference between audio file types is whether they are lossless or lossy (compressed).
WAV is a lossless format, and what that means in practice here is that the bit rate is constant, so the only control we have over the file size comes through the sample rate.

For lossless WAV files:

Bit Rate = (Bit Depth) x (Sample Rate) x (Channels)

(Channels refers to mono or stereo, i.e. 1 or 2.)
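To make the math concrete, here's a quick sketch of that formula in Python (the one-minute, CD-quality numbers are just an example I picked, not from the thread):

```python
# Uncompressed (PCM) WAV: bit rate is fixed by the format settings,
# so file size is simply bit rate x duration.
def wav_bit_rate(bit_depth, sample_rate, channels):
    """Bits per second for an uncompressed PCM stream."""
    return bit_depth * sample_rate * channels

def wav_file_size_bytes(bit_depth, sample_rate, channels, seconds):
    """Approximate raw audio payload in bytes (ignores the ~44-byte header)."""
    return wav_bit_rate(bit_depth, sample_rate, channels) * seconds // 8

# Example: one minute of 16-bit, 44.1k stereo audio
print(wav_bit_rate(16, 44_100, 2))              # 1,411,200 bits/s
print(wav_file_size_bytes(16, 44_100, 2, 60))   # 10,584,000 bytes, ~10.6 MB
```

Halve the sample rate and the result halves too, which is exactly why sample rate is the lever being discussed here.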

Since we know the engine is going to convert the file on import anyway, the only variable in this equation we have control over is the Sample Rate.

Standard sample rates in professional audio are 48.0k and 44.1k. The following example takes a 44.1k .wav sample and converts it to a 22k .wav and a 22k .ogg file.



The following image shows the memory calculation in the map with all of the resulting files uploaded. As you can see, the memory gain is minimal, and that comes down to the fact that lossless files are not variable bit rate like lossy formats such as .ogg or .mp3: you always have a minimum file size.
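That "minimum file size" point is easy to verify with Python's built-in `wave` module: one second of pure silence and one second of random noise come out byte-identical in size, because PCM writes every sample regardless of content (a lossy codec would compress the silence dramatically). A small sketch:

```python
import io
import random
import wave

def wav_size(samples, sample_rate=22_050):
    """Write 16-bit mono PCM samples to an in-memory WAV, return its size in bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit
        w.setframerate(sample_rate)
        w.writeframes(b"".join(s.to_bytes(2, "little", signed=True) for s in samples))
    return len(buf.getvalue())

n = 22_050  # one second of audio
silence = [0] * n
noise = [random.randint(-32768, 32767) for _ in range(n)]

print(wav_size(silence), wav_size(noise))  # identical: content doesn't matter
```

Both files land at header + 2 bytes per sample, no matter what the waveform looks like.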






Since Epic already has this audio import and compression process in place, that tells us the optimal audio management approach. What may not be obvious, but is very important, is that you stand only to lose quality by converting your audio to a lossy format at any point before then. In practice: always try to get .wav files to begin with, and try not to source audio from .mp3s/.oggs/etc.

Based on Lama’s findings, some additional testing, and my understanding of audio, I came to the conclusion that he’s ending up with larger files for the following reasons (this is my best theory):

  1. Digital Distortion: Digital audio tries to recreate analog signals, which take the form of waves. When we compress an audio signal digitally, we create artifacts and clipping, squaring off the digital signature of an originally rounder wave (this creates digital distortion; think of a bit-crusher effect). Because we’re doing this process back and forth, we end up compressing (losing audio fidelity and dynamic range) and then expanding in a destructive way.

  2. Artifacts = Data: Anywhere an audio signal is written (i.e. not silence), the file size increases. This is different from video compression, where more black pixels means fewer total colors written and less memory used, so degrading the file would almost always reduce its size.
    You’re essentially crunching down your waveforms (again, think of a bit-crusher effect), then expanding them, then crunching them down again.
    Because lossless has a minimum file size at a given sample rate, you can theoretically lose fidelity and gain file size at the same time.

  3. Encoding: Different lossy compression algorithms and settings (bit rates, sample rates, etc.) have varied efficiencies and ways of handling audio data. This variability can lead to situations where converting between formats and bit rates results in files that are larger than expected, without any improvement in quality and in fact, with a cumulative quality loss.

We can see this in Lama’s example, and if you convert your own file from .wav to .ogg and then back to .wav, you can experience it yourself.


What this leaves us with is the option of reducing the sample rate, which I would argue isn’t worth the cost when you compare the end result and measure value as the ratio of (Audio Quality : Memory Cost):


retest8k is an 8k 16-bit .wav


There’s no way to judge the difference other than playing the audio files, and if you play these side by side you can be the judge of whether the quality loss is acceptable and set your sample rate to taste.
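For reference, the raw PCM payload scales linearly with sample rate, so the best-case savings from downsampling are easy to tabulate (one minute of 16-bit mono assumed here; real files add a small header):

```python
def pcm_bytes(sample_rate, seconds=60, bit_depth=16, channels=1):
    """Raw PCM payload size in bytes for an uncompressed WAV."""
    return bit_depth * sample_rate * channels * seconds // 8

# Compare the sample rates discussed above
for rate in (44_100, 22_050, 8_000):
    mb = pcm_bytes(rate) / 1_000_000
    print(f"{rate:>6} Hz -> {mb:.2f} MB per minute")
```

Dropping from 44.1k all the way to 8k only saves a fixed, linear fraction of the file, while the audible quality falls off much faster, which is the (Audio Quality : Memory Cost) trade-off above.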

Personally, I would look for memory savings elsewhere and stick to 44.1k 16-bit .wav files that have never been converted to a lossy format.
