Audio Encoding
How much disk space should you allot to your precious audio?
How small is still OK for web distribution?
These are generally considered matters of personal perception. However, since we may have material that we want to share, we really should weigh what other people have to say about the quality of different audio encoding approaches.
Not having done any serious research on the subject, I’ll just post yet another opinion. First of all, please note that my main concern in the field of audio coding is transparency. That basically means that if I find a particular encoding audibly inferior in a given application, I deem it unusable and recommend against it.
I presume you already have your own estimate of how bit-rate relates to sound quality. So if you happen to find my standards lower than yours, so be it! Use what you think is more transparent.
I basically have two main settings for encoding audio:
- High quality ("archival grade")
- Medium quality (for "cheap distribution" and portable audio)
Each of them may be further subdivided into
- mono/stereo
- speech/music
For mono it is usually OK to set the bit-rate to half of what you would normally use for stereo, except in "live rec" cases. The thing is that in many situations live stereo is encoded adequately in joint-stereo mode, which makes use of the redundancy between channels. So in reality, for mono to sound as good as stereo (minus the lost "space", of course), it needs slightly more than half the stereo bit-rate.
Generally I don’t think distinguishing between speech and music is a good idea encoding-wise. That’s because the nasty artifacts of the lower bit-rates are equally annoying regardless of what you are listening to, a live concert or the spoken word. It’s distracting and it’s bad. However, speech usually features long silent intervals which may be encoded with fewer bits, and noise-like sounds (i.e. non-sine-like ones, which require more bits) are as a rule quieter and rarer in speech programs. So I believe it is acceptable to save some expensive space on a portable device by setting the "VBR quality" option slightly lower, and the overall bit-rate range somewhat lower, than you would normally use for music.
Note. VBR quality refers to the strategy of a variable bit-rate codec: it defines how willing the codec is to lower (or raise) the bit-rate in a given frame according to the audio complexity the codec thinks that frame has. The lower the quality setting, the more willing the codec is to cut back the number of bits in each frame.
One final note about speech coding. If you’ve got a stereo speech recording, do not down-mix it to mono! It is tempting to save space with this obvious little trick (though it saves little compared to joint-stereo!), but it will render the recording much less pleasant to listen to on headphones. Mono recordings heard through headphones are really fatiguing (at least to some).
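To put a number on "slightly more than half", here is a minimal sketch of how I might pick a mono bit-rate from a stereo one. The 20% margin and the function name `mono_bitrate` are my own assumptions for illustration, not a measured rule; the list holds the standard MPEG-1 Layer III bit-rates.

```python
# Standard MPEG-1 Layer III bit-rates in kbps.
MP3_BITRATES = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def mono_bitrate(stereo_kbps, margin_pct=20):
    """Return the smallest standard bit-rate that is at least
    half of stereo_kbps plus margin_pct percent (margin is an assumption)."""
    # Integer arithmetic: rate >= stereo/2 * (100 + margin)/100
    # is the same as rate * 200 >= stereo * (100 + margin).
    target_x200 = stereo_kbps * (100 + margin_pct)
    for rate in MP3_BITRATES:
        if rate * 200 >= target_x200:
            return rate
    return MP3_BITRATES[-1]  # cap at the highest MP3 bit-rate


print(mono_bitrate(160))  # half is 80, so the next step up is chosen
```

Treat the result as a starting point and let your ears decide.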
Now, for high quality encodings I used to use MP3 VBR all the time (LAME encoder). I just set the VBR quality to the max ("0") and the bit-rate range to 160-320 kbps. This way all the music is encoded at 320 kbps except for the "simple" passages, where the bit-rate drops a little and saves space. So basically I could say "my recordings are in the maximum MP3 quality possible" and that wouldn’t be far from the truth. In fact, some really dense, loud music gets almost all of its frames encoded at the maximum 320 kbps this way, resulting in whopping average VBR bit-rates of 260 kbps and even higher (i.e. practically no savings compared to CBR 320 kbps).
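For the record, the high quality setting above could be expressed roughly like this. It is only a sketch: it builds a LAME command line without running it, the file names are made up, and the option letters (`-V` for VBR quality, `-b`/`-B` for the minimum/maximum bit-rate) are as I know them from the LAME command-line encoder, so check `lame --help` for your version.

```python
def lame_vbr_args(src, dst, quality=0, min_kbps=160, max_kbps=320):
    """Build (but do not run) an argument list for a LAME VBR encode."""
    if not 0 <= quality <= 9:
        raise ValueError("VBR quality must be between 0 (highest) and 9 (lowest)")
    if min_kbps > max_kbps:
        raise ValueError("minimum bit-rate must not exceed the maximum")
    return [
        "lame",
        "-V", str(quality),   # VBR quality, 0 is the highest
        "-b", str(min_kbps),  # lower end of the bit-rate range
        "-B", str(max_kbps),  # upper end of the bit-rate range
        src, dst,
    ]


# Hypothetical file names, for illustration only.
print(" ".join(lame_vbr_args("concert.wav", "concert.mp3")))
```

The list can be handed to `subprocess.run` if you actually want to encode.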
But as storage prices continue to drop, I’m already questioning the feasibility of MP3 use as an archival encoder. Just take a look at these points:
- MP3 is considered the worst codec for future recompression. You may need to recompress to lower bit-rates in the future, e.g. for your web-site.
- MP3 encoding is very slow, especially in high quality VBR modes. As it turns out, lossless codecs perform much faster than MP3 Lame.
- High quality MP3 doesn’t save you that much space compared to an average lossless codec. Cf. 250 kbps VBR MP3 to 750 kbps lossless (Monkey or FLAC) to 1411 kbps original PCM.
Bottom line: 1 DVD+R ($.60 as of early 2006) will hold 10 (!) average audio CDs in a lossless codec with no sound quality compromise. That’s $0.06 per CD back-up. Do you really need anything cheaper? That’s the question…
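The arithmetic behind that bottom line can be checked with a short sketch. The bit-rates and the disc price are the ones quoted above; the 4.7 GB nominal DVD+R capacity and the 74-minute CD length are my assumptions, and the function names are made up for illustration.

```python
DVD_BYTES = 4.7e9    # nominal single-layer DVD+R capacity (assumption)
CD_MINUTES = 74      # a full-length audio CD (assumption)
DVD_PRICE_USD = 0.60 # early 2006 price quoted in the text


def size_mb(kbps, minutes=CD_MINUTES):
    """Size in megabytes (10^6 bytes) of audio at the given average bit-rate."""
    return kbps * 1000 / 8 * minutes * 60 / 1e6


def cds_per_dvd(kbps, minutes=CD_MINUTES):
    """How many whole CDs of that length fit on one DVD+R."""
    return int(DVD_BYTES // (size_mb(kbps, minutes) * 1e6))


for name, kbps in [("MP3 VBR", 250), ("lossless", 750), ("PCM", 1411)]:
    print(f"{name:8s} {kbps:5d} kbps  {size_mb(kbps):7.1f} MB per CD, "
          f"{cds_per_dvd(kbps)} CDs per DVD")

# With the round figure of 10 CDs per disc used in the text:
print(f"${DVD_PRICE_USD / 10:.2f} per CD back-up")
```

Even with full 74-minute discs the lossless figure comes out a little above ten CDs per DVD, so the round number of 10 is on the safe side.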
Well, medium quality encoding is where MP3 really shines.
- You will undoubtedly always appreciate the threefold (or even higher) increase in your portable player capacity minutes-wise;
- The same goes for internet charges for uploading recordings to your web-site, as well as for host bandwidth.
- Now let me remind you of an apparent drawback of MP3: its ultimate unsuitability for subsequent recompression and editing. As much as this makes it inappropriate for an archive, it is a virtue for content sharing. This "feature" effectively locks your material into "listening only". Any modification will likely make it sound unprofessional and hence worthless. And the rights to the unprocessed sound are more easily claimed.
As for actual "medium quality" settings, I use the following:
| | stereo | mono |
| music | 48-160 kbps VBR, J-stereo, VBR quality 4 | 48-128 kbps VBR, Mono, VBR quality 4 |
| speech | 40-128 kbps VBR, J-stereo, VBR quality 5 | 32-96 kbps VBR, Mono, VBR quality 4 |
Notes:
- These bit-rates are only a guideline. There is no harm in tweaking them a little according to the situation (e.g. more background noise means allowing more bits, etc.)
- VBR quality range is 0 (highest) to 9 (lowest)
- Sampling rate is best left unchanged, except when you want the widest possible hardware compatibility for your recordings, which implies a rate of 44.1 kHz at 16 bits per sample.
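If you script your encodes, the table above fits naturally into a small lookup. This is a sketch under assumptions: the names `MEDIUM_PRESETS` and `medium_quality_args` are made up, and the LAME option letters (`-m j` for joint stereo, `-m m` for mono, `-V`, `-b`, `-B`) are as I know them from the LAME command line, so verify them against `lame --help`.

```python
# (content, channels) -> (min kbps, max kbps, LAME channel mode, VBR quality)
MEDIUM_PRESETS = {
    ("music", "stereo"): (48, 160, "j", 4),
    ("music", "mono"): (48, 128, "m", 4),
    ("speech", "stereo"): (40, 128, "j", 5),
    ("speech", "mono"): (32, 96, "m", 4),
}


def medium_quality_args(content, channels):
    """Return LAME options for one cell of the medium quality table."""
    try:
        min_kbps, max_kbps, mode, quality = MEDIUM_PRESETS[(content, channels)]
    except KeyError:
        raise ValueError(f"no preset for {content!r} / {channels!r}") from None
    return ["-V", str(quality), "-b", str(min_kbps), "-B", str(max_kbps), "-m", mode]


print(" ".join(medium_quality_args("speech", "stereo")))
```

Since the bit-rates are only a guideline, edit the tuples to taste rather than treating them as fixed.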
That’s about it.



