the skinny on bit depth and sampling rates

Warning!  You're entering seriously geeky territory here.  The following may leave you confused, as I'm skimming over something like two to four semesters' worth of university coursework in a relatively concise article.  If you're looking for a comprehensive article on the subject, this isn't it (however, this just may be, and be sure not to miss the digital show and tell video!).  That said, if recorded music is your game, these are the ground rules.

If you're looking for the short answer to which sample rate and bit depth I recommend, here it is: pick a sample rate between 44.1 and 96 kHz (personally I like 44.1 or 88.2, unless your final destination is a Blu-Ray, in which case go with 96), and stick with 24-bit, utilizing proper gain-staging and headroom all the way through.

If you want to know why I make those recommendations, read on.

Let's start with a brief discussion about sampling rates.  We know from the work of Harry Nyquist and Claude Shannon that we can accurately represent a band-limited signal by sampling at (at least) twice the highest frequency we wish to capture.  What does this mean in real-world terms?  Well, if we start with the premise that humans can hear up to about 20 kHz (which quite frankly is very generous to all but the youngest and healthiest ears), and we want to leave a little buffer room for a low-pass filter to remove everything above that frequency (that's the band-limiting part), we end up somewhere around 22.05 kHz.  Double that, and we get 44.1 kHz, the all-familiar CD sampling rate.
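
To make that concrete, here's a minimal sketch (plain NumPy, with tone frequencies chosen purely for illustration) of what the Nyquist limit buys you, and what happens when you skip the band-limiting step: a 19 kHz tone sampled at 44.1 kHz shows up exactly where it should, while a 25.1 kHz tone folds back to an indistinguishable alias at 19 kHz.

```python
import numpy as np

fs = 44100                       # CD sample rate in Hz
n = fs                           # one second of samples
t = np.arange(n) / fs

def dominant_frequency(signal):
    """Return the frequency (in Hz) of the strongest spectral peak."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs[np.argmax(spectrum)]

in_band = np.sin(2 * np.pi * 19000.0 * t)     # below Nyquist (22.05 kHz): fine
ultrasonic = np.sin(2 * np.pi * 25100.0 * t)  # above Nyquist: trouble

print(dominant_frequency(in_band))     # ~19000 Hz, right where it belongs
print(dominant_frequency(ultrasonic))  # also ~19000 Hz: the alias at 44.1 - 25.1 kHz
```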

I'll say that again: the 44.1 kHz CD sampling rate can accurately represent even the highest frequency humans can ever hope to hear with no loss of fidelity.  This is demonstrably true.  Don't believe me?  Check out the linked video above.

So at this point you may be wondering why anyone is concerned with higher sample rates, and rightly so.  From a playback standpoint, there is no evidence that listeners, trained or otherwise, can differentiate between "hi-res" files and files at the CD standard under even the most generous conditions.  However, when it comes to processing recorded sound, it turns out there are at least a few good reasons to use higher sampling rates.  Without getting too technical, any non-linear processor, including compressors, limiters, saturation effects, etc., is liable to create ultrasonic harmonic components which can exceed the Nyquist frequency (half the sampling frequency).  If not properly accounted and coded for, these ultrasonic components can cause aliasing distortion, effectively being mirrored down into the audible band.  While these days most plugins that are susceptible to this behavior oversample internally with appropriate anti-aliasing and anti-imaging filters, working at a sample rate between 48 and 96 kHz can provide an extra safety net to help avoid these issues (see the sketch below).  Another instance in which high sample rates are useful is when making high-frequency adjustments with a digital EQ.  Imagine you make a boost at 16 kHz with a broad-Q bell filter.  Part of the frequency and phase response of that filter would naturally extend beyond the Nyquist frequency of a 44.1 kHz sample rate; however, from a DSP design standpoint it is necessary to force the phase response back to zero at Nyquist.  You can read more about this in Vlad Goncharov's excellent article on the subject.  While there are ways to code around this, the simplest way to obtain the most natural phase response is to increase the sampling rate.
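
Here's a rough sketch of that first point (my own toy example, not any plugin's actual code): a 15 kHz tone pushed through tanh() saturation generates harmonics at 45 kHz, 75 kHz, and so on, which a 44.1 kHz sample rate can't represent and which instead fold back into the audible band.  Running the same saturation 4x oversampled, with band-limiting on the way back down, knocks those aliases way down in level.

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
t = np.arange(fs) / fs
tone = 0.9 * np.sin(2 * np.pi * 15000.0 * t)   # a loud 15 kHz tone

def level_at(x, freq_hz, rate):
    """Peak level near freq_hz, in dB relative to the signal's strongest bin."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    db = 20 * np.log10(mag / mag.max() + 1e-12)
    freqs = np.fft.rfftfreq(len(x), d=1 / rate)
    nearby = (freqs > freq_hz - 50) & (freqs < freq_hz + 50)
    return db[nearby].max()

# Naive: saturate directly at 44.1 kHz; the 45 kHz harmonic aliases down to 0.9 kHz
naive = np.tanh(3.0 * tone)

# Oversampled: upsample 4x, saturate, then filter and decimate back down
# (resample_poly's polyphase filters handle the anti-imaging/anti-aliasing)
oversampled = resample_poly(np.tanh(3.0 * resample_poly(tone, 4, 1)), 1, 4)

print(level_at(naive, 900, fs))        # the aliased harmonic, clearly present
print(level_at(oversampled, 900, fs))  # the same component, pushed far lower
```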

Okay, so if 48, 88.2, and 96 kHz sampling rates make sense for processing purposes, what's wrong with 176.4 and 192 kHz sampling rates?  Wouldn't they extend the benefits mentioned above even further?  Not quite, unfortunately.  While it is true that there is ultrasonic content present in 88.2 and 96 (and to a lesser extent 48) kHz files, it pales in comparison to the amount of ultrasonic content potentially present in a 176.4 or 192 kHz file.  In an ideal playback system, ultrasonic content is innocuous enough; after all, we can't hear it.  However, most real-world playback systems exhibit significant nonlinearities outside the audible range they were designed to reproduce.  The end result is intermodulation distortion, which manifests as harmonically unrelated artifacts distributed randomly throughout the audible band.  Listening tests have revealed that these artifacts are indeed noticeable and thus may actually degrade audio fidelity.
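
To illustrate the mechanism (again, a toy model of my own, not a measurement of any particular piece of gear): two tones at 30 kHz and 33 kHz, both inaudible on their own, pass through a playback stage with a mild second-order nonlinearity, and out comes their 3 kHz difference tone, squarely in the audible band.

```python
import numpy as np

fs = 96000                                   # a rate high enough to carry ultrasonics
t = np.arange(fs) / fs
ultrasonics = 0.4 * np.sin(2 * np.pi * 30000 * t) + 0.4 * np.sin(2 * np.pi * 33000 * t)

# A weak 2nd-order nonlinearity standing in for real-world amp/driver behavior
played_back = ultrasonics + 0.1 * ultrasonics ** 2

spectrum = np.abs(np.fft.rfft(played_back * np.hanning(fs)))
freqs = np.fft.rfftfreq(fs, d=1 / fs)
audible = (freqs > 20) & (freqs < 20000)     # ignore DC offset and the ultrasonics
print(freqs[audible][np.argmax(spectrum[audible])])   # ~3000 Hz: a distortion product we can hear
```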

Fine.  A little extra digital bandwidth can be good, but too much is usually bad.  What about bit depth?  Luckily this is a bit simpler.

You may have some sense that the number of bits we use to quantize a sampled signal determines the number of "levels" available at which to store that value.  Intuitively, it follows that more bits equals more "levels", which must necessarily equate to better "resolution".  While this is true, let's put things in perspective.  In a 16-bit digital "word" there are 65,536 potential "levels" (or values) at which a sample can be stored.  If that were a stack of standard printer paper, it would stand 20.5 feet tall.  Any rounding errors will be less than 1/65,536th of full scale off from where they should be, and it is this figure that corresponds to the -96 dBFS noise floor of 16-bit audio (dB = 20·log10(V1/V2), in case you were wondering).  What this should suggest to you, at least if you're unfortunate enough to think as I do, is that bit depth has more to do with the noise floor than it does with any sort of "resolution" (typically interpreted as "graininess").  In fact, you will find that the only thing gained by storing a file at 24-bit, as compared to 16-bit, is a 48 dB lower noise floor.
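
If you'd like to check that arithmetic yourself, here's the back-of-the-envelope version in a few lines of Python:

```python
import math

for bits in (16, 24):
    levels = 2 ** bits
    dynamic_range_db = 20 * math.log10(levels)
    print(f"{bits}-bit: {levels:,} levels, noise floor around -{dynamic_range_db:.1f} dBFS")

# 16-bit: 65,536 levels, noise floor around -96.3 dBFS
# 24-bit: 16,777,216 levels, noise floor around -144.5 dBFS
# ...and the difference between the two is the ~48 dB mentioned above.
```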

Well surely, that must be a good thing, right?  Yes.  Just, yes.  No ifs, ands, or buts (except for the fact that thermal noise in modern electronics is often, at best, about 120 dB below the analog clipping point, equivalent to about 20 bits, but so what).  However, let's frame that -144 dBFS noise floor another way.  Rather than just imagine it as 16-bit with a 48 dB lower noise floor, let's view it as 16-bit with the noise floor lowered by 24 dB (down to the thermal noise of the electronics), which in turn yields an extra 24 dB of headroom beyond what the clipping point of 16-bit would have been.  This means we can set our recording levels to average and peak around -24 and -6 dBFS respectively, and still be doing 18 dB better than full-scale 16-bit, probably with more dynamic range to boot!  If this doesn't convince you to lower the level of your recordings to avoid any potential for clipping, I don't know what will.  This is also the reason why sending a 24-bit final mix with average levels around -20 dBFS and peaks in the -8 to -4 dBFS range isn't remotely problematic; the noise floor of the electronics is still 100 dB below your average level.  It is important to note that in the strictest sense there is not actually any additional headroom (Bob Katz likes to refer to this as footroom, a term I'm inclined to agree is more appropriate).  Exceeding 0 dBFS will still cause harsh digital clipping, so don't do it!
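
Spelled out as arithmetic (assuming, as above, that the practical noise floor sits around -120 dBFS rather than the theoretical -144 dBFS of 24-bit):

```python
practical_noise_floor = -120.0   # dBFS, set by the analog electronics (~20 bits)
peak_level = -6.0                # dBFS, a conservative 24-bit recording peak

usable_range_24bit = peak_level - practical_noise_floor    # 114 dB
full_scale_16bit_range = 0.0 - (-96.0)                     # 96 dB
print(usable_range_24bit - full_scale_16bit_range)         # 18 dB to the good
```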

Okay, so if 24-bit files provide the lowest practical noise floor while simultaneously leaving plenty of headroom, what's all this I hear about 32-bit files?  And what the heck is floating point?  Well, remember how I said this all was a bit simpler than the sample rate mess?  I may have lied a little.  While 24-bit files are ideal for recording, storage, and playback, things can become problematic when you start editing, mixing, and processing them.  While I'm deliberately smoothing over some of the finer details, the gist of the fundamental issue is this: add too much gain to the file and you will run into digital clipping; decrease the gain by too much and you begin losing low-level detail.  Enter 32-bit floating point processing and files.  The idea here, roughly speaking, is that you keep your original 24 bits of precision and add another 8 bits that serve as an exponent, or multiplier, effectively allowing you to move your original 24 bits up and down in level without clipping or loss of low-level detail.  This is why 32-bit floating point can be a helpful format to render to if you find that your master buss is clipping after you remove the compressor and limiter you've had on it the whole time you've been mixing.
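
Here's a quick sketch of that difference (a toy illustration, not how any particular DAW renders files): push a signal 24 dB over full scale, then pull it back down.  The floating-point copy comes back essentially untouched; the 16-bit fixed-point copy has had its peaks permanently clipped off.

```python
import numpy as np

fs = 48000
signal = (0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)).astype(np.float32)
gain = np.float32(10 ** (24 / 20))            # +24 dB as a linear multiplier

# Floating point: effectively unlimited headroom, so boost-then-cut is harmless
float_path = (signal * gain) / gain

# Fixed-point stand-in: quantize to 16-bit integers, which hard-clip at full scale
boosted = np.clip(signal * gain, -1.0, 1.0 - 1 / 32768)
fixed_path = np.round(boosted * 32768) / 32768 / gain

print(np.max(np.abs(float_path - signal)))   # ~0: nothing lost beyond float rounding
print(np.max(np.abs(fixed_path - signal)))   # ~0.44: the clipped peaks never come back
```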

Alright, remember how I said I may have lied a little?  I actually lied quite a bit.  In reality, this is one of the most complex parts of digital mixing, as there are benefits and limitations to both fixed- and floating-point summing engines.  At the end of the day, though, most DAWs do not allow you to choose how they sum, but rather use the implementation they believe is best, whether that be double-precision fixed-point, single-precision floating-point, or, increasingly, double-precision floating-point.  As recording and mixing engineers, our best bet is to utilize proper gain-staging to keep our levels in the sweet spot of the 24-bit format, and trust that when forced to move away from that sweet spot, the summing engine in our DAW of choice will handle things as gracefully as possible.  Realistically, we're talking about extremely subtle details here, and no modern DAW could get away with a summing engine that noticeably degraded the sound in more than a handful of all possible mix scenarios.
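
For a sense of just how subtle, here's a contrived peek (my own toy comparison, not any DAW's actual engine): sum a large number of quiet "tracks" in single precision and in double precision and compare the results.

```python
import numpy as np

rng = np.random.default_rng(42)
tracks = rng.uniform(-0.1, 0.1, size=(128, 48000))       # 128 quiet "tracks"

mix32 = tracks.astype(np.float32).sum(axis=0, dtype=np.float32)  # single-precision sum
mix64 = tracks.sum(axis=0, dtype=np.float64)                     # double-precision reference

worst_error = np.max(np.abs(mix32 - mix64))
print(20 * np.log10(worst_error))   # typically well below -100 dBFS
```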

So don't worry too much.  Set your converter's clock to one of the options between 44.1 and 96 kHz.  Record at 24-bit and leave some decent headroom.  Most importantly, go out there and focus on making some great music.  After all, a mediocre recording of a great song will always be more enjoyable than a great recording of a mediocre song.