This work is licensed under a Creative Commons License.

Good References

Jun 1, 2001 12:00 PM, Brian Knave
Electronic Musician

Judging by the steady flow of letters and phone calls we get asking our advice about what gear to buy, a good number of readers are well acquainted with cognitive overload. That's the term psychologists use to describe the paralysis that can set in when we are confronted by too many options (or too much information). Freedom of choice is great, but clearly, too many options can bewilder. Case in point: the EM 2001 Personal Studio Buyer's Guide lists 40 companies presently offering reference monitors, with more than 200 models to choose from.

Bewildered? If so, you've come to the right place. This article will cover the various designs, components, and properties (including terminology) of reference monitors, as well as how they work — in short, all you need to know to make informed decisions when selecting close-field reference monitors for your personal studio. (Though many of the concepts discussed here apply equally well to monitors for surround arrays, those interested specifically in monitoring for 5.1 should also see “You're Surrounded” in the October 2000 EM.)

PRE ROLL
Speakers used in recording studios are called monitors and generally fall into two categories: main monitors and compact or close-field reference monitors. Mains, as they are called, are mostly found in the control rooms of large commercial studios, often flush-mounted in a “false” wall (called a soffit); close-field reference monitors are freestanding and usually sit atop the console bridge or on stands directly behind the console.

Most personal studios don't have the space or funds for main monitors, so this article will focus on the compact reference monitor — a relatively recent studio tool. The first “compact” monitor to see widespread use in recording studios was the JBL 4311, a 3-way design introduced in the late 1960s. The 4311 was quite large, however (it had a 12-inch woofer, a 5-inch midrange speaker, and a 1.4-inch tweeter), and today would qualify more as a mid-field monitor.

As engineers increasingly realized the importance of hearing how their mixes sounded on car and television speakers, smaller reference monitors gained in popularity. One of the earliest favorites (around the mid-1970s) was the Auratone “cube,” which had a single 5-inch speaker.

Car and home-stereo speakers kept improving, of course, so engineers were always on the lookout for better close-fields. One compact model that caught on big was the Yamaha NS-10M (see Fig. 1). A bookshelf-type speaker introduced in 1978 for home use, the NS-10M soon became a familiar sight in commercial studios, and it remains popular — or at least ubiquitous — to this day.

Another significant development was the introduction in 1977 of the MDM-4 near-field monitor, made by audio pioneer Ed Long's company, Calibration Standard Instruments. The MDM-4s were great monitors, but it was the then-revolutionary concept of near-field monitoring that secured a chapter in audio history for Long. (Long also originated the concept of time alignment for speakers and trademarked the term “Time Align”; more on this later.) Though no one could have predicted how prophetic the term near-field monitor would prove, Long clearly understood its value, so he had it trademarked. (That is why EM uses the term close-field monitor instead.)

ENVIRONMENTAL ISSUES
Curiously, because close-field reference monitors have become increasingly accurate over time, the original rationale for using them (to give a good indication of how mixes will translate to low-cost car and home-stereo speakers) has waned. But there are other good reasons close-field monitors have become all but indispensable in music production. For one, professional mix engineers are typically hired on a project-by-project basis, which means they may end up in a different studio from one day to the next. Close-field monitors are portable enough to be carted from studio to studio, so they make an ideal solution and guarantee at least some level of sonic consistency, regardless of the room.

But don't the monitors sound different in different rooms? To a degree, they do. But another advantage of close-field monitors is that they can partially mitigate the effect of the room on what you hear. As their name makes clear, they are meant to be used in the “near field,” typically about three feet from the engineer's ears. At that distance, assuming the monitors are well positioned and used correctly, the sound can pass to the ears largely unaffected by surface reflections (from the walls, ceiling, console, and so forth) and the various sonic ills they can wreak.
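
The payoff of sitting close can be put in rough numbers with the inverse-square law. The sketch below is a simplification under stated assumptions: a point source, a single reflection from a surface that reflects everything, and hypothetical path lengths (three feet is about 0.9 m). It compares how much louder the direct sound is than one reflection, and how much later that reflection arrives, for a close-field seat and a farther one.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 degrees C


def direct_vs_reflection(direct_m: float, reflected_m: float) -> tuple[float, float]:
    """Return (direct-sound level advantage in dB, reflection delay in ms).

    Inverse-square law only: assumes a point source and a surface that
    reflects everything. Real surfaces absorb some energy, so the actual
    advantage of the direct sound is usually somewhat larger.
    """
    advantage_db = 20 * math.log10(reflected_m / direct_m)
    delay_ms = (reflected_m - direct_m) / SPEED_OF_SOUND_M_S * 1000
    return advantage_db, delay_ms


# Hypothetical geometry: the same wall reflection heard from two seats.
close = direct_vs_reflection(direct_m=0.9, reflected_m=3.0)
far = direct_vs_reflection(direct_m=2.5, reflected_m=4.0)
print(f"close-field: direct sound {close[0]:.1f} dB louder, reflection {close[1]:.1f} ms later")
print(f"far-field:   direct sound {far[0]:.1f} dB louder, reflection {far[1]:.1f} ms later")
```

With these made-up distances the direct sound leads by roughly 10.5 dB at 0.9 m but only about 4 dB at 2.5 m, which is the sense in which the close-field position lets the direct sound dominate what you hear.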

For the same reason, close-field monitoring is also a good solution for the personal studio, where sonic anomalies are the norm. As engineer, consultant, and all-around acoustics wizard Bob Hodas has so well demonstrated, however, it's foolhardy to think close-field monitors entirely spare you from the effects of room acoustics. “Near-field monitors can be accurate,” explains Hodas, “only if care is taken in the placement of the speakers and room issues are not ignored.” (Find more information at www.bobhodas.com/pub1.html.)

DIFFERENT WORLDS
A common misconception among those new to music production is that home-stereo speakers are adequate for monitoring. That is, in fact, not the case. The problem is one of purpose: whereas manufacturers design reference monitors to reproduce signals accurately, home-stereo speakers are specifically designed to make recordings sound “better.” Typically, that perceived improvement is accomplished by boosting low and high frequencies. Although it may sound like an enhancement to the average listener, such “hype” is really a move away from accuracy.

Home-stereo speakers may also be engineered to de-emphasize midrange frequencies so as to mask problems in this critical range. That makes it difficult to hear what's going on in the midrange, which can tempt mixers to overcompensate with EQ. It can also lead to fatigue because the ear must strain to hear the mids.

Yet another reason home-stereo speakers are inappropriate for monitoring is that they are meant to be listened to in the far field, where much of the sound is reflected. But as we've seen, close-field monitors are designed to be used in the near field to help minimize the effects of room acoustics. Of course, it's important not to sit too close to close-field monitors. Rather, they should be positioned far enough back to allow the sound from the speakers to blend into an apparent point source and stereo soundstage. As you move in closer than three feet or so, you begin to hear each speaker as a separate source, which is not what you want.

ELUSIVE BULL'S-EYE
Everyone can agree that reference monitors are meant to reproduce signals accurately. But what is accuracy? For our purposes, there are three objective tests that can be performed to help quantify accuracy in reference monitors. The tests measure frequency response, transient or impulse response, and lastly, distortion.

Frequency response is a measure of the changes in output level that occur as a monitor is fed a full spectrum of constant-level input frequencies. The output levels can be plotted as a line on a graph — called a frequency response plot — in relation to a nominal level represented as a median line typically marked 0 dB (see Fig. 2). The monitor is said to have a “flat” or linear frequency response when that line corresponds closely to the median line — that is, does not fluctuate much above or below from one frequency to the next.

When they are written out, frequency-response specifications first designate a frequency range, which typically starts somewhere between 40 and 60 Hz on the low end and ends somewhere between 18 and 22 kHz on the high end. To complete the specification, the frequency range is followed by a range specifier, which is a plus/minus figure indicating, in decibels, the range of output fluctuation. For example, the spec “50 Hz–20 kHz (±1 dB)” means that, between 50 Hz and 20 kHz, the monitor's output will vary no more than 1 dB up or down (louder or quieter) from the nominal level. (That spec would suggest a very flat monitor, by the way!) Note that the range specifier may also be expressed as two numbers, for example “+1/-2 dB,” which is useful when the response varies more in one direction than the other.
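
To make the range specifier concrete, here is a minimal Python sketch that derives one from a set of measured points. The measurement values are invented for illustration and the function names are hypothetical; deviations are read against the 0 dB median line, as on a frequency-response plot.

```python
def fmt_freq(hz: float) -> str:
    """Format a frequency the way spec sheets usually do (Hz or kHz)."""
    return f"{hz / 1000:g} kHz" if hz >= 1000 else f"{hz:g} Hz"


def frequency_spec(response: dict[float, float], lo_hz: float, hi_hz: float) -> str:
    """Build a spec string such as '50 Hz-20 kHz (+1/-2 dB)'.

    `response` maps test frequency (Hz) to measured level (dB relative to
    the 0 dB nominal line). Only points inside [lo_hz, hi_hz] count.
    """
    in_band = [db for hz, db in response.items() if lo_hz <= hz <= hi_hz]
    if not in_band:
        raise ValueError("no measurement points inside the requested range")
    up = max(max(in_band), 0.0)     # largest excursion above nominal
    down = max(-min(in_band), 0.0)  # largest excursion below nominal
    tolerance = f"±{up:g} dB" if up == down else f"+{up:g}/-{down:g} dB"
    return f"{fmt_freq(lo_hz)}-{fmt_freq(hi_hz)} ({tolerance})"


# Invented measurement of a hypothetical monitor.
measured = {40: -6.0, 50: -2.0, 100: 0.5, 1000: 0.0, 5000: 1.0,
            10000: -0.5, 20000: -1.5, 25000: -8.0}
print(frequency_spec(measured, 50, 20000))  # 50 Hz-20 kHz (+1/-2 dB)
```

Stating a narrower range usually tightens the tolerance: `frequency_spec(measured, 100, 10000)` gives "100 Hz-10 kHz (+1/-0.5 dB)". Without the tolerance part, the string would say nothing about flatness at all.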

Primary frequency-response measurements are made on-axis, that is, with the test mic directly facing the monitor, often at a distance of one meter. Also helpful are off-axis frequency-response plots (measured with the mic at a 30-degree angle to the monitor, for example), which indicate how accurate the response will be, or how much it might change, as you reach for controls or gear located outside the “sweet spot.” (The sweet spot is the ideal listening position in relation to the monitors; it is determined by distance, angle, and careful listening.)
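
A little trigonometry shows how quickly you leave the on-axis region. This sketch assumes a single monitor aimed straight at the sweet spot and a listener who slides sideways, perpendicular to that axis. The 0.9 m (about three feet) distance is the typical close-field figure, and the function names are illustrative.

```python
import math


def off_axis_angle_deg(distance_m: float, lateral_offset_m: float) -> float:
    """Angle between the monitor's axis and your head after a sideways move."""
    return math.degrees(math.atan2(lateral_offset_m, distance_m))


def offset_for_angle_m(distance_m: float, angle_deg: float) -> float:
    """Sideways move that puts you angle_deg off the monitor's axis."""
    return distance_m * math.tan(math.radians(angle_deg))


reach = offset_for_angle_m(0.9, 30)
print(f"At 0.9 m, a {reach:.2f} m lean puts you 30 degrees off-axis")
```

Under these assumptions, a lean of about half a meter toward an outboard rack is enough to put you into the region a 30-degree off-axis plot describes.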

Transient or impulse response is a measure of the speaker's ability to reproduce the fast rise of a transient and the time it takes for the speaker to settle or stop moving after reproduction of the transient. Obviously, the first characteristic is critical to accurate reproduction of instrument dynamics and transients (such as the attack of a drum hit or a string pluck). The second is important because a speaker that is still in motion from a previous waveform will mask the following waveform and thus muddle the sound (see Fig. 3).
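
The second characteristic, settling time, can be illustrated with a textbook idealization: treat the driver as a damped mass-spring system. After an impulse, its ringing dies away as exp(-zeta * omega_0 * t), where omega_0 is the resonance in radians per second and zeta is the damping ratio. The 60 Hz resonance and the damping values below are hypothetical, and a real driver in a box is considerably more complex; the point is only that less damping means longer ringing.

```python
import math


def ring_time_s(resonance_hz: float, damping_ratio: float, floor_db: float = -40.0) -> float:
    """Time for an underdamped second-order system's ringing to fall to floor_db.

    The impulse-response envelope decays as exp(-zeta * omega_0 * t), so the
    time to reach a given amplitude ratio is -ln(ratio) / (zeta * omega_0).
    """
    if not 0.0 < damping_ratio < 1.0:
        raise ValueError("this envelope formula assumes 0 < damping_ratio < 1")
    omega_0 = 2 * math.pi * resonance_hz
    ratio = 10 ** (floor_db / 20)  # -40 dB -> 0.01 of the starting amplitude
    return -math.log(ratio) / (damping_ratio * omega_0)


for zeta in (0.2, 0.5, 0.7):
    print(f"zeta = {zeta}: rings for about {ring_time_s(60, zeta) * 1000:.1f} ms")
```

With these assumed numbers, the lightly damped case is still moving about 60 ms after the impulse, versus under 20 ms for the well-damped case. That lingering motion is what overlaps and muddies the following waveform.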

Distortion refers to undesirable components of a signal, which is to say, anything added to the signal that was not there in the first place. For monitors it can be divided into two categories: harmonic distortion and intermodulation (IM) distortion. Harmonic distortion adds components at whole-number multiples (harmonics) of the frequencies in the original input signal. The types most commonly measured are second- and third-harmonic distortion and total harmonic distortion (THD), which is often reported together with noise as “THD + noise” (see Fig. 4); higher harmonic distortions (fifth, seventh, ninth, and so on) occur as well. Intermodulation distortion is a form of “self-noise” generated by the speaker system when it is excited by a dynamic, multifrequency signal: the frequencies interact to produce sum and difference tones that are not harmonically related to the input. Typically, it is more audible and more annoying than harmonic distortion.
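
The difference between the two categories is easiest to see in where the unwanted energy lands. In this minimal sketch, THD is computed in the usual way, as the RMS sum of the harmonic amplitudes divided by the fundamental, while the IM helpers list the sum and difference frequencies two tones can produce. The amplitudes and the 60 Hz/7 kHz tone pair are illustrative inputs, not measurements.

```python
import math


def thd_percent(fundamental: float, harmonics: list[float]) -> float:
    """Total harmonic distortion: RMS sum of harmonic amplitudes over the fundamental."""
    return 100 * math.sqrt(sum(h * h for h in harmonics)) / fundamental


def im_products_2nd_order(f1: float, f2: float) -> list[float]:
    """Second-order intermodulation products: the difference and sum tones."""
    return sorted({abs(f2 - f1), f1 + f2})


def im_products_3rd_order(f1: float, f2: float) -> list[float]:
    """Third-order intermodulation products: 2f1 +/- f2 and 2f2 +/- f1."""
    return sorted({abs(2 * f1 - f2), abs(2 * f2 - f1), 2 * f1 + f2, 2 * f2 + f1})


# Hypothetical amplitudes: 2nd harmonic at 0.3% and 3rd at 0.4% of the fundamental.
print(f"THD: {thd_percent(1.0, [0.003, 0.004]):.2f} %")  # THD: 0.50 %
# Harmonic distortion of a 7 kHz tone would land at 14 kHz, 21 kHz, and so on.
print(im_products_2nd_order(60, 7000))  # [6940, 7060]
print(im_products_3rd_order(60, 7000))  # [6880, 7120, 13940, 14060]
```

None of the IM products is a whole-number multiple of either input tone, which is one commonly given reason they are more objectionable than harmonic products that merely color the timbre.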

Frequency response, impulse response, and distortion levels should all be taken into account to get an idea of a monitor's accuracy. However, frequency response is often the only measure mentioned in product literature and reviews, and even it gets short shrift on occasion. (In many instances, I have seen frequency specs given with no range specifier, and of course, without it the specification is meaningless.) Few manufacturers provide an impulse-response graph (even assuming they have measured impulse response), and often the only distortion specification given is “THD + noise.” In fact, the lack of established and agreed-upon standards for monitor (and microphone) specifications, for both measuring and reporting them, is a long-standing industry issue. Though it is true that specs don't tell the entire story, they are useful for corroborating what our ears tell us, and as such they can teach us to listen more exactingly.

MIRROR IMAGE
Now that we've established the raison d'être of the close-field monitor, let's take a look at its anatomy. We'll start with the internal components and work our way outward to the enclosure. Understanding how monitors are put together will help you know what to look for when deciding which best suit your needs.

Interestingly, the devices on either end of the recording signal chain — microphones and monitors — are very similar. Both are types of transducers, or devices that transform energy from one form into another. The difference is in the direction of energy flow: microphones convert sound waves into electrical signals and speakers convert electrical signals into sound waves. However, the components and operating principles of monitors and mics are essentially the same.

The speakers most commonly used in close-field monitors work in the same way as moving-coil dynamic microphones do, only in reverse. (Actually, there is a correlative speaker for other types of microphones as well, including ribbons and condensers. However, we will limit the discussion to the moving-coil type in this article.) In a moving-coil dynamic microphone, a thin, circular diaphragm is attached to a fine coil of wire positioned inside a gap in a permanent magnet. Sound waves move the diaphragm back and forth, causing the attached coil to move through the magnet's field and thus generating a tiny electric current in the coil.

In a loudspeaker, the coil of wire is known as the voice coil. As the electric current (audio signal) fluctuates in the wire, it generates an oscillating magnetic field that pushes and pulls against the magnet, causing the voice coil and attached diaphragm (in this case, the speaker cone; see Fig. 5) to vibrate. In turn, the vibrating speaker cone agitates nearby air molecules, creating the sound waves that reach our ears. (The ear, by the way, is also a transducer. Its diaphragm, the tympanic membrane or eardrum, vibrates in response to acoustic sound waves, and the inner ear converts those vibrations into tiny electrochemical impulses that the brain interprets as sound.)
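
The reversibility described here follows from two textbook relations for a conductor of length l in a magnetic field of flux density B: current i through the coil produces a force F = B·l·i (the speaker direction), and moving the coil at velocity v induces a voltage e = B·l·v (the microphone direction). The sketch simply evaluates both; the coil and magnet values are hypothetical round numbers, not data for any real driver or mic.

```python
def coil_force_n(flux_density_t: float, wire_length_m: float, current_a: float) -> float:
    """Loudspeaker direction: force on the voice coil, F = B * l * i (newtons)."""
    return flux_density_t * wire_length_m * current_a


def coil_emf_v(flux_density_t: float, wire_length_m: float, velocity_m_s: float) -> float:
    """Microphone direction: voltage induced in the moving coil, e = B * l * v (volts)."""
    return flux_density_t * wire_length_m * velocity_m_s


# Hypothetical speaker motor: 1 T gap, 8 m of wire in the gap, 0.5 A of signal current.
print(f"force on cone: {coil_force_n(1.0, 8.0, 0.5):.1f} N")
# Hypothetical mic capsule: 1 T gap, 5 m of wire, diaphragm moving at 2 mm/s.
print(f"mic output: {coil_emf_v(1.0, 5.0, 0.002) * 1000:.1f} mV")
```

The same product B·l (often called the force factor) appears in both equations, which is the formal sense in which a moving-coil mic and a moving-coil speaker are the same machine run in opposite directions.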

DRIVING LESSONS
A loudspeaker's magnet, voice coil, and diaphragm form, collectively, an assembly called a driver. (The moving-coil driver is the most common type, but there are other kinds as well.) Close-field monitors usually contain either two or three drivers, and thus are designated 2-way or 3-way, respectively. Standard 2-way monitors contain a woofer and tweeter; standard 3-ways contain a woofer, a tweeter, and a midrange driver. The woofer, of course, reproduces the lower frequencies, the tweeter the higher frequencies, and the midrange driver, where present, the band in between.

Cones and domes are the two most common types of diaphragms used in monitor drivers. Woofers and most midrange drivers employ cone diaphragms, typically made of treated paper, polypropylene, or more exotic materials such as Kevlar. (Note that the dome-shaped piece in the center of a woofer cone is a dust cap, not a dome.) Most moving-coil tweeters use a small dome, typically measuring one inch in diameter. One advantage of a small dome is that it exhibits fast transient response and a wide dispersion pattern, both of which are critical to the reproduction of upper frequencies. Domes are routinely made of treated paper too, but may also be made from a metal such as aluminum or titanium, or sometimes from stiffened silk, which some people believe sounds less harsh than metal.
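
Why a small dome disperses more widely can be estimated with a common rule of thumb: a piston-like diaphragm starts to narrow its dispersion (to “beam”) once the wavelength becomes shorter than roughly its circumference, that is, above f ≈ c / (π·d). The sketch compares the one-inch dome mentioned above with a hypothetical 6.5-inch woofer. Treat the results as ballpark figures, since real diaphragms are neither flat nor perfectly rigid.

```python
import math

SPEED_OF_SOUND_M_S = 343.0
METERS_PER_INCH = 0.0254


def beaming_onset_hz(diameter_m: float) -> float:
    """Rule-of-thumb frequency where wavelength equals diaphragm circumference (ka = 1)."""
    return SPEED_OF_SOUND_M_S / (math.pi * diameter_m)


for name, inches in [("1-inch dome tweeter", 1.0), ("6.5-inch woofer (hypothetical)", 6.5)]:
    f = beaming_onset_hz(inches * METERS_PER_INCH)
    print(f"{name}: dispersion starts to narrow above roughly {f:,.0f} Hz")
```

By this estimate a woofer that size would already be narrowing its coverage in the midrange, while the small dome stays wide well into the treble, which is one reason the top octaves are handed to a small dome.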

When monitors employ separate drivers, as 2-way and 3-way monitors do, the design is termed discrete. In discrete designs, the drivers are usually mounted on the front face of the enclosure as close together as possible, which helps the sound blend into a coherent point source at the sweet spot. Depending on the monitors, the sound can change dramatically as you move away from the sweet spot.

IT'S ABOUT TIME
Some companies, for example Tannoy, employ an alternative driver design in some of their monitors in which the tweeter is mounted in the center of the woofer cone (see Fig. 6). Though more expensive, this coaxial design is naturally more time coherent than discrete designs because the drivers are positioned on the same axis (as well as closer together). Indeed, the coaxial driver arrangement is one of the design elements (among others) that manufacturers have used to meet Ed Long's Time Align specification, mentioned before.

Before we can understand how time alignment can improve a monitor's accuracy, we must first understand the timing problems inherent in conventional monitor designs. Discrete loudspeakers cause minute delays that spread sounds out in time, resulting in lost detail and a blurred or smeared sound. Specifically, sound from the woofer is delayed more than sound from the tweeter. This problem has two main sources, one structural, the other electronic. In a discrete monitor with a flat-face enclosure, the woofer voice coil is naturally set back farther than the tweeter voice coil because of the extra depth of the cone in relation to the dome. The tweeter is therefore closer to your ears, causing the high frequencies to arrive slightly ahead of the lows.
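
The size of the structural offset is easy to estimate: a path difference d delays the woofer by d/c, and at the crossover frequency f, where both drivers play the same band, that delay amounts to a phase error of 360·f·d/c degrees. The 3 cm setback and 2 kHz crossover point below are assumptions chosen for illustration, not figures from any particular monitor.

```python
SPEED_OF_SOUND_M_S = 343.0


def arrival_delay_us(setback_m: float) -> float:
    """Extra travel time (microseconds) for sound from a driver set back by setback_m."""
    return setback_m / SPEED_OF_SOUND_M_S * 1e6


def phase_error_deg(setback_m: float, freq_hz: float) -> float:
    """Phase lag of the set-back driver at freq_hz caused by that delay."""
    return 360.0 * freq_hz * setback_m / SPEED_OF_SOUND_M_S


setback = 0.03    # hypothetical: woofer voice coil 3 cm behind the tweeter's
crossover = 2000  # hypothetical crossover frequency in Hz
print(f"woofer arrives {arrival_delay_us(setback):.1f} us late")
print(f"= {phase_error_deg(setback, crossover):.0f} degrees of phase at {crossover} Hz")
```

Tens of microseconds sounds negligible, but in the crossover region the two drivers' outputs then combine with a sizable phase offset, which is where the smearing described above shows up.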

The problem is compounded by the crossover, an electronic circuit that splits the incoming signal into separate frequency bands and directs each band to the appropriate driver (more on crossovers momentarily). As it happens, crossovers also tend to delay low frequencies more than highs.
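
The electronic half of the problem can be quantified as group delay, the time delay a filter imposes on signals near a given frequency. This sketch evaluates the closed-form group delay of an idealized second-order (12 dB per octave) Butterworth section, assuming a hypothetical 2 kHz crossover and ignoring the way a passive crossover interacts with the drivers' impedance. The woofer's band passes through the lowpass and the tweeter's through the highpass; both share the same denominator, and the highpass numerator only adds a constant 180-degree phase shift, so one formula serves for both.

```python
import math


def group_delay_2nd_order_s(freq_hz: float, corner_hz: float,
                            q: float = 1 / math.sqrt(2)) -> float:
    """Group delay (seconds) of a 2nd-order lowpass or highpass section at freq_hz.

    tau(w) = (w0/Q)(w0^2 + w^2) / ((w0^2 - w^2)^2 + (w*w0/Q)^2);
    Q = 1/sqrt(2) gives a Butterworth response.
    """
    w = 2 * math.pi * freq_hz
    w0 = 2 * math.pi * corner_hz
    numerator = (w0 / q) * (w0 ** 2 + w ** 2)
    denominator = (w0 ** 2 - w ** 2) ** 2 + (w * w0 / q) ** 2
    return numerator / denominator


CROSSOVER_HZ = 2000  # hypothetical crossover point
low = group_delay_2nd_order_s(200, CROSSOVER_HZ)      # woofer band, through the lowpass
high = group_delay_2nd_order_s(10_000, CROSSOVER_HZ)  # tweeter band, through the highpass
print(f"200 Hz through the lowpass:  {low * 1e6:.0f} us")
print(f"10 kHz through the highpass: {high * 1e6:.1f} us")
```

Under these assumptions the low band emerges roughly a tenth of a millisecond late while the high band is barely delayed, so the electronic delay adds to the geometric offset rather than cancelling it.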

With his Time Align scheme, Long was the first to specify corrections for these problems, including physically lining up the drivers and adjusting driver and crossover delay parameters. When correctly implemented, Time Alignment ensures that the time relationships of the fundamentals and overtones of sounds are the same when they reach the listener as they were in the electrical signal at the input terminals of the monitor.

Over the years, some manufacturers have devised their own time-alignment schemes. You may recall, for example, the now-discontinued JBL 4200 series monitors, which employed protruding woofers designed to deliver low frequencies to the listener's ears simultaneously with highs from the tweeters.

WHEN I CROSS OVER
As mentioned, the crossover's job is to divide the incoming signal into separate bands and then send each band to the appropriate driver. In inexpensive monitors, this is typically accomplished using simple lowpass and highpass filters that split the signal coming from the power amp. This is called a passive crossover. In more sophisticated systems, an active crossover splits the line-level signal before it gets to the power amp. This requires each driver to have its own power amp, an arrangement called biamping in a 2-way monitor, triamping in a 3-way, and so on.
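
Here is a deliberately simple sketch of what an active crossover does to the line-level signal before amplification: a one-pole (6 dB per octave) lowpass feeds the woofer's amp, and the tweeter's band is whatever the lowpass removed. That complementary design makes the two bands sum back to exactly the input, but its gentle slopes are an illustrative assumption; commercial active monitors generally use steeper filters. The 2 kHz crossover and 48 kHz sample rate are hypothetical.

```python
import math


def split_bands(samples: list[float], crossover_hz: float,
                sample_rate: int = 48_000) -> tuple[list[float], list[float]]:
    """Split a signal into (low, high) bands with a complementary first-order pair."""
    # One-pole lowpass coefficient for the requested corner frequency.
    a = math.exp(-2 * math.pi * crossover_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for x in samples:
        state = (1 - a) * x + a * state  # lowpass output for the woofer amp
        low.append(state)
        high.append(x - state)           # remainder goes to the tweeter amp
    return low, high


def rms(values: list[float]) -> float:
    return math.sqrt(sum(v * v for v in values) / len(values))


SR = 48_000
for tone_hz in (100, 8000):
    tone = [math.sin(2 * math.pi * tone_hz * n / SR) for n in range(SR // 10)]
    low, high = split_bands(tone, 2000, SR)
    # Skip the first 10 ms so the filter's start-up transient doesn't skew the levels.
    print(f"{tone_hz} Hz tone: woofer band rms {rms(low[480:]):.3f}, "
          f"tweeter band rms {rms(high[480:]):.3f}")
```

Because the split happens before the amplifiers, each amp sees only its own band, which is the arrangement the text calls biamping.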

Typically, monitors that have active crossovers incorporate internal power amps. These are called powered monitors. The terms active and powered, though often used interchangeably, actually refer to different things: active refers to the crossover, and powered to the fact that the amplifiers are part of the package. In other words, although active monitors are almost always powered, not all powered monitors are active. For example, Event Electronics at one time offered three versions of its popular 20/20 monitors: the straight 20/20 was unpowered and had a passive crossover; the 20/20p was powered but used a passive crossover; and the 20/20bas (biamplified system) was both powered and active.

In addition to giving a more exacting crossover performance, powered, active monitors offer other advantages over passive designs. Perhaps most importantly, because the amps and electronics are specifically designed to match the drivers and enclosure, powered monitors eliminate the guesswork and the potential pitfalls of matching an external amp to your monitors. (For a discussion of matching power amps to passive monitors, see the sidebar “A Good Match.”) This means reduced risk of blowing the drivers and virtually no risk of overtaxing the amps. In addition, the internal wiring is much shorter, which cuts down on frequency loss, noise induction, and other gremlins attributable to long cable runs. The upshot is that a powered, active system provides a more reliable reference: no matter where you take the monitors, you can be sure the only variable is room acoustics.

BOX SET
The enclosure is a critical part of any reference monitor design. Compact monitors present a particular challenge to designers because diminutive enclosures do not support low frequencies well. For many small monitors, the lowest practical frequency is around 60 Hz. However, certain techniques allow manufacturers to extend the low-frequency response of their boxes.

A common solution is to vent or port the enclosure (see Fig. 6). The concept of porting is quite complex, involving not only one or two visible holes, but also other acoustic-design constructions inside the cabinet. In this design, often termed a bass reflex system, the port helps “tune” the enclosure to resonate at frequencies lower than the woofer's natural rolloff. That is, as the frequencies drop below the monitor's lowest practical note, the enclosure begins to resonate at yet lower frequencies, essentially providing a bass “boost.” Although porting can extend the low-frequency response of the monitor well below a similarly sized but completely sealed enclosure (called an infinite baffle or acoustic suspension design), some people feel that the resulting bass extension is not a trustworthy reflection of what is really going on in the low end. (One noteworthy solution here is the incorporation of a subwoofer.)
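
For a rough feel of how a port “tunes” the box, the enclosure and port can be treated as a Helmholtz resonator: f = (c / 2π)·√(A / (V·L_eff)), where A is the port's cross-sectional area, V the box volume, and L_eff the port length plus an end correction. This is a simplified estimate; actual bass-reflex designs also depend on the woofer's own parameters. The 15-liter box, the 5 cm diameter by 10 cm long port, and the end-correction factor of 1.7 times the port radius (a common approximation for a port flanged at both ends) are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0


def port_tuning_hz(box_liters: float, port_diameter_cm: float, port_length_cm: float,
                   end_correction: float = 1.7) -> float:
    """Helmholtz-resonator estimate of a vented box's tuning frequency."""
    volume_m3 = box_liters / 1000
    radius_m = port_diameter_cm / 200  # diameter in cm -> radius in m
    area_m2 = math.pi * radius_m ** 2
    effective_length_m = port_length_cm / 100 + end_correction * radius_m
    return SPEED_OF_SOUND_M_S / (2 * math.pi) * math.sqrt(
        area_m2 / (volume_m3 * effective_length_m))


print(f"10 cm port: {port_tuning_hz(15, 5, 10):.1f} Hz")
print(f"15 cm port: {port_tuning_hz(15, 5, 15):.1f} Hz")
```

Lengthening the port (or enlarging the box) lowers the tuning, which is how a designer can place the resonance below the roughly 60 Hz point where many small boxes run out of bass.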

Ports tend to be round, oval, or slit-shaped, and usually are located on either the front or rear panel of compact monitors. Rear ports allow for a smaller front face, and therefore a more compact monitor, but they can also lead to sonic imbalances (chiefly excessive bass) when the monitor is mounted too close to a wall or corner. Front ports help avoid this problem but require a larger front face on the enclosure.

Another problem with front ports is that they can reduce the structural integrity of the front baffle (which is already weakened by at least two large holes, one each for the woofer and tweeter). Some ported monitors provide port plugs, which can be helpful for reducing low-frequency output in case you are forced to mount the monitor near a wall or corner. (A different solution for this problem is increasingly found in powered/active monitors — “contour” switches that let you adjust the monitor's low- and high-frequency output to compensate for acoustical imbalances in the listening space.)

Nowadays, most manufacturers build their enclosures from medium-density fiberboard (MDF), a material that offers better consistency and lower cost than wood. Grille cloths may or may not be provided with the monitors, but they are a cosmetic enhancement at best and traditionally are removed for monitoring.

Because an enclosure's front baffle shapes the sound as it leaves the drivers, designers must take all aspects of the baffle into account. For this reason, they often round off corners and sharp edges and keep the face of the enclosure as smooth and spare as possible to minimize interference such as diffraction (the bending and re-radiation of sound waves at edges and other discontinuities). One critical acoustic-design feature on the front baffle is the wave guide, a shallow, contoured “cup” surrounding the tweeter. The structure and shape of the wave guide both affect high-frequency dispersion, which in turn affects other sound qualities such as imaging (see Fig. 7).

PERFORMANCE ISSUES
Now that we've laid the groundwork, let's tally up what constitutes a superior monitor. Specifically, what do you hear in better monitors that you don't hear in lower-quality ones?

We already know one answer: accuracy. More than anything, the purpose and goal of a reference monitor is to transduce signals accurately. Monitoring is the last step in a long journey through the various processes required to get your music to its destination. Therefore reference monitors are your ultimate “feedback” system and the basis of all of the decisions you make about how to shape and process a mix.

As we've seen, the technical recipe for accuracy has three basic ingredients: accurate frequency response, accurate impulse response, and low distortion. Superior monitors boast a very flat frequency response, typically within ±3 dB of a nominal level. In addition, the frequency response should roll off smoothly at either end of the spectrum, as well as fall off evenly as you move away or off axis from the monitor.

Also critical is a monitor's impulse response. Ideally, the changes in air pressure a monitor produces should be a direct analog of the transient electrical signals driving it; a superior monitor keeps all the “time domain” qualities of a signal intact, reproducing them in exactly the same time relation as they appear at the monitor's input terminals. In addition, in a superior monitor the outputs of the discrete drivers are time aligned to compensate for the time misalignment inherent in discrete designs, as described earlier. That way, the highs, mids, and lows reach the listener's ear simultaneously.

Both impulse response and time alignment (among other things) figure prominently in two other critical sonic qualities of a reference monitor: soundstage and imaging. Soundstage refers to the imaginary stage, with both width and depth, that forms between two speakers, and imaging refers to how well the monitors can localize individual instruments on the soundstage. Obviously, a good soundstage and precise imaging are necessary for accurate positioning of instruments within the stereo field.

Distortion levels vary considerably from system to system. Whereas home-stereo speakers typically exhibit as much as 1 percent distortion above bass frequencies, some high-quality reference monitors may deliver as little as 0.1 percent. Though a low distortion spec is always desirable, some monitors with less-than-spectacular distortion specs still excel thanks to superiority by other measures. The human ear, however, is very sensitive to distortion, especially in the midrange (distortion is often a major contributor to ear fatigue).
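
Distortion is quoted both in percent and in decibels below the fundamental, and converting between them shows how large the gap in the example really is. A minimal sketch, using the standard amplitude relation dB = 20·log10(ratio); the function names are illustrative:

```python
import math


def distortion_percent_to_db(percent: float) -> float:
    """Distortion level in dB relative to the fundamental (negative = below it)."""
    return 20 * math.log10(percent / 100)


def distortion_db_to_percent(db: float) -> float:
    """Inverse conversion: dB relative to the fundamental back to percent."""
    return 100 * 10 ** (db / 20)


for pct in (1.0, 0.1):
    print(f"{pct}% distortion = {distortion_percent_to_db(pct):.0f} dB re: fundamental")
```

The 1 percent home-stereo figure and the 0.1 percent monitor figure are therefore 20 dB apart: the distortion products sit 40 dB versus 60 dB below the music.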

Another helpful specification is speaker sensitivity or efficiency, which shows the monitor's output sound pressure level (in dB SPL) at a distance of 1 meter with an input signal of 1W. All things being equal (which they rarely are), speaker sensitivity has no determining effect on sound quality. However, if you are doing an A/B comparison of two or more sets of passive monitors and running them from the same power amp through a switching box, it is important to be aware of differing sensitivities. Our ears can readily perceive even slight differences in SPL, and our brains naturally perceive louder sources as sounding better. If you fail to compensate for any sensitivity differences — that is, to ensure that each monitor is playing back at the same level — you are more prone to reach incorrect assessments of monitors while comparing them.
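
The level-matching arithmetic is simple enough to sketch. Given 1 W/1 m sensitivity ratings, the louder monitor needs a trim equal to the difference in ratings, and the SPL at another power rises by 10·log10(P) dB. The ratings below are hypothetical, and the sketch assumes both monitors present the same impedance, so equal drive voltage from the shared amp means equal power; in practice, confirm the match with an SPL meter.

```python
import math


def spl_at_1m(sensitivity_db: float, watts: float) -> float:
    """Expected SPL at 1 m for a given amplifier power, from a 1 W/1 m rating."""
    return sensitivity_db + 10 * math.log10(watts)


def matching_trim_db(sensitivity_a: float, sensitivity_b: float) -> float:
    """Gain change (dB) for monitor A so it plays at the same level as monitor B."""
    return sensitivity_b - sensitivity_a


def db_to_voltage_ratio(db: float) -> float:
    """Convert a gain in dB to the corresponding voltage multiplier."""
    return 10 ** (db / 20)


# Hypothetical pair: monitor A rated 88 dB, monitor B rated 85 dB (1 W/1 m).
trim = matching_trim_db(88, 85)
print(f"trim A by {trim:+.0f} dB (voltage x {db_to_voltage_ratio(trim):.3f})")
print(f"A at 10 W: {spl_at_1m(88, 10):.0f} dB SPL; B at 10 W: {spl_at_1m(85, 10):.0f} dB SPL")
```

Without that 3 dB trim, monitor A would simply play louder and, for the reason given above, would likely be judged the better-sounding pair.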

FAITHFUL TRANSLATOR
Accuracy is important because, ostensibly at least, it guarantees that what we hear from our monitors is the “audio truth.” Unfortunately, though, objective measures don't really guarantee accuracy. As helpful as specs may be, they are not really an indicator of how a monitor sounds; two similar monitors with near-identical specs can sound very different, for example. Therefore, as in all things audio, careful listening must be the final measure. After all, monitoring is inherently subjective.

But even if monitoring weren't subjective and reliable standards for accuracy could be agreed upon, the problem of wide-ranging sonic differences among playback systems would still persist. More important than accuracy is knowing how your mixes will translate to other speakers in other environments. That's the real bottom line, and the only way to gain that certainty is through experience. As they say, practice makes perfect, and it's no different with reference monitors than with musical instruments. After all, a monitor is a musical instrument of sorts. So you need to spend many hours, many days, many months working with a set of monitors, “practicing” on them, listening to your results on countless playback systems, always fine-tuning, adjusting, figuring out what the quirks are, where the bumps and holes are, and how every little thing translates, until you reach a level of familiarity that allows you to work undaunted, confident that the mix you dial in will bear a strong resemblance to what the end user ultimately hears. Regardless of what monitors you use, until you are intimately familiar with them, mixing will remain something of a guessing game.

This point was brought home to me recently as I chatted with ace mix engineer Chris Lord-Alge. With multiple platinum credits to his name, Lord-Alge certainly qualifies as an “expert” on the subject of monitoring, at least in the sense that he knows what it takes to turn out mixes that sound great across the board, from boom box to high-end audiophile system. And just as surely, Lord-Alge has attained enough success to acquire and use any monitor he wants. So what monitors does he use? The latest, greatest, most expensive ones available? Not at all. Rather, Lord-Alge uses the same monitors he has mixed on for most of his career: a pair of Yamaha NS-10Ms. “The key thing with any monitors,” explains Lord-Alge, “is that you get used to them. That's ultimately what makes them work for you. And 25 years on NS-10s hasn't led me wrong yet.”

CAN OF WORMS
This brings us to a can of worms I'd just as soon not open — but open it we must if we're to inquire seriously into the nature of reference monitoring. Anyone who has searched for the “perfect” monitor has run smack into this dilemma, which is best summed up by these questions: Who, ultimately, are you mixing for? The snooty audiophile with speakers that cost more than most folks' cars? Or the masses who listen to music on cheap systems?

Lord-Alge's answer is enlightening: “Ninety-five percent of people listen to music in their car or on a cheap home stereo; 5 percent may have better systems; and maybe 1 percent have a $20,000 stereo. So if it doesn't sound good on something small, what's the point? You can mix in front of these huge, beautiful, pristine, $10,000 powered monitors all you want. But no one else has those monitors, so you're more likely to end up with a translation problem.”

Similarly, I learned a few years ago that John Leventhal, who was one of my heroes at the time, did the bulk of his mixing on a pair of small Radio Shack speakers. (Leventhal, a New York City-based guitarist, songwriter, and engineer, made his mark by producing Shawn Colvin's acclaimed 1989 record, Steady On.) Leventhal owns both a pair of Yamaha NS-10Ms and a pair of Radio Shack Optimus 7s. But he prefers the latter.