How Music Got Free: The End of an Industry, the Turn of the Century, and the Patient Zero of Piracy

by Stephen Witt


  Brandenburg thought this goal was preposterous—it was like trying to build a car on a budget of two hundred dollars. But he also thought it was a worthy target for his own ambitions. He worked on the problem for the next three years, until in early 1986 he spotted an avenue of inquiry that had never been explored. Dubbing this insight “analysis by synthesis,” he spent the next few sleepless weeks writing a set of mathematical instructions for how those precious bits could be assigned.

  He began by chopping the audio up. With a “sampler,” he divided the incoming sound into fractional slivers of a second. With a “filter bank,” he then further sorted the audio into different frequency partitions. (The filter bank worked on sound the way a prism worked on light.) The result was a grid of time and frequency, consisting of microscopic snippets of sound, sorted into narrow bands of pitch—the audio version of pixels.
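
  The grid of time and frequency can be sketched in a few lines of Python. This is only an illustration: it stands in a plain discrete Fourier transform for the filter bank, and the function name, frame size, and test tone are all assumptions made for the example, not details of Brandenburg's program.

```python
import cmath
import math


def time_frequency_grid(samples, frame_size):
    """Chop samples into frames, then sort each frame into frequency bands.

    Returns a list of rows (one per time sliver); each row holds the
    strength of every frequency band in that sliver.
    """
    grid = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        bands = []
        # Naive discrete Fourier transform; keep the non-redundant half.
        for k in range(frame_size // 2):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            bands.append(abs(total))
        grid.append(bands)
    return grid


# Hypothetical test tone: 4 cycles per 32-sample frame, two frames long.
FRAME_SIZE = 32
tone = [math.sin(2 * math.pi * 4 * n / FRAME_SIZE) for n in range(2 * FRAME_SIZE)]

grid = time_frequency_grid(tone, FRAME_SIZE)
# Index of the strongest band in each time sliver.
loudest_band = [max(range(len(row)), key=row.__getitem__) for row in grid]
```

  Each row of `grid` is one sliver of time and each column one band of pitch, so a steady tone shows up as the same bright "pixel" in every row.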

  Brandenburg then told the computer how to simplify these audio “pixels” using four of Zwicker’s psychoacoustic tricks:

  First, Zwicker had shown that human hearing was best at a certain range of pitch frequencies, roughly corresponding to the tonal range of the human voice. At registers beyond that, hearing degraded, particularly as you went higher on the scale. That meant you could assign fewer bits to the extreme ends of the spectrum.

  Second, Zwicker had shown that tones that were close in pitch tended to cancel each other out. In particular, lower tones overrode higher ones, so if you were digitizing music with overlapping instrumentation—say a violin and a cello at the same time—you could assign fewer bits to the violin.

  Third, Zwicker had shown that the auditory system canceled out noise following a loud click. So if you were digitizing music with, say, a cymbal crash every few measures, you could assign fewer bits to the first few milliseconds following the beat.

  Fourth—and this is where it gets weird—Zwicker had shown that the auditory system also canceled out noise prior to a loud click. This was because it took a few milliseconds for the ear to actually process what it was sensing, and this processing could be disrupted by a sudden onrush of louder noise. So, going back to the cymbal crash, you could also assign fewer bits to the first few milliseconds before the beat.

  Relying on decades of empirical auditory research, Brandenburg told the bits where to go. But this was just the first step. Brandenburg’s real achievement was figuring out that you could run this process iteratively. In other words, you could take the output of his bit-assignment algorithm, feed it back into the algorithm, and run it again. And you could do this as many times as you wished, each time reducing the number of bits you were spending, making the audio file as small as you liked. There was degradation of course: like a copy of a copy or a fourth-generation cassette dub, with each successive pass of the algorithm, audio quality got worse. In fact, if you ran the process a million times, you’d end up with nothing more than a single bit. But if you struck the right balance, it would be possible to both compress the audio and preserve fidelity, using only those bits you knew the human ear could actually hear.
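
  The idea of running a pass, checking the bit count, and running it again can be shown with a toy loop. This is a loose sketch and not Brandenburg's algorithm: the doubling step size, the crude bit-cost estimate, and the sample band levels are all assumptions chosen to keep the example small.

```python
import math


def bits_needed(quantized):
    """Crude cost estimate: one sign bit plus the magnitude bits per value."""
    return sum(abs(q).bit_length() + 1 for q in quantized)


def quantize(values, step):
    """Round each value to the nearest multiple of step (as an integer count)."""
    return [math.floor(v / step + 0.5) for v in values]


def fit_to_budget(values, bit_budget):
    """Coarsen the quantizer, pass after pass, until the bits fit the budget."""
    if bit_budget < len(values):
        # Even all-zero values cost one bit each under bits_needed().
        raise ValueError("budget too small for this toy cost model")
    step = 1.0
    quantized = quantize(values, step)
    while bits_needed(quantized) > bit_budget:
        step *= 2  # each pass throws away more detail
        quantized = quantize(values, step)
    return step, quantized


# Hypothetical band levels and a hypothetical 12-bit budget.
band_levels = [100.0, -50.0, 25.0, 3.0]
step, quantized = fit_to_budget(band_levels, bit_budget=12)
restored = [q * step for q in quantized]
```

  Comparing `restored` with `band_levels` shows the trade: every extra pass saves bits and blurs the numbers a little more, and the quietest value is the first to vanish.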

  Of course, not all musical work employed such complex instrumentation. A violin concerto might have all sorts of psychoacoustic redundancies; a violin solo would not. Without cymbal crashes, or an overlapping cello, or high register information to be simplified, there was just a pure tone and nowhere to hide. What Brandenburg could do here, though, was dump the output bits from his compression method into a second, completely different one.

  Termed “Huffman coding,” this approach had been developed by the pioneering computer scientist David Huffman at MIT in the 1950s. Working at the dawn of the Information Age, Huffman had observed that if you wanted to save on bits, you had to look for patterns, because patterns, by definition, repeated. Which meant that rather than assigning bits to the pattern every time it occurred, you just had to do it once, then refer back to those bits as needed. And from the perspective of information theory, that was all a violin solo was: a vibrating string, cutting predictable, repetitive patterns of sound in the air.
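
  In its textbook form, Huffman coding saves bits by giving the most frequent symbols the shortest codewords. A minimal sketch follows; the note sequence is a made-up example, and nothing here is taken from Brandenburg's own code.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Build a prefix-free code in which frequent symbols get shorter codewords."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: codeword so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two rarest groups, prefixing their codewords with 0 and 1.
        count_a, _, group_a = heapq.heappop(heap)
        count_b, _, group_b = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in group_a.items()}
        merged.update({s: "1" + code for s, code in group_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, codes):
    """Concatenate the codeword for each symbol into one bit string."""
    return "".join(codes[s] for s in symbols)


# Hypothetical, highly repetitive "violin solo": mostly the same note.
notes = list("AAAAAAAABBBBCCD")
codes = huffman_codes(notes)
encoded = encode(notes, codes)
fixed_width_bits = 2 * len(notes)  # four distinct notes need 2 bits each
```

  The more lopsided the pattern, the bigger the saving: here the common note costs a single bit, and the whole phrase comes in under the fixed-width total.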

  The two methods complemented each other perfectly: Brandenburg’s algorithm for complicated, overlapping noise; Huffman’s for pure, simple tones. The combined result united decades of research into acoustic physics and human anatomy with basic principles of information theory and complex higher math. By the middle of 1986, Brandenburg had even written a rudimentary computer program that provided a working demonstration of this approach. It was the signature achievement of his career: a proven method for capturing audio data that could stick to even the stingiest budget for bits. He was 31 years old.

  He received his first patent before he’d even defended his thesis. For a graduate student, Brandenburg was unusually interested in the dynamic potential of the marketplace. With a mind like his, a tenure-track position was guaranteed, but academia held little interest for him. As a child he’d read biographies of the great inventors, and at an early age had internalized the importance of the hands-on approach. Brandenburg—like Bell, like Edison—was an inventor first.

  These ambitions were encouraged. After escaping from Zwicker, Dieter Seitzer had spent most of his own career at IBM, accruing basic patents and developing keen commercial instincts. He directed his graduate students to do likewise. When he saw the progress that Brandenburg was making in psychoacoustic research, he pushed him away from the university and toward the nearby Fraunhofer Institute for Integrated Circuits, the newly founded Bavarian technology incubator that Seitzer oversaw.

  The institute was a division of the Fraunhofer Society, a massive state-run research organization with dozens of campuses across the country—Germany’s answer to Bell Labs. Fraunhofer allocated taxpayer money toward promising research across a wide variety of academic disciplines, and, as the research matured, brokered commercial relationships with large consumer industrial firms. For a stake in the future revenues of Brandenburg’s ideas, Fraunhofer offered state-of-the-art supercomputers, high-end acoustic equipment, professional intellectual property expertise, and skilled engineering manpower.

  The last was critical. Brandenburg’s method was complex, and required several computationally demanding mathematical operations to be conducted simultaneously. 1980s computing technology was barely up to the task, and algorithmic efficiency was key. Brandenburg needed a virtuoso, a caffeine-addled superstar who could translate graduate-level mathematical concepts into flawless computer code. At Fraunhofer he found his man: a 26-year-old computer programmer by the name of Bernhard Grill.

  Grill was shorter than Brandenburg and his manner was far more calm. His face was broad and friendly and he wore his sandy hair a little long. He spoke more loudly than Brandenburg, with more passion, and conversations with him were composed and natural. He told jokes, too, jokes that were—well, not all that funny either, but certainly better than Brandenburg’s.

  In the world of audio, Grill stood out, for it was possible to imagine him as something other than an engineer. Like Brandenburg, he was Bavarian, but his attitude was more bohemian. He had a relaxed, wonkish nature to him, and was the sort of person who, had he lived in America, might have favored sandals and a Hawaiian shirt. Perhaps it was his background. While Brandenburg’s father was himself a professor, and most of the other Fraunhofer researchers hailed from the upper middle class, Grill’s father had worked in a factory. For Brandenburg, a university education had been a given, practically a birthright, but for Grill it had real meaning.

  In his own way he had rebelled against the typisch Deutsch mentality. His original passion had been music. At a young age Grill had taken up the trumpet, and by his teens he was practicing six hours a day. During a brief period in his early 20s he had played professionally in a nine-piece swing band. When the economic realities of that career choice became apparent, he’d returned to engineering, and ended up studying computers. But music remained close to his heart, and over the years he amassed an enormous, eclectic collection of recorded music from a variety of obscure genres. His other hobby was building loudspeakers.

  Brandenburg and Grill were joined by four other Fraunhofer researchers. Heinz Gerhäuser oversaw the institute’s audio research group; Harald Popp was a hardware specialist; Ernst Eberlein was a signal processing expert; Jürgen Herre was another graduate student whose mathematical prowess rivaled Brandenburg’s own. In later years this group would refer to themselves as “the original six.”

  Beginning in 1987, they took on the full-time task of creating commercial products based on Brandenburg’s patent. The group saw two potential avenues for development. First, Brandenburg’s compression algorithm could be used to “stream” music—that is, send it directly to the user from a central server, as Seitzer had envisioned. Alternatively, Brandenburg’s compression algorithm could be used to “store” music—that is, create replayable music files that the user would keep on a personal computer. Either way, size mattered, and getting the compression ratio to 12 to 1 was the key.
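
  The arithmetic behind a 12-to-1 target is easy to work through. The figures below assume standard compact disc audio (44,100 samples per second, 16 bits per sample, two channels), which this passage does not spell out, and the four-minute song is a hypothetical example.

```python
# Standard CD audio parameters (assumed; not stated in the passage above).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2
TARGET_RATIO = 12  # the 12-to-1 goal from the text

cd_bits_per_second = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS
target_bits_per_second = cd_bits_per_second / TARGET_RATIO


def size_in_megabytes(seconds, bits_per_second):
    """File size in megabytes (10**6 bytes) for audio of the given length."""
    return seconds * bits_per_second / 8 / 1_000_000


# Hypothetical four-minute song, before and after compression.
song_seconds = 4 * 60
cd_size_mb = size_in_megabytes(song_seconds, cd_bits_per_second)
compressed_size_mb = size_in_megabytes(song_seconds, target_bits_per_second)
```

  Under those assumptions a song that fills roughly 42 megabytes on a disc shrinks to about three and a half, which is the difference between a file that was hopeless to send or store on the hardware of the day and one that was merely large.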

  It was slow going. Computing was still emerging from its homebrew origins, and the team built most of its equipment by hand. The lab was a sea of cables, speakers, signal processors, CD players, woofers, and converters. Brandenburg’s algorithm had to be coded directly onto programmable chips, a process that could take days. Once a chip was created, the team would use it to compress a ten-second sample from a compact disc, then compare it with the original to see if they could hear the difference. When they could—which, in the early days, was almost always—they refined the algorithm and tried again.

  They started at the top, with the piccolo, then worked down the scale. Grill, who had obsessed over acoustics since childhood, could see at once that the compression technology was far from being marketable. Brandenburg’s algorithm generated a variety of unpredictable errors, and at times it was all Grill could do to take inventory. Sometimes, the encoding was “muddy,” as if the music were being played underwater. Sometimes it “hissed,” like static from an AM radio. Sometimes there was “double-speak,” as if the same recording had been overlaid twice. Worst of all was “pre-echo,” a peculiar phenomenon where ghostly remnants of musical phrases popped up several milliseconds early.

  Brandenburg’s math was elegant, even beautiful, but it couldn’t fully account for the messy reality of perception. To truly model human hearing, they needed human test subjects. And these subjects required training to understand the vocabulary of failure as well as Grill did. And once this expertise was established, it would have to be submitted to thousands upon thousands of controlled, randomized, double-blind trials.

  Grill approached this time-consuming endeavor with enthusiasm. He was what they called a “golden ear”: he could distinguish between microtones and pick up on frequencies normally available only to children and dogs. He approached the sense of hearing the way a perfumer approached the sense of smell, and this sharpened sense allowed him to name and grade certain sensory phenomena—certain aspects of reality, really—that others could never know.

  Charged with selecting the reference material, Grill combed his massive compact disc archive for every conceivable form of music: funk, jazz, rock, R&B, metal, classical—every genre except rap, which he disliked. He wanted to throw everything he could find at Brandenburg’s algorithm, to be sure it could handle every conceivable case. Funded by Fraunhofer’s generous research budget, Grill went beyond music to become a collector of exotic noise. He found recordings of fast talkers with difficult accents. He found recordings of birdcalls and crowd noise. He found recordings of clacking castanets and mistuned harpsichords. His personal favorite came from a visit to Boeing headquarters in Seattle, where, in the gift shop, he found a collection of audio samples from roaring jet engines.

  Under Grill’s direction, Fraunhofer also purchased several pairs of thousand-dollar Stax headphones. Made in Japan, these “electrostatic earspeakers” were the size of bricks and required their own dedicated amplifiers. They were impractical and expensive, but Grill considered the Stax to be the finest piece of equipment in the history of audio. They revealed every imperfection with grating clarity, and the ability to isolate these digital glitches spurred a cycle of continuous improvement.

  Like a shrinking ray, the compression algorithm could target different output sizes. At half size, the files sounded decent. At quarter size, they sounded OK. In March 1988, Brandenburg isolated a recording of a piano solo, then dialed the encoding ratio as low as he dared—all the way down to Seitzer’s crazy stretch goal of one-twelfth CD size. The resulting encoding was lousy with errors. Brandenburg would later say the pianist sounded “drunk.” But even so, this experiment in uneasy listening gave him confidence, and he began to see for the first time how Seitzer’s vision might be achieved.

  Increases in processing power spurred progress. Within a year Brandenburg’s algorithm was handling a wide variety of recorded music. The team hit a milestone with the 1812 Overture, then another with Tracy Chapman, then another with a track by Gloria Estefan (Grill was on a Latin kick). In late 1988, the team made its first sale, and shipped a hand-built decoder to the first ever end user of mp3 technology: a tiny radio station run by missionaries on the remote Micronesian island of Saipan.

  But one audio source was proving intractable: what Grill, with his imperfect command of English, called “the lonely voice.” (He meant “lone.”) Human speech could not, in isolation, be psychoacoustically masked. Nor could you use Huffman’s pattern recognition approach—the essence of speech was its dynamic nature, its plosives and sibilants and glottal stops. Brandenburg’s shrinking algorithm could handle symphonies, guitar solos, cannons, even “Oye Mi Canto,” but it still couldn’t handle a newscast.

  Stuck, Brandenburg isolated samples of “lonely” voices. The first was a recording of a difficult German dialect that had plagued audio engineers for years. The second was a snippet of Suzanne Vega singing the opening bars of “Tom’s Diner,” her 1987 radio hit. Perhaps you remember the a cappella intro to “Tom’s Diner.” It goes like this:

  Dut dut duh dut

  Dut dut duh dut

  Dut dut duh dut

  Dut dut duh dut

  Vega had a beautiful voice, but on the early stereo encodings it sounded as if there were rats scratching at the tape.

  In 1989, Brandenburg defended his thesis and was awarded his PhD. He then took the voice samples with him on a fellowship to AT&T’s Bell Labs in Murray Hill, New Jersey. There, he worked with James Johnston, a specialist in voice encoding. Johnston was the Newton to Brandenburg’s Leibniz—independently, he had hit upon an identical mathematical approach to psychoacoustic modeling, at almost exactly the same time. After an initial period spent marking territory, the two decided to cooperate. Throughout 1989, listening tests continued in parallel in Erlangen and Murray Hill, but the American test subjects proved less patient than the Germans. After listening to the same rat-eaten, four-second sample of “Tom’s Diner” several hundred times, the volunteers at Bell Labs revolted, and Brandenburg was forced to finish the experiment on his own. He was there in New Jersey, listening to Suzanne Vega, when the Berlin Wall came down.

  Johnston was impressed by Brandenburg. He’d spent his life around academic researchers and was accustomed to brilliance, but he’d never seen anybody work so hard. Their collaboration spurred several breakthroughs, and soon the scratching rats were banished. In early 1990, Brandenburg returned to Germany with a nearly finished product in hand. Many compressed samples now revealed a state of perfect “transparency”: even to a discriminating listener like Grill, using the best equipment, they were indistinguishable from the original compact discs.

  Impressed, AT&T officially graced the technology with its imprimatur and a modicum of corporate funding. Thomson, a French consumer electronics concern, also began to provide money and technical support. Both firms were seeking an edge in psychoacoustics, as this long-ignored academic discipline was suddenly white hot. Research teams from Europe, Japan, and the United States had been working on the same problem, and other large corporations were jockeying for position. Many had thrown their weight behind Fraunhofer’s better-established competitors. Seeking to mediate, the Moving Picture Experts Group (MPEG), the standards committee that even today decides which technology makes it to the consumer marketplace, convened a contest in Stockholm in June 1990 to conduct formalized listening tests for the competing methods.

  As the ’90s opened, MPEG was preparing for a decade of disruption, shaping technological standards for near-future technologies like high-definition television and the digital video disc. Being moving picture experts, the committee had first focused exclusively on video quality. Audio encoding problems were an afterthought, one they’d tackled only after Brandenburg pointed out that there was no longer much of a market for silent movies. (This was the sort of joke that Brandenburg liked to make.)

  An MPEG endorsement might mean a fortune in licensing fees, but Brandenburg knew it would be tough to get. The Stockholm contest was to be graded against ten audio benchmarks: an Ornette Coleman solo, the Tracy Chapman song “Fast Car,” a trumpet solo, a glockenspiel, a recording of fireworks, two separate bass solos, a ten-second castanet sample, a snippet of a newscast, and a recording of Suzanne Vega performing “Tom’s Diner.” (The last was suggested by Fraunhofer.) The judges were neutral participants, selected from a group of Swedish graduate students. And, as MPEG needed undamaged ears that could still hear high-pitched frequencies, the evaluators skewed young.

  Fourteen different groups submitted entries to the MPEG trials—the high-stakes version of a middle school science fair. On the eve of the contest, the competing groups conducted informal demonstrations. Brandenburg was confident his group would win. He felt that access to Zwicker’s seminal research, still untranslated from German, gave him an insurmountable edge.

 
