DNA: The Secret of Life

by Watson, James

This kind of speculative gene patenting creates a terrible drag on medical research and development, leading in the long run to fewer and poorer treatment options. The trouble is that the speculators are in effect patenting potential drug targets – the proteins upon which any drug or treatment yet to be invented might act. For most big pharmaceutical firms gene patents on drug targets, filed by biotech companies with little or no biological information on function, become a poison pill. The large royalties demanded by gene-finding monopolies tip the economic balance against drug development; cloning a drug target is at most 1 percent of the way to an approved drug. Furthermore, if a company produces a drug with a particular target for which it also holds the patent on the underlying gene, that company has no immediate incentive to develop better drugs for that target. Why invest in R&D when your patent makes it prohibitively costly – if not simply illegal – for other companies to get in on the act?

The prospect of the TIGR/HGS/SmithKline Beecham triumvirate having a commercial stranglehold on human gene sequences alarmed both the academic and commercial molecular biology communities. In 1994 Merck, one of SmithKline Beecham's traditional rivals in the pharmaceutical business, provided the genome center at Washington University with $10 million to sequence human cDNAs and publish them openly, thus delivering an open-access riposte to HGS.

At about the time that TIGR and HGS were taking their first steps to commercialize the genome, Francis Collins was appointed to succeed me as the director of the NIH's genome effort. Collins was an excellent choice. He had proven himself a top-notch gene mapper, with several major disease genes under his belt, including the ones for cystic fibrosis, neurofibromatosis (the so-called Elephant Man disease), and, as part of a multipronged effort, Huntington disease. Had prizes been awarded in the early matches of the HGP tournament – those contests for the mapping and characterization of important genes – the palm would surely have gone to Collins. He did himself keep score after his own fashion: a Honda Nighthawk motorcycle being his preferred mode of transport, his colleagues added a decal to his helmet every time a new gene was mapped in his lab.

Collins was raised in Virginia's Shenandoah Valley on a ninety-five-acre farm without plumbing. Initially home-schooled by his parents, a drama professor and a playwright, he wrote and directed his own stage production of The Wizard of Oz at age seven. The wicked witch of science, however, dragged Collins away from a career in the theater; after completing a Ph.D. in physical chemistry at Yale, he went to medical school and from there into a research career in medical genetics. Collins is a member of a rare species, the devoutly religious scientist. In college, he recalls, "I was a pretty obnoxious atheist," but that changed in medical school, when "I watched people in terrible medical circumstances who were engaged in battles for survival, which many of them lost. I watched how some people leaned on their faith and saw what strength it gave them." To the Human Genome Project Collins brought scientific excellence as well as a spiritual dimension singularly lacking in his predecessor.

By the mid-nineties, with the initial mapping of the human genome accomplished and sequencing technologies fast developing, it was time to get down to the nitty-gritty of As, Ts, Gs, and Cs – time to start sequencing. Sticking to the game plan outlined at the outset by our NAS committee, we would first attack an array of model organisms: bacteria to start with, and then on to more complicated creatures (with more complicated genomes). The lowly nematode worm, C. elegans, was the first big nonbacterial challenge, and as the joint achievement of John Sulston at Britain's Sanger Centre and Bob Waterston at Washington University it provided an excellent model of international collaboration. The worm's sequence was published in December 1998, all 97 million base pairs of it. No bigger than a comma on this page, and comprising a fixed number of cells, just 959, the worm nevertheless has some 20,000 genes.

At first sight, Sulston appeared ill-suited to a leadership role in Big Science. He had spent most of his professional life staring down a microscope in order to produce in astonishing detail a complete description, cell by cell, of the development of the worm. Bearded and avuncular, he is the son of a Church of England vicar and also a lifelong socialist who believes passionately that business and the human genome should have nothing in common. Like Francis Collins, he is a motorcycle enthusiast; he used to commute on his 550cc machine from his home outside Cambridge to the Sanger Centre until, just as the HGP was gathering speed, an accident left him severely injured and his bike, as he put it, "little more than nuts and bolts." The Wellcome Trust, which was funding the Sanger Centre, was horrified to learn that the project's scientific leader was taking his life in his hands every time he came to work: "After we'd invested all that money in this bloke!" complained Bridget Ogilvie, then the trust's director.

Sulston's U.S. partner, Waterston, was an engineering major at Princeton and imported plenty of engineering savvy to the big sequencing center he ran at Washington University. Waterston has the capacity to extrapolate – to start small and finish big. Accompanying his daughter on a jog, he found he liked running, and is now an accomplished marathon runner. During its first year of operation, his sequencing group produced just forty thousand base pairs of worm sequence, but within a few years it was cranking out enormous amounts, and Waterston was one of the earliest to urge an all-out human sequencing effort (see Plate 42).

But even as those in the international HGP collaboration began to sequence model organisms, gearing up for the big one, the molecular biological equivalent of an earthquake shook the whole enterprise.

Craig Venter and TIGR had been doing well. Having milked his cDNA gene discovery strategy for several years, Venter became interested in sequencing whole genomes. In this, too, he was persuaded of the superiority of his own approach. The HGP had been carefully mapping the location of the different chunks of DNA on the chromosomes before actually sequencing them. That way, you already knew that chunk A was adjacent to chunk B and could look for overlaps between them when it came to knitting together the final sequence. Venter preferred a "whole genome shotgun" (WGS) approach, in which there was no initial mapping: you simply broke the genome up into random chunks, sequenced them all, fed all the sequences into a computer, and relied upon the computer to put them all in the right order on the basis of overlaps, without benefit of any prior positional information. Venter and his team at TIGR showed that this brute force method could indeed work, at least for simple genomes: in 1995 they published the genome sequence of a bacterium, Haemophilus influenzae, using this method.
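
The overlap idea behind whole genome shotgun assembly can be illustrated with a toy sketch. This is only an illustration under strong assumptions (error-free reads, a single strand, no repeats, a tiny input); the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and this is not the software TIGR or Celera actually used.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the longest overlap.

    No positional information is used: order is inferred from overlaps alone.
    Returns the list of contigs left when no overlaps remain.
    """
    # Drop exact duplicates and reads wholly contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three made-up random chunks of one short made-up sequence.
    print(greedy_assemble(["ACCTTGAC", "ATGGCGTA", "GCGTACCTT"]))
```

Comparing every fragment with every other fragment, as this sketch does, scales very poorly, which gives some sense of why a genome-sized version of the problem demanded so much computing power.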

It remained problematic, however, whether WGS would work for a large and complex genome like the human one. The problem is repeats – segments of the same sequence occurring in different places in the genome – which could in principle scupper a WGS sequencing attempt. These repeats might well mislead even the most sophisticated computer algorithm. If, for instance, a repeat occurs in chunks A and P, the computer could mistakenly situate A next to Q rather than in its proper position, next to B. For its part, the HGP itself had discussed this scenario when it considered using a WGS approach, and, based on careful calculations by Phil Green in Seattle, the consortium concluded that such an effort would likely be confounded by the human genome's massive amount of long-repeating sequences of junk DNA.
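
The repeat problem can be made concrete with a small sketch. The four chunks below are invented for illustration: A and P both end in the same repeated stretch, so overlap alone cannot say whether A is followed by B or by Q. The names `overlap`, `successor_candidates`, and `chunks` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 5) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successor_candidates(name: str, chunks: dict, min_len: int = 5) -> dict:
    """Map each other chunk's name to its overlap with the end of chunk `name`.

    Only chunks with a qualifying overlap are returned. More than one entry
    with the same best score means the assembler has to guess.
    """
    scores = {}
    for other, seq in chunks.items():
        if other != name:
            k = overlap(chunks[name], seq, min_len)
            if k:
                scores[other] = k
    return scores


# Made-up chunks: "GATTACA" plays the role of the repeat.
chunks = {
    "A": "CCGTGATTACA",  # ends in the repeat
    "B": "GATTACATTGC",  # true neighbour of A
    "P": "AAGCGATTACA",  # also ends in the repeat
    "Q": "GATTACAGGTA",  # true neighbour of P
}

if __name__ == "__main__":
    print(successor_candidates("A", chunks))
    print(successor_candidates("P", chunks))
```

In this toy case B and Q tie as successors of A, so an assembler working from overlaps alone has no basis for choosing. A prior map of chunk positions, as in the HGP's approach, is one way to break such ties.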

In January 1998, Mike Hunkapiller of ABI, maker of automated sequencing machines, invited Venter to check out his newest model, the PRISM 3700. Venter was impressed, but nothing could have prepared him for what was to follow. Hunkapiller suggested that Venter form a new company, funded by ABI's parent company, PerkinElmer, to sequence the human genome. Venter had no misgivings about forsaking TIGR – relations had long since soured with Haseltine at HGS. And so he wasted no time in founding the firm that was later to be called Celera Genomics. The company motto: "Speed matters. Discovery can't wait." The plan: to sequence the entire human genome by WGS using three hundred of Hunkapiller's machines and the single greatest concentration of computing power outside the Pentagon. The project would take two years and cost between $200 million and $500 million.

The news broke just before the leaders of what would come to be called the public (as opposed to private) Human Genome Project were meeting at Cold Spring Harbor Laboratory. To put it mildly, the news was not well received. The worldwide public project had already spent some $1.9 billion (of public money), and now, as the New York Times was spinning it, we might have nothing to show for the money except for the sequence of the mouse genome, while Venter waltzed off with the holy grail, the human genome. What was especially galling was Venter's flouting of what had come to be known as the Bermuda principles. In 1996, at an HGP conference in Bermuda – a meeting Venter attended – the HGP had agreed that sequence data should be released as soon as it was generated. The genome sequence, we all concurred, should be public property. Now a renegade, Venter had different ideas: he claimed he would defer releasing new sequence data for three months, selling licenses to pharmaceutical companies and any other parties seriously interested in buying a preview.

Fortuitously, the Wellcome Trust's Michael Morgan was able to give the public project a welcome boost just days after Venter's announcement by declaring that it would be doubling its support for the Sanger Centre, bringing the total up to around $350 million. Though the timing of the announcement made this look like a direct response to Venter's challenge, the increase in funding had in fact been in the works for quite some time. Shortly afterwards the U.S. Congress beefed up its own contribution to the public HGP's coffers. The race was on. In fact, from the outset there were always going to be at least two winners. Science only stood to benefit from two human genome sequences, one against which to check the other. (With over 3 billion base pairs involved, there was bound to be a typo or two.) Another winner would surely be ABI: they stood to sell a lot more PRISM sequencing machines, which most labs in the public project would now have to buy to keep up with Venter!

The acrimonious exchanges between the leaders of the private and public projects would become a fixture of newspaper science pages for the next couple of years. The back and forth got to a pitch that moved President Clinton to direct his science adviser, "Fix it . . . make these guys work together." But through it all, the sequencing moved ahead, and Venter did demonstrate that a WGS approach could work on a respectably sized genome when, in collaboration with the fruit fly wing of the public consortium, he announced the completion of an advanced draft of the Drosophila genome early in 2000. The fly genome, however, contains relatively little repetitive junk DNA, and Celera's success in assembling it in no way guaranteed WGS would work on the human genome.

No individual was more vital to meeting the Celera challenge than Eric Lander. It was he who envisioned an almost entirely automated sequencing process in which robots would take the place of technicians, and it was he who had the drive to make this vision a reality. Lander's résumé indeed shows he knew a thing or two about drive. A Brooklyn boy, he was a curve-busting math whiz at Stuyvesant High in Manhattan who went on to win first prize in the Westinghouse Talent Search; he then became valedictorian of his class at Princeton ('78) before earning his Ph.D. at Oxford on a Rhodes fellowship. A MacArthur "genius" award in 1987 seemed almost redundant. His mother, incidentally, has no idea how it all happened: "I'd love to say I'm responsible, but it's not true. . . . I'd have to say it was dumb luck."

Ultimately finding pure mathematics "an isolated, monastic kind of field," Lander, notably gregarious by the standards of his discipline, joined the jollier faculty of the Harvard Business School, but he soon found himself distracted and intrigued by the labors of his younger brother, a neuroscientist. Inspired, Lander taught himself biology by moonlighting in Harvard's and MIT's biology departments, all the while scarcely missing a beat on his day job at the B-school: "I pretty much picked up molecular biology on street corners," he says. "But around here, there are a lot of very good street corners." In 1989 he became a professor of biology on one of those street corners, MIT's Whitehead Institute.

Even among the so-called G5 – the public effort's five major centers, which also included the Sanger Centre, Washington University's Genome Sequencing Center, Baylor College of Medicine, and the DOE in Walnut Creek – Lander's lab would be the largest single contributor of DNA sequences. His team at MIT would also be responsible for much of the enormous acceleration of productivity in the home stretch leading up to the release of the draft genome (see Plate 43). On November 17, 1999, the public project celebrated its billionth base pair, with the sequencing of a G. Just four months later, on March 9, 2000, a T was base number 2 billion. The G5 was cranking. Because Celera was using the public project's data, which were posted immediately on the Internet and were now pouring in thick and fast, Venter, perhaps finally breaking a sweat, halved the amount of sequencing he had originally projected Celera would do.

As the public/private race reached a climax in the media, behind the barricades the focus was increasingly shifting to the effort's mathematical brain trust, scientists hidden in back rooms among banks of computers. They were the ones who had to make sense of all those As, Ts, Gs, and Cs of crude sequence. They had two major tasks. First: To assemble a whole final sequence from the many, many discrete chunks on hand. Most parts had been sequenced numerous times, so there were several genomes' worth of sequence to sort out, all of which had to be distilled down into a single canonical genome sequence. Computationally, this was an enormous undertaking. Second: To figure out what was what in the final sequence, and above all where the genes were. Identifying the genome's components – distinguishing between one stretch of As, Ts, Gs, and Cs that encoded nothing but junk and another that encoded a protein – depended on extremely computer-intensive approaches.
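
The first task, distilling several genomes' worth of overlapping sequence into one canonical sequence, can be sketched as a simple majority vote. This is a minimal illustration that assumes the reads have already been placed at known offsets and contain only substitution errors; the function name `consensus` and the `(offset, sequence)` input format are hypothetical, and real assemblers are far more elaborate.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority-vote consensus from reads positioned on a shared coordinate.

    `placed_reads` is a list of (offset, sequence) pairs. Each position takes
    the most frequently observed base; positions no read covers become "N".
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


if __name__ == "__main__":
    # Made-up reads; the second carries one sequencing error (T instead of A).
    reads = [
        (0, "ATGGCGTACC"),
        (2, "GGCGTTCCTT"),
        (4, "CGTACCTTGA"),
    ]
    print(consensus(reads))
```

Because each position here is covered by several reads, a single misread base is outvoted, which is one reason redundant sequencing is useful.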

At the heart of Celera's computer operations was Gene Myers, the computer scientist who had been the WGS approach's first and most forceful advocate. With James Weber of the Marshfield Medical Research Foundation in Wisconsin, he had proposed that the public effort adopt WGS long before Celera even came into existence. And so for Myers, the success of Celera's bid was a point of pride and vindication.

Anchored as it was by previously mapped genetic landmarks, the public project's job in assembling the sequence, though immense, seemed less daunting than the one confronting Myers in the landmark-free world of WGS. (In its final analysis, Celera used the public project's freely available map information.) In fact, in counting on these very landmarks, the public project had rather underestimated its own computational challenge, so, as Celera added computer muscle, the public project stayed focused on gearing up its sequencing operation. Only very late in the day did the leaders of the public project realize that, despite the map, they, too, like the proverbial father facing the parts of a new bike on Christmas Eve, had a major assembly problem on their hands. A date for completion (and assembly) of the "rough draft" had been fixed for the end of June. But at the beginning of May, the public project still had no working means of assembling all their sequences. Their deus ex machina took a strange form: a graduate student from UC Santa Cruz.

His name was Jim Kent, and he looked like a member of the Grateful Dead. He had been programming computers since the beginning of the PC era, writing code for graphics and animations, but then decided on graduate school so he could be a part of bioinformatics, the new field dedicated to analyzing DNA and protein sequences. He realized that he was through with commercial programming when he received Microsoft's bulky twelve-CD-ROM package for developers of programs for Windows 95: "I was thinking to myself that the whole human genome could fit on one CD-ROM and didn't change every three months." Confident in May that he had a good way to crack the much-talked-about assembly problem, he induced his university to let him "borrow" 100 PCs recently bought for teaching purposes. He then embarked on a four-week programming marathon, icing his wrists at night to prevent them from seizing up as he churned out computer code by day. His deadline was June 26, when the completion of the rough draft was to be announced. The program finished, he set his 100 PCs to work, and, on June 22, his gang of PCs solved the public project's assembly problem. Myers at Celera cut it even closer, completing his assembly on the night of June 25.

Then came June 26, 2000. Bill Clinton at the White House and Tony Blair at 10 Downing Street simultaneously proclaimed the first draft of the HGP complete. The race was called as a tie, the honors to be shared equally. Happily the opposing parties managed to put the bad feelings behind them, for the morning at least. Clinton declared, "Today, we are learning the language in which God created life. With this profound new knowledge, humankind is on the verge of gaining immense, new power to heal." Grand words for a grand occasion. It was impossible not to feel some pride in an accomplishment that the press promptly compared to the first Apollo moon landing, even if the "official" date of the triumph was somewhat arbitrary. The sequencing was by no means over, and it would be more than six months before the scientific papers summarizing the genome were published. It has been suggested that the timing was dictated not by the HGP's timetable but by Clinton's and Blair's schedules.
