As Richard Dawkins puts it, “All animals probably have a relatively similar repertoire of proteins that need to be ‘called forth’ at any particular time. . . .” The difference between a more complex organism and a simpler one, “between a human and a nematode worm, is not that humans have more of those fundamental pieces of apparatus, but that they can call them into action in more complicated sequences and in a more complicated range of spaces.” It was not the size of the ship, yet again, but the way the planks were configured. The fly genome was its own Delphic boat.
In May 2000, with Celera and the Human Genome Project sprinting neck and neck toward a draft sequence of the human genome, Venter received a phone call from his friend Ari Patrinos, from the Department of Energy. Patrinos had contacted Francis Collins and asked him to stop by Patrinos’s town house for an evening drink. Would Venter consider joining? There would be no aides, advisers, or journalists, no entourage of investors or funders. The conversation would be entirely private, and the conclusions would remain strictly confidential.
Patrinos’s phone call to Venter had been orchestrated for several weeks. News of the arms race between Celera and the Human Genome Project had filtered through political channels and reached the White House. President Clinton, with his unfailing nose for public relations, realized that news of the contest could escalate into an embarrassment for the government, especially if Celera was the first to announce victory. Clinton had sent his aides a note with a terse, two-word dictum—“Fix this!”—appended to the margin. Patrinos was the appointed “fixer.”
A week later, Venter and Collins met in the rec room in the basement of Patrinos’s town house in Georgetown. The atmosphere was understandably chilly. Patrinos waited for the mood to thaw, then delicately broached the subject of the meeting: Would Collins and Venter consider a joint announcement of the sequencing of the human genome?
Both Venter and Collins had come mentally prepared for such an offer. Collins had already agreed to a “joint revelation” (in part, he had instigated the meeting with Venter, using Patrinos as an intermediary). Venter mulled over the possibility and acquiesced, but with several caveats. He agreed to a joint ceremony at the White House to celebrate the draft sequence, and to back-to-back publications in Science. He made no commitments about timelines. This, as one journalist would later describe it, was the most “carefully scripted draw.”
That initial meeting in Ari Patrinos’s basement room became the first of several private meetings between Venter, Collins, and Patrinos. Over the next three weeks, Collins and Venter warily choreographed the general outline of the announcement: President Clinton would open the event, followed by Tony Blair, and by talks from Collins and Venter. In effect, Celera and the Human Genome Project would be declared joint victors in the race to sequence the human genome. The White House was swiftly informed of the possibility of the announcement and acted quickly to secure a date. Venter and Collins returned to their respective groups and agreed on June 26, 2000.
At 10:19 a.m. on June 26, Venter, Collins, and the president gathered at the White House to reveal the “first survey” of the human genome to a large group of scientists, journalists, and foreign dignitaries. (In truth, neither Celera nor the Genome Project had completed its sequence, but both groups had decided to continue with the announcement as a symbolic gesture; even as the White House was unveiling the supposed “first survey” of the genome, scientists at Celera and the Genome Project were keying frantically at their terminals, trying to string the sequence together into a meaningful whole.) Tony Blair joined the meeting from London by satellite. Norton Zinder, Richard Roberts, Eric Lander, and Ham Smith sat in the audience, joined by James Watson in a crisp white suit.
Clinton spoke first, comparing the map of the human genome to the Lewis and Clark map of the continent:
“Nearly two centuries ago, in this room, on this floor, Thomas Jefferson and a trusted aide spread out a magnificent map, a map Jefferson had long prayed he would get to see in his lifetime. . . . It was a map that defined the contours and forever expanded the frontiers of our continent and our imagination. Today the world is joining us here in the East Room to behold the map of even greater significance. We are here to celebrate the completion of the first survey of the entire human genome. Without a doubt, this is the most important, most wondrous map ever produced by humankind.”
Venter, the last to speak, could not resist reminding his audience that this “map” had also been achieved, in parallel, by a private expedition led by a private explorer: “At twelve thirty today, at a joint press conference with the public genome effort, Celera Genomics will describe the first assembly of the human genetic code from whole genome shotgun method. . . . The method used by Celera has determined the genetic code of five individuals. We have sequenced the genome of three females and two males who have identified themselves as Hispanic, Asian, Caucasian, or African-American.”
Like so many truces, the fragile armistice between Venter and Collins barely outlived its tortuous birth. In part, the conflict centered on old quarrels. Although the status of its gene patents was still uncertain, Celera had decided to monetize its sequencing project by selling subscriptions to its database to academic researchers and pharmaceutical companies (big pharma companies, Venter had astutely reasoned, might want to know gene sequences to discover new drugs—especially ones that target particular proteins). But Venter also wanted to publish Celera’s human genome sequence in a major scientific journal—Science, for example—which required the company to deposit its gene sequences in a public repository (a scientist cannot publish a scientific paper for the general public while insisting that its essential data is secret). Justifiably, Watson, Lander, and Collins were bitingly critical of Celera’s attempt to straddle the commercial and academic worlds. “My greatest success,” Venter told an interviewer, “was I managed to get hated by both worlds.”
The Genome Project, meanwhile, was struggling with technical hurdles. Having sequenced vast parts of the human genome using the clone-by-clone approach, the project was now poised at a critical juncture: it had to assemble the pieces to complete the puzzle. But that task, seemingly modest on a theoretical level, represented a daunting computational problem. Substantial portions of the sequence were still missing. Not every part of the genome was amenable to cloning and sequencing, and assembling nonoverlapping segments was vastly more complicated than had been anticipated, like solving a puzzle containing pieces that had fallen into the cracks of furniture. Lander recruited yet another team of scientists to help him: David Haussler, a computer scientist at the University of California, Santa Cruz, and his forty-year-old protégé, James Kent, a former programmer turned molecular biologist. In a fit of inspired frenzy, Haussler convinced the university to buy a hundred desktop PCs so that Kent could write and run tens of thousands of lines of code in parallel. Kent iced his wrists at night so that he could start coding again every morning.
At Celera too the genome assembly problem was proving to be frustrating. Parts of the human genome are full of strange repetitive sequences—“equivalent to a big stretch of blue sky in a jigsaw puzzle,” as Venter described it. Computational scientists in charge of assembling the genome worked week upon week to put the gene fragments in order, but the complete sequence was still missing.
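Why repeats are so troublesome can be seen in a toy version of the assembly task. The sketch below is only an illustration, not the software used by Celera or the Genome Project: it assumes error-free fragments read from a single strand and merges them greedily by their longest suffix-to-prefix overlap. The names `overlap` and `greedy_assemble`, and the `min_len` threshold, are invented for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` letters exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap.

    Stops when no remaining pair overlaps by `min_len` letters or more,
    and returns whatever contiguous pieces are left.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        # Join the pair, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Distinctive fragments snap together unambiguously.
print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))  # ['ATGCGTACCTGA']

# A repetitive stretch: two different fragments fit the same edge equally well.
print(overlap("GCTAAAA", "AAAAAAA"), overlap("GCTAAAA", "AAAACGT"))  # 4 4
```

In the second case the program has no principled way to choose which piece comes next, or to tell how long the run of A's really is. That is the “blue sky” problem in miniature.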
By the winter of 2000, both projects neared completion, but communication between the groups, strained at its best moments, had fallen apart. Venter accused the Genome Project of “a vendetta against Celera.” Lander wrote to the editors of Science protesting Celera’s strategy of selling the sequence database to subscribers and restricting public access to parts of it, while trying to publish yet other selected parts of the data in a journal; Celera was trying to “have its genome and sell it too.” “In the history of scientific writing since the 1600s,” Lander complained, “the disclosure of data has been linked to the announcement of a discovery. That’s the basis of modern science. In pre-modern times, you could say: ‘I’ve found an answer, or I’ve made lead into gold, proclaim the discovery, and then refuse to show the results.’ But the whole point of professional scientific journals is disclosure and credit.” Worse, Collins and Lander accused Celera of using the Human Genome Project’s published sequence as a “scaffold” to assemble its own genome: molecular plagiarism (Venter retorted that the idea was ludicrous; Celera had deciphered all the other genomes with no help from such “scaffolds”). Left to its own devices, Lander announced, Celera’s data was nothing more than a “genome tossed salad.”
As Celera edged toward the final draft of its paper, scientists made frantic appeals for the company to deposit its results in the publicly available repository of sequences, named GenBank. Ultimately, Venter agreed to provide free access to academic researchers—but with several important constraints. Dissatisfied with the compromise, Sulston, Lander, and Collins chose to send their paper to a rival journal, Nature.
On February 15 and 16, 2001, the Human Genome Project consortium and Celera published their papers in Nature and Science, respectively. Both were enormous studies, nearly spanning the lengths of the two journals (at sixty-six thousand words, the Human Genome Project paper was the largest study published in Nature’s history). Every great scientific paper is a conversation with its own history—and the opening paragraphs of the Nature paper were written with full cognizance of its moment of reckoning:
“The rediscovery of Mendel’s laws of heredity in the opening weeks of the 20th century sparked a scientific quest to understand the nature and content of genetic information that has propelled biology for the last hundred years. The scientific progress made [since that time] falls naturally into four main phases, corresponding roughly to the four quarters of the century.”
“The first established the cellular basis of heredity: the chromosomes. The second defined the molecular basis of heredity: the DNA double helix. The third unlocked the informational basis of heredity [i.e., the genetic code], with the discovery of the biological mechanism by which cells read the information contained in genes, and with the invention of the recombinant DNA technologies of cloning and sequencing by which scientists can do the same.”
The sequence of the human genome, the project asserted, marked the starting point of the “fourth phase” of genetics. This was the era of “genomics”—the assessment of the entire genomes of organisms, including humans. There is an old conundrum in philosophy that asks if an intelligent machine can ever decipher its own instruction manual. For humans, the manual was now complete. Deciphering it, reading it, and understanding it would be quite another matter.
* * *
I. Stretches of DNA associated with a gene, called promoters, can be likened to “on” switches for that gene. These sequences encode information about when and where to activate a gene (thus hemoglobin is only turned on in red blood cells). In contrast, other stretches of DNA encode information about when and where to turn a gene “off” (thus lactose-digesting genes are turned off in a bacterial cell unless lactose becomes the dominant nutrient). It is remarkable that the system of “on” and “off” gene switches, first discovered in bacteria, is conserved throughout biology.
II. Venter’s strategy of sequencing protein-encoding and RNA-encoding portions of the genome would, in the end, prove to be an invaluable resource for geneticists. Venter’s method revealed parts of the genome that were “active,” thereby allowing geneticists to annotate these active parts against the whole genome.
III. Estimating the number of genes in any organism is complicated and requires some fundamental assumptions about the nature and structure of a gene. Before the advent of whole-genome sequencing, genes were identified by their function. However, whole-genome sequencing does not consider the function of a gene; it is like identifying all the words and letters in an encyclopedia without reference to what any of these words or letters mean. The number of genes is estimated by examining the genome sequence and identifying stretches of DNA sequence that look like genes—i.e., they contain some regulatory sequences and encode an RNA sequence or resemble other genes found in other organisms. However, as we learn more about the structures and functions of genes, this number is bound to change. Currently, worms are believed to have about 19,500 genes, but that number will continue to evolve as we understand more about genes.
The Book of Man (in Twenty-Three Volumes)
Is man no more than this? Consider him well.
—William Shakespeare, King Lear, act 3, scene 4
There are mountains beyond mountains.
—Haitian proverb
• It has 3,088,286,401 letters of DNA (give or take a few).
• Published as a book with a standard-size font, it would contain just four letters . . . AGCTTGCAGGGG . . . and so on, stretching, inscrutably, page upon page, for over 1.5 million pages—sixty-six times the size of the Encyclopaedia Britannica.
• It is divided into twenty-three pairs of chromosomes—forty-six in all—in most cells in the body. All other apes, including gorillas, chimpanzees, and orangutans, have twenty-four pairs. At some point in hominid evolution, two medium-size chromosomes in some ancestral ape fused to form one. The human genome departed cordially from the ape genome several million years ago, acquiring new mutations and variations over time. We lost a chromosome, but gained a thumb.
• It encodes about 20,687 genes in total—only 1,796 more than worms, 12,000 fewer than corn, and 25,000 fewer genes than rice or wheat. The difference between “human” and “breakfast cereal” is not a matter of gene numbers, but of the sophistication of gene networks. It is not what we have; it is how we use it.
• It is fiercely inventive. It squeezes complexity out of simplicity. It orchestrates the activation or repression of certain genes in only certain cells and at certain times, creating unique contexts and partners for each gene in time and space, and thus produces near-infinite functional variation out of its limited repertoire. And it mixes and matches gene modules—called exons—within single genes to extract even further combinatorial diversity out of its gene repertoire. These two strategies—gene regulation and gene splicing—appear to be used more extensively in the human genome than in the genomes of most organisms. More than the enormity of gene numbers, the diversity of gene types, or the originality of gene function, it is the ingenuity of our genome that is the secret to our complexity.
• It is dynamic. In some cells, it reshuffles its own sequence to make novel variants of itself. Cells of the immune system secrete “antibodies”—missilelike proteins designed to attach themselves to invading pathogens. But since pathogens are constantly evolving, antibodies must also be capable of changing; an evolving pathogen demands an evolving host. The genome accomplishes this counterevolution by reshuffling its genetic elements—thereby achieving astounding diversity (s . . . tru . . . c . . . t . . . ure and g . . . en . . . ome can be reshuffled to form an entirely new word c . . . ome . . . t). The reshuffled genes generate the diversity of antibodies. In these cells, every genome is capable of giving rise to an entirely different genome.
• Parts of it are surprisingly beautiful. On a vast stretch on chromosome eleven, for instance, there is a causeway dedicated entirely to the sensation of smell. Here, a cluster of 155 closely related genes encodes a series of protein receptors that are professional smell sensors. Each receptor binds to a unique chemical structure, like a key to a lock, and generates a distinctive sensation of smell in the brain—spearmint, lemon, caraway, jasmine, vanilla, ginger, pepper. An elaborate form of gene regulation ensures that only one odor-receptor gene is chosen from this cluster and expressed in a single smell-sensing neuron in the nose, thereby enabling us to discriminate thousands of smells.
• Genes, oddly, comprise only a minuscule fraction of it. An enormous proportion, a bewildering 98 percent, is not dedicated to genes per se, but to enormous stretches of DNA that are interspersed between genes (intergenic DNA) or within genes (introns). These long stretches encode no RNA, and no protein: they exist in the genome either because they regulate gene expression, or for reasons that we do not yet understand, or because of no reason whatsoever (i.e., they are “junk” DNA). If the genome were a line stretching across the Atlantic Ocean between North America and Europe, genes would be occasional specks of land strewn across long, dark tracts of water. Laid end to end, these specks would be no longer than the largest Galápagos island or a train line across the city of Tokyo.
• It is encrusted with history. Embedded within it are peculiar fragments of DNA—some derived from ancient viruses—that were inserted into the genome in the distant past and have been carried passively for millennia since then. Some of these fragments were once capable of actively “jumping” between genes and organisms, but they have now been largely inactivated and silenced. Like decommissioned traveling salesmen, these pieces are permanently tethered to our genome, unable to move or get out. These fragments are vastly more common than genes, resulting in yet another major idiosyncrasy of our genome: much of the human genome is not particularly human.
• It has repeated elements that appear frequently. A pesky, mysterious three-hundred-base-pair sequence called Alu appears and reappears tens of thousands of times, although its origin, function, or significance is unknown.
• It has enormous “gene families”—genes that resemble each other and perform similar functions—which often cluster together. Two hundred closely related genes, clustered in archipelagoes on certain chromosomes, encode members of the “Hox” family, many of which play crucial roles in determining the fate, identity, and structure of the embryo, its segments, and its organs.
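The figures scattered through this list can be checked against one another with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are invented for the example, and the results are rough implications of the text, not independent measurements.

```python
# Figures quoted in the list above.
GENOME_LETTERS = 3_088_286_401   # letters of DNA
BOOK_PAGES = 1_500_000           # "over 1.5 million pages"
HUMAN_GENES = 20_687             # approximate human gene count
NON_GENE_FRACTION = 0.98         # share of the genome not dedicated to genes

# Letters printed on each page of the imagined book (an upper bound,
# since the text says "over" 1.5 million pages).
letters_per_page = GENOME_LETTERS / BOOK_PAGES

# Gene counts implied by the comparisons with other organisms.
implied_worm_genes = HUMAN_GENES - 1_796
implied_corn_genes = HUMAN_GENES + 12_000
implied_rice_or_wheat_genes = HUMAN_GENES + 25_000

# Letters left over for genes once the 98 percent is set aside.
gene_letters = GENOME_LETTERS * (1 - NON_GENE_FRACTION)

print(f"letters per page:     about {letters_per_page:,.0f}")
print(f"implied worm genes:   {implied_worm_genes:,}")
print(f"implied corn genes:   {implied_corn_genes:,}")
print(f"implied rice/wheat:   {implied_rice_or_wheat_genes:,}")
print(f"letters within genes: about {gene_letters / 1e6:.0f} million")
```

The implied worm count of 18,891 sits a little below the figure of about 19,500 given in the earlier note on gene estimates, a reminder that such counts are estimates and shift as the definition of a gene is refined.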