The damaging biological effects of ionizing radiation were discovered by the American biologist Hermann Joseph Muller in 1926. Work done by Muller and others in the early days of studying X rays, as well as natural and artificial radioactivity, has helped us understand mutations that occur spontaneously due to cosmic radiation from our sun and many other celestial sources dating back to the dawn of time. These mutations occur despite the protective effects of the atmosphere (equivalent to about 10 meters of water) and the Van Allen belts, two regions of charged particles that partly surround the earth at heights of several thousand kilometers.
The unwelcome consequences of the atomic revolution include nuclear war and radiation sickness, the Chernobyl meltdown, and the possibility of worldwide nuclear holocaust. Negative aspects of chemistry include pollution, drug abuse, and the problems posed by semiconductors (a subject covered in my discussion of the sixth revolution), computer viruses, identity theft, privacy invasion, cyberwar, and bioterror.
The original Human Genome Project sequenced the 3 billion base pairs of the human genome at a cost of $3 billion, or at an average cost of one dollar per base pair. That was a milestone of science, but its significance was offset by two factors: its high cost and the fact that the genome that had actually been sequenced was not the DNA of any one individual but a composite genome of many DNA contributors. It was the sequence of a “blended” person, and so it had little value in practical, personal, or medical terms. It was the moon landing of molecular biology. (The genomes of two individual humans differ by an average of about 3 million positions, which is approximately 0.1 percent of the total. Most of these are single base changes or changes in tandem repeat lengths.)
Although the genomes of any two people are 99.9 percent identical, the genetic differences between them account for much of their physical uniqueness, including predisposition to illness and differential responses to drugs, medical treatments, and infectious disease agents, as well as their psychological individuality and personal tastes, preferences, talents and deficiencies.
Hitherto, medicine has operated largely on a one-size-fits-all approach, tailoring a cure to the disease rather than to the person who is suffering from it. For a long time this made sense: a disease, after all, is a specific thing and human beings are genetically almost carbon copies of each other, and so what alleviates a disease for one person ought to perform equally well for the next. But often enough it doesn’t. A drug that helps one person may be toxic to another, may provoke an allergic reaction or have other adverse side effects, or may have no effect whatsoever. Such differential responses are often found with respect to antidepressant medications, for example, many of which can take two weeks or more to have an effect, if any. Discovering a genetic basis for these varying outcomes would allow doctors to prescribe drugs that worked most effectively for a given person. Indeed, such discoveries are now laying the groundwork for the new field of pharmacogenomics.
People also respond differently to the same disease agent. The bacterium Staphylococcus aureus, for instance, kills an average of 100,000 Americans per year, more than any other single microorganism. It is the leading cause of heart, skin, and soft tissue infections, and is a common cause of pneumonia. It is the top causal agent of nosocomial (hospital-acquired) infections. Nevertheless, some 30 percent of the population harbor the pathogen in their nasal passages but show no sign of infection. Evidently there are genetic factors at work that explain these dissimilar responses to the microbe. This finding has implications for the future of medicine. If a patient’s genome sequence were part of his or her electronic medical record, and susceptibility to Staph infection were known in advance, then the subject could be treated with appropriate antibiotics before being admitted to a hospital where the infection might otherwise be acquired.
The original Human Genome Project was made possible by the then-emerging niche technology of automated DNA sequencing machines. Ten years after the success of the HGP, improvements in that technology have brought down costs to levels at which commercial personal genomic services have become a reality, and one day a complete human genome sequence will be available for about $1,000. This will inaugurate the era of new approaches to health and disease, an era of personalized genomic medicine.
As a teenager, I had the grand notion that we ought to sequence everybody—all 6 billion base pairs for all the 4 to 7 billion of us—and store the data in computers. This was a sort of “genomes for all” approach, to be pursued for predictive reasons alone. The idea was that if you knew the types of diseases or medical conditions you were predisposed to—adult-onset diabetes, let’s say—then you could take appropriate countermeasures early in life. Given the cost of both sequencing technology and computers in those days, that plan was naive, to say the least.
But today, when the cost of computers and automated DNA sequencing technology continues to plummet, my plan is not so naive. Whereas the data storage capacity and processing speeds of computers have tended to follow Moore’s Law over the past fifty years, with the number of transistors on integrated circuits doubling every year and a half, the cost-effectiveness of DNA sequencing has increased by about ten times per year over the last six years.* Such improvements are only likely to accelerate, and consequently the sequencing of whole populations at low cost will soon be possible.
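To make the comparison concrete, here is a minimal sketch of the arithmetic, assuming both trends compound steadily at the rates just quoted. The function name and the six-year window are illustrative choices, not anything drawn from the underlying data.

```python
def growth_factor(gain_per_period: float, period_years: float, span_years: float) -> float:
    """Total multiplicative improvement after span_years, assuming a steady
    gain of gain_per_period once every period_years."""
    return gain_per_period ** (span_years / period_years)

# Moore's Law as stated in the text: doubling every year and a half.
moore = growth_factor(2, 1.5, 6)

# DNA sequencing as stated in the text: about tenfold every year.
sequencing = growth_factor(10, 1, 6)

print(f"Moore's Law over six years: about {moore:,.0f}-fold")
print(f"Sequencing over six years:  about {sequencing:,.0f}-fold")
```

Under these assumptions, six years of doubling every eighteen months gives a sixteenfold gain, while six years of tenfold annual improvement gives a millionfold gain.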
I still think that, ideally, all who desire it should be able to have their genome sequenced, and for predictive reasons first and foremost. After all, there are already about two thousand known, actionable, and highly predictive genetic associations. Even though they may be rare, they are nevertheless predictable and actionable—conditions that you can do something about.
Another reason to sequence everybody is to create a database that shows correlations between genotype and phenotype—between a person’s genome and the set of observable characteristics that result from the interaction of the person’s genome with the environment. (As geneticists like to say, the genes may load the gun but the environment pulls the trigger.) There are correlations not only between genes and diseases but between genes and observable traits such as eye color, hair color, facial features, cognitive abilities, eating habits, lifestyle, personal history and experiences, career choices, mental outlook, and lots of other things. The genomic database would be an immense toolbox for understanding the myriad ways in which genes and the environment interact to form the sum total of human individuality and variability.
This dataset would have to be fully open to researchers, to any investigator who wanted to use it for any purpose whatsoever, whether to generate hypotheses, run tests, establish or disprove correlations on any level, or anything else. That would mean open publication of the individual’s genome and phenotype on the Internet, available for all the world to see. In effect it would be putting your life story, medical history, and genetic makeup on the web. It would be the Facebook of DNA.
This brings us to the sixth industrial revolution: the information-genomics revolution. Like the others, this revolution has crucial quantitative measures (probability, information, and complexity) as well as possibly crucial emerging measures of life, evolution, and intelligence. Theories of probability began as ruminations on how to win at games of chance, as illustrated by the Liber de ludo aleae (Book on Games of Chance), written by Gerolamo Cardano in 1526. Information and communication theory builds on probability theory. The basic concepts were first clearly articulated by Claude Shannon in his paper “A Mathematical Theory of Communication” (1948). One of Shannon’s goals was to quantify the amount of information lost to static and other influences in phone line signals. He called this measure of uncertainty “information entropy” as an analogue to entropy in thermodynamics, where it refers to the irreplaceable loss of heat energy.
John von Neumann, the mathematician and sometime wit, liked Shannon’s term, telling him: “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.” (This was a concept with legs, as the idea was later exported to the business world as “corporate entropy,” meaning energy lost to bureaucracy and red tape.)
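Shannon’s measure of uncertainty can be sketched in a few lines. This is a minimal illustration that assumes symbol probabilities are estimated from their observed frequencies in a message. The function name and the sample strings are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Entropy in bits per symbol, H = sum(p * log2(1/p)), where each p is
    the observed frequency of a symbol in the message."""
    counts = Counter(message)
    total = len(message)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Four equally frequent letters: the most uncertain case for a four-letter alphabet.
print(shannon_entropy("ACGT" * 5))

# One letter repeated: no uncertainty about the next symbol.
print(shannon_entropy("AAAAAAAA"))
```

A message that uses four letters equally often carries two bits per letter, while a message that repeats a single letter carries none.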
This sixth revolution is allowing us to understand, generalize, and make connections to the previous ones. For example, the precise sequences of DNA, RNA, and proteins discussed in Chapter 1 have a great deal in common with the strings of zeros and ones of the digital computing revolution.
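One way to see the kinship between nucleotide sequences and binary strings is to write each of the four DNA letters as two bits. The sketch below is only an illustration. The particular assignment of letters to bit pairs is an arbitrary assumption, and any of the other possible assignments would serve equally well.

```python
# Hypothetical mapping: the choice of which base gets which 2-bit code is arbitrary.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(dna: str) -> str:
    """Translate a DNA string into a string of zeros and ones, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in dna)

def decode(bits: str) -> str:
    """Translate a bit string back into DNA by reading two bits at a time."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

encoded = encode("GATTACA")
print(encoded)          # the sequence written as binary digits
print(decode(encoded))  # the original sequence, recovered without loss
```

Because four letters fit exactly into two bits, the translation loses nothing in either direction.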
Some unwelcome consequences of the sixth revolution include computer viruses, identity theft, privacy invasion, cyberwar, and bioterror. The potential scale of cyberwar became evident recently in the extent to which the Iranian computers controlling isotope separations could be compromised by individuals having no physical access to the devices (possibly Israeli sympathizers). The cost to society of hacking, Trojan horses, and computer viruses and worms is about $50 billion per year. Bioterrorism is currently small to nonexistent, but the stakes are astronomically high.
The original goal of the Personal Genome Project (PGP) was to sequence the genomes of 100,000 volunteers at no cost to them, and to publish the results on the Internet along with each individual’s personal data, even down to their picture. Of course, any such plan immediately raises privacy issues. The PGP couldn’t promise to keep any of this information private, since the whole point of the exercise was to create an open access source of genetic and personal information and to disseminate it as widely as possible.
The solution was to accept into the program only people who, like myself, consider that the benefits to society outweigh the risks—and also regard privacy as a highly overrated asset. But then the question was, How could we guarantee that we’d recruit only such people? The answer was to formulate a list of eligibility criteria, create consent forms, and devise an online entrance exam composed of a few dozen questions that would measure each volunteer’s understanding and acceptance of the totally free and open-access nature of the enterprise. A potential recruit would have to give correct answers to each and every question, the equivalent of getting a score of 100 percent on an examination. The candidate could take the test as many times as necessary to get 100 percent, but a perfect score would be required for enrollment in the program.
In addition to passing the entrance exam, participants would be required to sign two consent forms. The first, a five-page mini consent form, would outline the program and its requirements and constitute a basic eligibility screening. The second, a more elaborate sixteen-page full consent form, would describe the program, the candidate’s participation, and public release of the data generated, in great and heroic detail.
All this was acceptable to Harvard’s Institutional Review Board (IRB), the body responsible for approving, monitoring, and reviewing medical research or experimentation on humans. In August 2005, the IRB gave us approval for a pilot program. I became the first candidate.
Today, anyone can go to PersonalGenomes.org and view my public profile, which includes vital signs (my height, weight, and blood pressure), allergies (none), medications (lovastatin, coenzyme Q, multivitamins, calcium, etc.), medical history (narcolepsy and squamous cell carcinoma, among other fun things), race (white), traits (male, blood type O+, green eyes, etc.), facial photographs (suitable for framing), DNA data sets, and type of tissue samples taken (lymphoblasts and fibroblasts), plus date collected, storage location, and accession number. All of this information is followed, furthermore, by a universal waiver, stating in part: “To the extent possible under law, PersonalGenomes.org has waived all copyright and related or neighboring rights to Personal Genome Project Participant Genetic and Trait Dataset.”
My tissue samples were taken in 2005 and 2006. My lab has developed or advised most of the current thirty-six commercial next-generation sequencing technologies, and we test these technologies as they mature. The first set of samples was sequenced at Complete Genomics in Mountain View, California.
The results were underwhelming. My genome should have shown alleles for narcolepsy, dyslexia, high cholesterol, cardiac arrhythmia (and maybe musical arrhythmia as well!), squamous cell carcinoma, and plantar fasciitis. But it didn’t (yet). Sequencing a genome is one thing, but interpreting and understanding it—making sense out of the practically endless and visually meaningless strings of the nucleotide letters A, T, C, and G—is quite another. Doing so requires the use of software that translates those otherwise baffling chains of letters into usable, practical information. The process of developing such software is still in its early stages. In December 2010 the University of California-Berkeley hosted a competition for genome interpretation programs. It was called the inaugural Critical Assessment of Genome Interpretation (CAGI) competition and attracted more than one hundred entrants. The very existence of this competition shows that we have a long way to go before a genome sequenced is a genome understood.
The Personal Genome Project was formally opened to the general public on DNA Day of 2009, April 25, the anniversary (recognized by the U.S. Congress) of the day in 1953 that Watson and Crick’s paper describing DNA’s structure was published in Nature. The first ten participants, myself and nine others, became known as the PGP-10. The Harvard IRB initially wanted the first ten candidates to have at least a master’s degree in genetics, but the board later dropped this requirement as impractical for subsequent scaling up. In the end, the PGP-10 included Esther Dyson (PGP-3), who describes herself as “a longtime catalyst of start-ups in information technology,” Steven Pinker (PGP-6), a Harvard psychologist, and Misha Angrist (PGP-4), an assistant professor at Duke, the only one of the PGP-10 who has a PhD in genetics.
In 2009 Pinker wrote a first-person account of his participation in the project for the New York Times Magazine, “My Genome, My Self.” The piece described a decision that every potential PGP participant had to face: whether or not to learn that you may be carrying the gene for an incurable disease such as Huntington’s or Alzheimer’s. Pinker chose not to learn whether he had a variant of the APOE gene that would predispose him to Alzheimer’s disease. (This is known as “redacting” the information in question.)
He did learn that he had one copy of a gene for familial dysautonomia, an incurable disorder of the nervous system with unpleasant consequences, including premature death. “A well-meaning colleague tried to console me,” Pinker wrote, “but I was pleased to gain the knowledge.”
His other genomic discoveries included good news and bad news. The good news was that he had a less than average chance of getting prostate cancer before age 80. The bad news was that he had a slightly elevated chance of developing type 2 diabetes. (He had a risk of baldness, despite the fact that he had great hair.)
Pinker’s results show that genetics is not always destiny. Some genes are deterministic. If you have the gene for Huntington’s disease and you live long enough, you will sooner or later develop it. Otherwise, the influence of genes on traits is probabilistic, stacking the odds in one direction or another but without completely predetermining the outcome. Better yet, since there is a strong environmental component of our genetic destiny, we can take action to influence or defeat it.
Another of the PGP-10 also wrote about the journey through his personal genomic universe. This was Misha Angrist (PGP-4), who published the story in his book, Here Is a Human Being: At the Dawn of Personal Genomics (2010). His data had been interpreted by the Trait-o-matic, open-source software developed at the Church lab for the purpose of finding and classifying the ways in which genetic sequence variations manifest themselves in the human body.
When Angrist logged on to the website holding this personal genomic information, he too discovered a modest number of unimpressive genetic data points about himself. Those data points, however, are now part of a growing, openly accessible set of genotypic-phenotypic correlations, a sort of medical analogue to the World Wide Web.
There is a difference worth noting between the Personal Genome Project and the group of private companies that do over-the-counter genome analysis for the masses. The genomic information provided by the Personal Genome Project is useful both to the individual sequenced and to the wider community of researchers, with the dual goals of transforming the practice and delivery of medicine and understanding how genomes give rise to living beings and how they influence the manifold processes of life.
Commercial genome sequencing firms, like Knome, by contrast, hold their data privately, and for good reason, for the information they acquire could be put to a number of unpleasant uses. For example, it could be used to infer paternity, affect employment or insurability, or even one’s love life. Ironically, the information often is not used by the person for whom it was intended. A study conducted by Scripps Health, of La Jolla, California, and published in the New England Journal of Medicine in 2011, reported that out of 2,037 people whose genomes were analyzed by the private firm Navigenics, most of them had failed to make any changes in their diet or exercise patterns when they were interviewed three months after receiving their test results, even when their results showed a definite need for making such changes. (Still, 27 percent of the participants who shared their results with their physicians did make some lifestyle changes.)
Regenesis Page 23