Here Is a Human Being

by Misha Angrist


  Did researchers believe their own informed-consent forms and did subjects understand them?

  Should research results be returned to research subjects?

  How should investigators share data, and with whom?

  Projects like the PGP raised the larger question: Can participants consent to uncertainty? To paraphrase Donald Rumsfeld, could we accept not knowing what we did not know? Clearly the ten of us thought we could; we also thought that anyone who was comfortable with the unknown should be allowed to assume those risks. NIH, on the other hand, was clearly not down with the idea; governments and regulatory agencies liked certainty. McGuire did not take a stand: “All of these approaches have merit,” she said.9 But unlike the rest of us, she had actually walked the walk: she was the one who had navigated the ethical path for the release of Jim Watson’s genome to the world. Watson was, for the most part, like the PGP-10: he was prepared to let it all hang out, to deposit his genome into a public database and let the world have unfettered access to his gene sequences and the rest of his DNA. With one exception.

  In the 1990s, scientists at Duke University (disclosure: my employer since 2003) discovered that the apolipoprotein E (APOE) gene was a major risk factor for garden-variety, late-onset Alzheimer’s disease that some 5 million Americans are living with (there are rare, purely genetic forms as well). One copy of the APOE4 version of the gene put you at threefold higher risk of developing Alzheimer’s. Two copies and you were really in trouble: by age eighty-five, more than 50 percent of people with two copies developed Alzheimer’s.10 Would people really want to know if they were at such high risk for a disease they could do almost nothing to treat or prevent?

  Watson had made his stance clear in the press and also during our interview at Cold Spring Harbor. “My Irish grandmother died of Alzheimer’s at eighty-three,” he said. “I don’t want to worry that every lapse in memory is the start of something. I’m not afraid of the future, but I don’t want to know. Of course, I could be homozygous APOE4 and still not get Alzheimer’s, so … it’s complicated.”11

  Indeed it was. After McGuire’s talk, questioners from the audience lined up in the aisles. Among them was a fresh-faced young man in shorts and a T-shirt. This was Mike Cariaso, who, according to a speaker bio I read for a later meeting, “enjoys travel, skateboarding, reading genomes, and programming in Python.”12 When he got to the mic, Cariaso informed McGuire—and the rest of the audience—that it was possible, and in fact quite easy, to infer Watson’s genotype using his available DNA sequence data from one and/or both sides of the APOE gene.

  Geneticists have given this phenomenon the rather unwieldy name of linkage disequilibrium. What it basically means is that even though parts of each parent’s set of chromosomes* get exchanged when sperm and egg come together (recombination), the process is not entirely random: two genes that are right next door to each other on a chromosome tend to travel together through the generations. Genes that are on the same chromosome but farther apart are more likely to be separated during recombination—they are less tightly linked. In order to prevent Watson’s APOE status from being known, Baylor had redacted or “scrubbed” his APOE sequence and some amount of DNA on either side. But it wasn’t enough. Cariaso was still able to use the more distant sequence: by knowing which versions of the SNPs on either side of APOE Watson inherited, Cariaso could check the genome databases and see what version of APOE tended to travel with the more distal parts of the chromosome that hadn’t been scrubbed. This type of deduction—comparing what’s known in other cases to an unknown case like Watson’s—is how linkage disequilibrium works.
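
  The inference Cariaso describes can be sketched in a few lines of Python. This is a minimal illustration, not his actual method: the reference haplotypes, the flanking SNP alleles, and the function names below are all invented for the example, and real analyses draw on large reference panels and many markers at once.

```python
from collections import Counter, defaultdict

# Hypothetical reference panel: each entry pairs the allele seen at a
# flanking SNP (still public) with the allele at the scrubbed site.
REFERENCE_HAPLOTYPES = [
    ("A", "e4"), ("A", "e4"), ("A", "e4"), ("A", "e3"),
    ("G", "e3"), ("G", "e3"), ("G", "e3"), ("G", "e2"),
]


def build_ld_table(haplotypes):
    """Count how often each hidden allele travels with each flanking allele."""
    table = defaultdict(Counter)
    for flank_allele, hidden_allele in haplotypes:
        table[flank_allele][hidden_allele] += 1
    return table


def infer_hidden_allele(table, flank_allele):
    """Return the most likely hidden allele and its share of the panel."""
    if flank_allele not in table:
        raise KeyError(f"flanking allele {flank_allele!r} not in reference panel")
    counts = table[flank_allele]
    allele, n = counts.most_common(1)[0]
    return allele, n / sum(counts.values())


table = build_ld_table(REFERENCE_HAPLOTYPES)
print(infer_hidden_allele(table, "A"))
```

  In this toy panel, three of the four haplotypes carrying "A" at the flanking SNP also carry the e4 version of the hidden site, so seeing "A" in the unscrubbed sequence makes e4 the best guess. The tighter the linkage, the closer that share gets to certainty.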

  “Watson may not know his APOE genotype, but I do,” Cariaso told McGuire in front of the stunned crowd. “And if anyone else wants to know, the information is still on the [National Center for Biotechnology Information] server.”13

  He returned to his seat. Someone in the audience, an aghast and agitated Baylor genomicist I imagined, her face pale, marched up to him and began firing questions, each of which Cariaso answered with quiet confidence. The facts were inescapable: The Nobel Prize–winning and DNA-discovering source of the second completely sequenced human genome had asked that of his 20,000+ genes, sequence information from just one lousy gene—one!—not be made public. This task was left to the molecular brain trust at Baylor University, one of the top genome centers in the world. Its mission was to keep secret a single genotype from a single gene. But the Baylor team was outfoxed by a thirty-year-old autodidact with a bachelor’s degree who preferred to spend most of his time on the Thai-Burmese border distributing laptops and teaching kids how to program and perform Google searches.

  If there were any remaining doubts as to the relatively easy availability of Watson’s APOE status, they were erased a few months later when Australian researchers came to the same conclusion as Cariaso. The title of their paper in the European Journal of Human Genetics said it all: “On Jim Watson’s APOE Status: Genetic Information Is Hard to Hide.”14

  After I got home I asked Mike via email what he made of the minor shitstorm he had started at Marco Island. He wrote back: “The ethical conundrum is: What did Watson intend not to know? Was it: 1. ‘Don’t tell me my APOE sequence’; 2. ‘Don’t tell me my ApoE4 [trait] status’; 3. ‘Don’t tell me anything that might reveal my ApoE4 [trait] status’; or 4. ‘Don’t tell me anything that predicts Alzheimer’s?’

  “Number 1 and Number 2 were addressed. Number 4 is impossible, since it’s based on what we might discover tomorrow. Given the best data we have today, we know that Number 3 wasn’t covered due to the high linkage disequilibrium with a distant neighbor [on the same chromosome]. If they’d scrubbed APOE [plus another] 30,000 base pairs on either side, then they would have covered what we know today. But that doesn’t mean tomorrow we won’t learn a new way of determining [APOE genotypes] from some sequence 50,000 base pairs away or even on a different chromosome. It’s tough to guard against the future.”15

  Mike and his friend Greg Lennon run SNPedia, a wiki-based website that is in some ways the do-it-yourself version of 23andMe. Sort of. Mike, Greg, and anyone else who wants to can “simply” dig through the human genetics literature and look for associations between genetic variants and human traits. They catalog these and write brief narrative descriptions of them:

  rs6457617 has been reported in a large study to be associated with rheumatoid arthritis. This SNP is reported to be the most statistically significant of many SNPs similarly located in the MHC region. The risk allele (oriented to the dbSNP entry) is (T); the odds ratio associated with heterozygotes is 2.36 (CI 1.97–2.84), and for homozygotes, 5.21 (CI 4.31–6.30).16

  What the hell does this mean? To start with, “rs6457617” is the SNP number; that is, it is the unique identifier of a particular variant in human DNA (“rs” stands for “Reference SNP,” one that has been validated and mapped to a particular place in the genome). Now, recall the DNA alphabet: A, G, C, and T. Our genomes are the 3 billion A’s, G’s, C’s, and T’s we get from our mothers and the 3 billion we get from our fathers. Of the thousands of people who have been screened for this particular SNP associated with rheumatoid arthritis, virtually everyone on earth is one of the following: CC, CT, TC, or TT. People who inherited a C allele at this SNP from each parent are at average risk for developing rheumatoid arthritis. People who inherited a T from one parent and a C from the other at this position are 2.36 times more likely to develop RA than average. People who inherited a T from both parents (as I did) are 5.21 times more likely to develop RA.
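
  As a small illustration of how such an entry can be read mechanically, the sketch below maps a genotype at rs6457617 to the odds ratios quoted above. The function name and the treatment of CC as a baseline of 1.0 are assumptions made for the example. Note also that an odds ratio only approximates "times more likely" when a disease is rare.

```python
# Odds ratios from the SNPedia entry quoted above, keyed by the number of
# copies of the risk allele (T). The 1.0 baseline for zero copies is an
# assumption for this sketch.
ODDS_RATIOS = {0: 1.0, 1: 2.36, 2: 5.21}


def odds_ratio_for_genotype(genotype, risk_allele="T"):
    """Look up the reported odds ratio for a two-letter genotype."""
    genotype = genotype.upper()
    if len(genotype) != 2 or any(base not in "ACGT" for base in genotype):
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    copies = genotype.count(risk_allele)
    return ODDS_RATIOS[copies]


for g in ("CC", "CT", "TC", "TT"):
    print(g, odds_ratio_for_genotype(g))
```

  CT and TC land on the same value because the lookup only counts copies of the risk allele, not which parent each copy came from.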

  Okay, but what does that mean in absolute terms? We don’t know with 100 percent accuracy, but after typing me for this SNP and five others associated with arthritis, Navigenics told me that my lifetime risk of developing RA was 2.8 percent, or a little less than twice the average. The first big caveat: from twin studies, we know that only slightly more than half of the risk for RA is inherited; the rest is likely due to environmental factors, which Navigenics has not measured (nor, as far as I know, have any of the other commercial or noncommercial genome scanners, probably because no one knows exactly what they are).17 The second big caveat: there’s no reason to think that scientists won’t find another dozen SNPs in the human genome that contribute to RA susceptibility. The model for risk prediction in RA will probably look much different in a few years (if not months) and it will probably be much more complicated.

  But for Mike Cariaso and Greg Lennon, that wasn’t the point. Over a long lunch at a cheap French restaurant in a nondescript part of Bethesda, Maryland, near the hotel where NIH likes to hold meetings, Greg recounted the genesis of SNPedia.18 He took me back to 2005–2007, a simpler time when there were no direct-to-consumer genomics companies or gargantuan databases brimming over with information on human genomic variation. This state of affairs didn’t sit right with Lennon, a handsome man in his early fifties with thinning gray hair and bright blue eyes. He spoke in relaxed, measured tones, although one sensed impatience just below the surface. In the early 1990s he had finished his postdoc with übergeneticist Hans Lehrach at the Imperial Cancer Research Fund in London and had followed that with a successful career as a biotech scientist and executive. He and Cariaso met when both were working at Lawrence Livermore in northern California. Cariaso then followed Lennon to Gene Logic, one of the first companies to take seriously the idea that gene expression, meaning the extent to which certain sets of genes were active in particular cells and tissues, could be used to identify drug targets. Thus, for example, white blood cells express high levels of genes that code for infection-fighting proteins; neurons express high levels of genes that code for neurotransmitters such as dopamine; and so on. To a large extent, different cell types can be defined by the genes they do or don’t express. (Alas, thus far this has not led to much in the way of drugs.)

  The lingua franca for measuring gene expression in the late 1990s and early 2000s was the microarray: tiny spots of DNA fixed to a solid surface such as a glass microscope slide or nylon membrane. A microarray typically contains thousands of genes. To survey the expression of those genes in a cell or tissue with a microarray, one would prepare fluorescently labeled RNA (the intermediate coded for by DNA that usually goes on to code for protein) from that cell type. All of those bits of RNA serve as probes: they find their complementary DNA partners and stick to them like molecular Velcro. When they find their match, they fluoresce. The strength of the fluorescent signal that lights up at the spot in the array representing each individual gene provides a snapshot of how active that gene is in the sample. Genes that are especially active or inactive in diseased cells and tissues are potential drug targets.

  But by 2000, microarrays were beginning to be used for purposes beyond gene expression. Among the new applications was SNP detection, and this was even easier than gene expression. Instead of a curve measuring the extent to which a gene was expressed, genotyping was binary: In any given individual, was a particular DNA variant present or absent? And if present, was there one copy or two? By doing case-control studies on hundreds or thousands of people, say, half with a certain disease and half without, and by finding SNPs that were more frequently found in those with the disease, SNP scans could be used to identify disease susceptibility genes.
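
  The case-control comparison can be made concrete with a toy calculation. The allele counts below are invented and the function name is hypothetical; the sketch only shows how an allelic odds ratio falls out of a two-by-two table of cases and controls.

```python
def allele_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    if min(case_risk, case_other, control_risk, control_other) <= 0:
        raise ValueError("all four counts must be positive")
    return (case_risk * control_other) / (case_other * control_risk)


# Invented counts: 500 cases and 500 controls, two alleles per person.
odds_ratio = allele_odds_ratio(
    case_risk=600, case_other=400,        # alleles counted among cases
    control_risk=400, control_other=600,  # alleles counted among controls
)
print(odds_ratio)
```

  An odds ratio above 1 means the allele turns up more often in people with the disease. A genome-wide scan repeats this comparison across hundreds of thousands of SNPs, which is why real studies also need statistical tests and corrections that this sketch leaves out.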

  The leap Cariaso and Lennon made was to take those findings and begin to apply them to individuals. Because once those genome-wide association studies (GWAS) were done on one or more populations, then hypothetically anyone could examine some fraction of her complement of SNPs and see whether she carried SNPs that raised or lowered her disease risks or otherwise contributed to her traits. In 2007 this was still an expensive, labor-intensive, and high-risk proposition for two guys in their basements. So why even bother?

  “I was frustrated,” Greg Lennon told me. “I could go into any restaurant and ask, ‘Has anyone here benefited from the Human Genome Project? Do you know anything about your own genetics? Do your doctors know anything about it?’ The answer in general would be, and for the most part still is, a resounding no. Not even at the level of cocktail banter. I spent my career in an area of science that I felt and continue to feel is very promising. Yet it doesn’t seem to matter. It hasn’t affected anybody. So I began to ask myself, ‘Have I just wasted my time?’”19

  After talking about it for more than a year, Lennon and Cariaso decided to go native. From Gene Logic’s deep and abiding work on gene expression, the two were quite familiar with Affymetrix GeneChips, the dominant microarray platform at the time.20 They had both done some genotyping. So how hard could it be to run some Affymetrix chips on themselves and have a look at their own genomes?

  Arguably the pair’s most difficult hurdle turned out to be getting DNA out of their own bodies. Spit kits and spit parties were still many months away. Blood was easier and cheaper to process. But there was a problem. “When you’re a random individual wandering the streets,” Lennon said, “no one really wants to collect blood from you.”21 Cariaso stopped in a fire station in suburban Maryland and found that paramedics have a lot of time on their hands between calls. He chatted them up, got them interested in what he was doing, and soon had his sleeves rolled up. And that would have been that, except … “Sitting in the back of an ambulance with the needle in my arm, the station alarms went off.”22 Another possibility was the Red Cross: Lennon told me that if you had “the right attitude,” then you could get some of your own blood to go. He wound up getting his own blood sample from a general practitioner who had a soft spot for human genetics, despite the admission that he remembered almost nothing from his cursory medical school training in the subject.23

  Lennon and Cariaso found a contract molecular biology lab to isolate DNA from their white blood cells and to run the latest and greatest Affy chip (five hundred thousand markers). According to Lennon, that’s when the fun began. “We got that data back, and for all of our brilliance, we just stared at those huge files going, ‘Now what?’”24

  They were convinced that at some point during the thirteen-year, $2.7 billion effort to map the human genome, surely someone had taken the initiative and developed a database that systematically linked variation across the genome to human phenotypes. There was Online Mendelian Inheritance in Man,25 an incredibly useful tool developed by the late Victor McKusick, the father of clinical and medical genetics. The catalog began as a hardcover book, Mendelian Inheritance in Man, in 1966.26 But even though it now lived online, OMIM was a text-based catalog, not a digital one, and it mostly contained diseases and phenotypes caused by rare changes in single genes: if a doctor in Saudi Arabia observed a child with widely spaced eyes and elevated enzyme levels in 1968 and published a case report, McKusick would make a note of it. The catalog was and is remarkably comprehensive: an astounding collection of our species’ variation.27 That said, I often found reading OMIM to be annoying: helpful and fascinating case reports and research studies were amassed under gene and/or disease headings and subdivided (“clinical features,” “animal models,” “pathogenesis,” etc.), but without any real narrative flow. At its best, it was like an Audubon field guide for clinicians—a terrific, handy reference. At its worst, it could be a painful slog for anyone interested in a high-level view of any particular genetic disease. I would be terribly upset if it didn’t exist, but as the great Irish writer Roddy Doyle said of Ulysses, it might benefit from a little more editing.28

  The public database of genetic variants, dbSNP, began in 1999. In 2002 it contained 1.3 million unique, validated human variants. In early 2010 it had 9.5 million.29 But unlike OMIM, dbSNP had no intrinsic clinical content, and until recently it didn’t “talk” to OMIM. “I respect McKusick and the way he put OMIM together,” said Lennon. “But that doesn’t mean it’s kept up with the times. It doesn’t help you annotate your genome. The vast majority of the information is effectively anecdotal. There’s nothing wrong with that. But you can’t actually have software work with OMIM. At least dbSNP had the nomenclature part of it roughly right.”30

  Disappointed that the billions spent on the Human Genome Project had not resulted in a resource linking genotype to phenotype, Lennon and Cariaso were at a loss. Despite their ambition, the idea that the two of them could take all of the world’s human genetics literature and turn it into useful information for the benefit of a dozen or a thousand people was laughable. The idea that any number of people could do it was debatable. Lennon and Cariaso could pour the foundation, but other people would have to finish the floors and furnish the house. It would take a village … or at least, something like Wikipedia. And that was the approach they took: the two put other people’s SNP data up on the site (they never got around to posting their own) and let the world have at it.

  “We faced and still face the exact same questions that Wikipedia faces,” said Lennon. “How do you control quality? Is the information credible? Fine—those are fair things to ask. Why not be skeptical about anything you read? On the other hand, we have a huge advantage over Wikipedia because we’re not trying to cover everything from the Israeli-Palestinian conflict to Britney Spears. And we don’t get a lot of flame wars on the site. Genetics is usually pretty boring.”31

  Nor was SNPedia trying to turn a profit like the personal genomics companies. It has been free and open-source from the beginning. It does not perform experiments and it does not offer its own interpretations. It simply mines the literature and reports conclusions drawn by others.

  After Marco Island, I got my Affy500 SNP data—five hundred thousand markers—from George and sent it on to Mike. All three of us were gung ho, perhaps a little too much so. George was keen to send it to me because the PGP was still trying to figure out the interpretation part of the equation and he thought maybe I could do some of the legwork to see what was out there. Mike was keen to get his paws on PGP data and anything else people were willing to share—more data points for SNPedia could only help the site to grow and be taken more seriously. And I was jazzed because I would finally get a glimpse of my own genome. Mostly jazzed anyway:

 
