When assembling genomes from living species, the best way to assemble these trickiest regions is to sequence very long fragments of DNA. By long fragments, I mean fragments that are thousands to hundreds of thousands of base-pairs long. This is hard to do, and enormous biotech budgets are spent each year trying to solve this problem. Unfortunately, ancient DNA does not survive in long strands: our fragments mostly contain fewer than one hundred base-pairs, and frequently far fewer than that. So even if breakthroughs over the next few years make it possible to sequence very long fragments, these will be of little use in sequencing the genomes of extinct species.
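The reason read length matters so much can be shown with a toy example. The minimal Python sketch below is an illustration, not a real assembler. A read that falls entirely inside a repeated stretch matches several positions equally well, so there is no way to tell where it belongs. A read long enough to span the repeat and reach unique flanking sequence matches exactly once. The genome, the repeat unit, and the function names are all hypothetical.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position at which `read` exactly matches `genome`."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)  # step by 1 so overlapping matches count
    return count


def is_uniquely_placeable(genome: str, read: str) -> bool:
    """A read can be placed unambiguously only if it matches exactly one position."""
    return count_placements(genome, read) == 1


# Hypothetical toy genome: unique flanks around a 12-bp unit repeated three times.
repeat_unit = "ACGTTGCAACGT"
genome = "GGATCC" + repeat_unit * 3 + "TTAGCA"

short_read = repeat_unit[:8]            # lies entirely inside the repeat
long_read = "GGATCC" + repeat_unit * 3  # anchored in the unique flank, spans the repeat

print(count_placements(genome, short_read))      # → 3
print(is_uniquely_placeable(genome, long_read))  # → True
```

Ancient DNA reads behave like `short_read`, and heterochromatin is made of repeats that are far longer than any surviving fragment.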
The good news is that the cost of sequencing DNA continues to decrease, which means that we can generate more and more sequences from each ancient sample without completely breaking the bank. Also, our ability to recover DNA from fossils is improving. While these sequence fragments will always be short, the quantity of recoverable fragments will increase. We might also get lucky and find ancient samples—preserved in frozen arctic soils, for example—that do retain fragments of DNA that are many hundreds of base-pairs long, although we are extremely unlikely ever to find remains that retain fragments that are thousands or tens of thousands of base-pairs long. Finally, computational approaches to piecing together fragments of DNA without a closely related guide genome are also improving, which allows better assemblies of ancient genomes from increasingly divergent species.
The truth, however, is that no mammalian genome has yet been sequenced entirely to completion. This includes the human genome, although the ecstatic claims of having done so more than a decade ago would certainly suggest otherwise. Some regions of the human genome remain unsequenced to this day and cannot be sequenced using any existing sequencing technology.
Genomes comprise two components: the euchromatin, which is the component in which the genes are found, and a highly repetitive and tightly condensed component called heterochromatin. In the human genome sequence, there are still some (very) small unsequenced gaps remaining in the euchromatic portion of the genome, but these amount to less than 1 percent of the human genome. The other, larger missing component is in the heterochromatic sequence. Heterochromatin makes up about 20 percent of the human genome and, thanks to its highly repetitive nature, is the most difficult portion of the human genome—or any genome—to sequence. Heterochromatin probably plays important roles in regulating gene expression, in directing the segregation of chromosomes during cell division, and in determining where the different chromosomes live within the nucleus. Because it is so difficult to sequence using existing technologies, however, we know very little about it compared with what we know about the euchromatic portion of genomes.
Heterochromatin will be no simpler to sequence from an ancient sample than it is from a living human. In fact, sequencing heterochromatin from degraded samples is likely to be extra challenging compared with sequencing from samples of living organisms, thanks to the fragmented nature of ancient DNA. It remains to be seen whether this is an important roadblock to de-extinction.
Because we cannot know the complete genome sequence of an extinct species, synthesizing a complete genome from scratch would not be an option for de-extinction even if it were to become possible to re-create synthetic eukaryotic life. I strongly believe, however, that synthetic biology is the way to bring extinct species and traits back to life. While we cannot synthesize an entire genome, we can synthesize fragments of DNA. What if we could use these fragments of DNA to engineer extinct species back to life?
CUT AND PASTE A MAMMOTH
George Church is a professor of genetics at Harvard Medical School and is the leading partner in another mammoth de-extinction project, one that is markedly different from those that rely on finding intact cells in the Siberian permafrost. George is using genome engineering to resurrect a mammoth. Genome engineering is, as I said, one of the two presently feasible methods for resurrecting extinct traits.
I first met George at the Wyss Institute in Cambridge, Massachusetts, in 2012. He was hosting a mini-conference that was organized by Ryan Phelan and Stewart Brand of the Long Now Foundation as part of their new nonprofit undertaking, Revive & Restore. The conference was notionally about a project to bring back the passenger pigeon, and as the scientist with the largest collection of passenger pigeon bits in her ancient DNA lab, I was invited to attend. Also attending were conservation biologists, including Noel Snyder from the US Fish and Wildlife Service, who has spent many years of his life working on the project to save the California condor, and scholars of bioethics, like Hank Greely, a Stanford law professor who specializes in the social and ethical implications of biotechnology. The conversation was intense and at times angry, but it was tremendously useful: it was at this mini-conference that I realized how de-extinction was going to happen.
George Church is one of my favorite scientists. There are few people in the world who successfully straddle the gulf that separates genius and madness, and he is one of them—probably because his genius far outweighs his madness. George Church is one of the most inventive minds in genomics, a fact that is most apparent in the excessively long lists of biotech partnerships that appear at the ends of his papers and presentations.
At the meeting in 2012, George presented his plan for bringing a mammoth back to life as a model for what could and should be done for the passenger pigeon. His plan involved using new (and awesome) technology to change the elephant genome, bit by bit, into a mammoth genome. His plan can most simply be summarized as a cut-and-paste job. I’ll describe it in much more technical detail later, but for now, here are the basics.
First, we collect a few (or many) well-preserved mammoth remains, extract DNA, and assemble a genome. We then compare that genome to the genome sequence of an Asian elephant, and identify the parts of the genome where the mammoth sequence is different from the elephant sequence in some important way. This provides us with our plan: we will edit the elephant genome so that it looks like the mammoth genome in those specific places.
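The comparison step can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: the elephant and mammoth sequences are taken to be already aligned to the same length, and an `N` marks a mammoth position with no ancient DNA coverage, which is skipped rather than reported. Deciding which differences are "important" is the hard part and is not modeled here. `Difference` and `find_differences` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    position: int  # 0-based coordinate in the shared alignment
    elephant: str  # base in the elephant reference
    mammoth: str   # base in the assembled mammoth sequence


def find_differences(elephant: str, mammoth: str) -> list[Difference]:
    """List positions where two pre-aligned, equal-length sequences disagree.

    Mammoth positions marked 'N' (no coverage) are skipped, not reported.
    """
    if len(elephant) != len(mammoth):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Difference(i, e, m)
        for i, (e, m) in enumerate(zip(elephant, mammoth))
        if m != "N" and e != m
    ]


# Hypothetical aligned snippets; each reported difference is one planned edit.
plan = find_differences("ATGGCCTTAGCA", "ATGACCTNAGCT")
for diff in plan:
    print(diff)
```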
Second, we synthesize strands of mammoth DNA that match the genomic regions that we want to change. We do this by stringing together As, Cs, Gs, and Ts and following the template provided by our assembled partial mammoth genome. This provides us with strands of DNA that we will later paste into the elephant genome. These synthesized fragments could be very short (only a few base-pairs long) or somewhat longer (several hundred or possibly several thousand base-pairs long), but they are much shorter than the length of a chromosome and are certainly within reach of what is feasible in the present day.
Third, we engineer a tool—let’s call it “molecular scissors”—whose job it will be to find and bind to precisely the sequence within the elephant genome that we want to change. There are several such tools, all of which I will describe later.
Fourth, we deliver the synthesized strands of mammoth DNA and the molecular scissors into the nucleus of an elephant cell. The molecular scissors locate the precise spot in the elephant genome where the edit is to be made, bind to it, and cut the strand of DNA in half. Because having broken DNA is bad for the cell, cellular machinery has evolved that will fix exactly this type of DNA damage. This cellular machinery kicks into action and fixes the broken strand by pasting the mammoth version of the sequence in place of the elephant version.
Fifth, we measure the success of the cut-and-paste by designing an experiment that allows us to learn whether the cells are now expressing the mammoth gene and not the elephant gene. This step allows us to identify those cells that have been edited and then to measure how, if at all, the edits change the phenotype of the cell.
Finally, those cells in which all the cut-and-paste jobs have been successful are used in nuclear transfer to create living organisms with selectively engineered genomes.
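The screening in the fifth step could take many forms. One possibility is sketched below in Python, as a hypothetical illustration rather than the lab's actual assay. It assumes that sequencing reads covering a single target site have been collected from each cell, and that the elephant and mammoth versions of the site are known short strings. A cell passes only if every read carries the mammoth version, which is a design choice that assumes both chromosome copies should be edited. For several edits, the same check would be repeated at each site.

```python
from collections import Counter


def classify_read(read: str, elephant_allele: str, mammoth_allele: str) -> str:
    """Label one read from the target site by the version it carries."""
    has_mammoth = mammoth_allele in read
    has_elephant = elephant_allele in read
    if has_mammoth and not has_elephant:
        return "mammoth"
    if has_elephant and not has_mammoth:
        return "elephant"
    return "unclear"


def screen_cells(reads_by_cell: dict[str, list[str]], elephant_allele: str,
                 mammoth_allele: str) -> list[str]:
    """Return IDs of cells whose every read at the target carries the mammoth version."""
    passed = []
    for cell_id, reads in reads_by_cell.items():
        labels = Counter(classify_read(r, elephant_allele, mammoth_allele) for r in reads)
        if reads and labels["mammoth"] == len(reads):
            passed.append(cell_id)
    return passed


# Hypothetical reads from three cells at one target site.
reads = {
    "cell_1": ["TTGATTACAACCTAGG", "GATTACAACCTAGGTT"],  # both copies edited
    "cell_2": ["TTGATTACGGCCTAGG", "GATTACAACCTAGGTT"],  # only one copy edited
    "cell_3": ["TTGATTACGGCCTAGG"],                      # unedited
}
print(screen_cells(reads, "GATTACGGCCTA", "GATTACAACCTA"))
```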
I think I can safely speak for the others attending the meeting in saying that we were, as a group, rather taken aback by how real and achievable George’s presentation made de-extinction feel. His approach seemed simple, even elegant. Could it be true that living, breathing mammoths really were achievable within the time frame proposed by Professor Iritani (although not by the same means)?
At the time, George had not yet started manipulating elephant DNA. The mammoth genome was still in the very early stages of assembly, and, as such, it was not entirely clear what parts of the elephant genome should be targeted for editing. We were also still in the process of sequencing the passenger pigeon and its closest living relative, the band-tailed pigeon, so we, too, had little idea about what we might actually change in the band-tailed pigeon to make a passenger pigeon. This presentation, however, clarified what our goal should be. And, more importantly, this goal appeared achievable. We did not need to sequence the complete genome. We simply had to figure out, somehow, which parts of the genome were important and sequence those.
MOLECULAR SCISSORS AND ENZYMATIC GLUE
While genome editing as described by George Church sounds pretty straightforward, the process, unsurprisingly, faces significant technical challenges. To be successful, genome editing has to be specific. Nobody wants molecular scissors to go around wantonly chopping up a genome and randomly inserting DNA. Not only would this fail to have the desired effect on the phenotype of the cell (or the resulting animal), but nonspecific chopping up of DNA is also toxic to the cell: it causes genomic instability and often cancer.
Key to the success of genome editing has been the discovery and development of different types of programmable molecular scissors. Programmability allows specificity, which means we can make the cuts we want to make where we want to make them, and we can avoid making cuts that kill the cell.
For the last decade or so, two types of programmable molecular scissors dominated the field (figure 10): zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). ZFNs and TALENs are similar in that they are both hybrid molecules made up of two distinct parts. The first part, sometimes called the “arm,” is a protein that recognizes and binds to the part of the genome that is to be edited. This is the programmable part: each zinc finger recognizes a specific sequence of three nucleotides, and each transcription activator-like effector (TALE) recognizes a single nucleotide. Chains of zinc fingers or TALEs are strung together synthetically so that each chain recognizes a specific sequence of DNA. The second part of the hybrid molecule is the nuclease, the scissors that actually make the cut. The nuclease is attached to one end of the chain of zinc fingers or TALEs. Two hybrid molecules are synthesized for each edit that is to be made: one that finds and binds to the DNA sequence upstream of the target site and another that binds to the DNA downstream of the target site. When both molecules have located exactly the right spot in the genome and have bound to it, the nuclease makes a cut.
Figure 10. Zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). Each finger in a ZFN recognizes a specific sequence of three nucleotides, whereas each transcription activator-like effector (TALE) recognizes a single nucleotide. The arms are created by linking a nuclease and specific sequence-recognizing fingers or TALEs together so that their sequence matches the genomic sequence to which they are intended to bind.
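The logic of programmable arms working in pairs can be sketched in Python. This is a simplified illustration with hypothetical sequences and function names. A target half-site is broken into the 3-nucleotide units read by zinc fingers, or into the single nucleotides read by TALEs. A cut is then allowed only where a left arm binds upstream and a right arm binds downstream with a gap (spacer) of allowed length between them. Two simplifications apply. The right-arm site is written as it reads on the same strand, although real ZFN and TALEN pairs bind opposite strands. The spacer lengths used in the example are illustrative.

```python
def zfn_modules(half_site: str) -> list[str]:
    """Split a half-site into the 3-nt triplets that individual zinc fingers read."""
    if len(half_site) % 3:
        raise ValueError("zinc-finger half-sites must be a multiple of 3 nt")
    return [half_site[i:i + 3] for i in range(0, len(half_site), 3)]


def tale_modules(half_site: str) -> list[str]:
    """Each TALE reads a single nucleotide, so the chain has one TALE per base."""
    return list(half_site)


def paired_cut_sites(genome: str, left_site: str, right_site: str,
                     min_spacer: int, max_spacer: int) -> list[int]:
    """Positions where both arms bind with an allowed spacer; cut placed mid-spacer.

    Simplification: right_site is given on the same strand as left_site.
    """
    cuts = []
    for i in range(len(genome) - len(left_site) + 1):
        if genome[i:i + len(left_site)] != left_site:
            continue  # left arm cannot bind here
        spacer_start = i + len(left_site)
        for spacer in range(min_spacer, max_spacer + 1):
            j = spacer_start + spacer
            if genome[j:j + len(right_site)] == right_site:
                cuts.append(spacer_start + spacer // 2)
    return cuts


left, right = "GCGTGGGCG", "TCCAGTAAG"  # hypothetical 9-nt half-sites
# The first copy of `left` has no partner downstream, so only the second one yields a cut.
genome = "TTT" + left + "A" * 14 + left + "CTGACT" + right + "CCC"
print(zfn_modules(left))                        # three zinc fingers
print(paired_cut_sites(genome, left, right, 5, 7))
```

Requiring both arms to bind is what makes the pair far more specific than either arm alone.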
Making the correct cut is only the first half of the cut-and-paste process. The second half involves tricking the cell into replacing the elephant version of the sequence with the mammoth version as it repairs the newly broken strand of DNA.
Cutting through both strands of DNA is potentially lethal to the cell. If only one strand is cut, the cell’s repair machinery can fill in whatever sequence was lost using the other strand as a template. If both strands are cut, it is less obvious how the cell would know how to replace any missing sequence.
Two different cellular-repair mechanisms have evolved to solve this problem. The first is called homologous recombination. Because there are two homologous copies of each chromosome in the cell (one from mom and one from dad), one of these can be used as a template to fix errors in the other. In homologous recombination, the two homologous chromosomes line up next to each other and recombine, allowing the cell’s repair machinery to use the intact chromosome’s sequence as a template to fix the break. The cut-and-paste process aims to harness this repair mechanism but to trick the cell into using the synthesized strand of DNA (here, the mammoth DNA that was delivered into the cell along with the molecular scissors), instead of using the homologous chromosome, as the template for repair.
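Template-directed repair can be mimicked with string operations. The Python sketch below is a simplified, hypothetical model, not a description of the real enzymology. The synthesized donor is assumed to consist of a left homology arm, the new (mammoth) sequence, and a right homology arm. The arms must match the genome on either side of the cut, and whatever lies between them is replaced by the donor's middle section. If the arms find no match, the function returns `None`, standing in for a cell that cannot use the template.

```python
from typing import Optional


def repair_with_template(genome: str, cut: int, donor: str, arm_len: int) -> Optional[str]:
    """Model template-directed repair of a double-strand break at index `cut`.

    donor = left homology arm + new sequence + right homology arm,
    where each arm is `arm_len` nucleotides long.
    """
    if arm_len <= 0 or len(donor) < 2 * arm_len:
        raise ValueError("donor must contain two homology arms of length arm_len")
    left_arm = donor[:arm_len]
    right_arm = donor[-arm_len:]
    new_segment = donor[arm_len:-arm_len]        # the mammoth version to paste in
    left_start = genome.rfind(left_arm, 0, cut)  # arm must end at or before the cut
    right_start = genome.find(right_arm, cut)    # arm must start at or after the cut
    if left_start == -1 or right_start == -1:
        return None  # no usable homology, so the template cannot guide repair
    return genome[:left_start + arm_len] + new_segment + genome[right_start:]


# Hypothetical elephant site "GG" between two flanks; the donor carries mammoth "AA".
elephant = "TTTTGATTACGGCCTAGGTTTT"
donor = "GATTAC" + "AA" + "CCTAGG"
print(repair_with_template(elephant, 11, donor, 6))
```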
The other mechanism for repairing double-strand breaks is non-homologous end joining. This mechanism does not require a homologous sequence as a template for fixing the strand of DNA but instead just glues the broken ends back together. This is not the pathway that we want the cell to follow if we want to change the DNA sequence, but it is a pathway that is often used by the cell. Thus, one challenge that remains is to develop a way to control which of these pathways the cell uses to repair the DNA. At the moment, only a fraction of cells that are edited will end up with the new version of the gene pasted in the right place after the break is repaired.
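Why only a fraction of cells end up edited can be pictured with a toy simulation. The Python sketch below uses a made-up parameter, `p_template`, for the chance that a cut cell repairs its break using the supplied template rather than by non-homologous end joining. It is not a measured rate, and real cells are far more complicated. The point is only that as long as the end-joining pathway competes, some cells will be cut but not edited.

```python
import random


def simulate_repairs(n_cells: int, p_template: float, seed: int = 0) -> dict[str, int]:
    """Toy model of repair outcomes in cells that have all been cut.

    Each cell uses the template (edit incorporated) with probability p_template;
    otherwise the broken ends are simply joined back together (no edit).
    """
    if not 0.0 <= p_template <= 1.0:
        raise ValueError("p_template must be a probability")
    rng = random.Random(seed)  # seeded so runs are reproducible
    outcomes = {"edited": 0, "end_joined": 0}
    for _ in range(n_cells):
        if rng.random() < p_template:
            outcomes["edited"] += 1
        else:
            outcomes["end_joined"] += 1
    return outcomes


# Hypothetical value: if only 1 in 10 breaks were repaired from the template...
print(simulate_repairs(1000, 0.1))
```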
ZFNs and TALENs have already proved to be tremendously powerful molecular tools. ZFNs have been used to fix mutations that are known to cause genetic disease in humans by directly editing the genome sequence in patient-specific stem cells. These modified stem cells can then be transplanted into the patient to cure the disease. ZFNs are even being used to develop a cure for HIV/AIDS by editing the CCR5 gene, which codes for the protein that HIV uses to enter T cells, into a version of CCR5 that the virus cannot use. Genome editing has also been used to insert herbicide-resistance genes into corn and tobacco and to alter cow genomes so that they produce the human version of various blood and milk proteins.
Genome-editing applications of ZFNs and TALENs are limited mainly by the need for specific targeting within the genome, which, it turns out, is pretty difficult to control. Longer probes, made by linking more zinc fingers or TALEs together, provide increased sequence specificity, but longer proteins are harder to deliver into the cell. Also, making the probes is a painstaking process that often requires months or years of trial and error. And these problems arise even when working with organisms that have long histories of experimental manipulation in molecular biology labs. If these methods are to be applied to de-extinction, the experiments will include species whose genome sequences are not known and that have never been used in molecular biology research, compounding the difficulty of making the experiments work. Certainly, these genome-editing tools have potential when it comes to de-extinction. Digging deeply into exactly how they work, however, provides a somber reality check.
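Why longer probes are more specific comes down to simple arithmetic. In a genome of length G made of random, equally frequent bases, a fixed probe of length n is expected to match by chance about 2(G − n + 1)/4^n times, counting both strands. Each extra nucleotide divides that number by four. The Python sketch below assumes exactly that idealized random genome. Real genomes are not random, and repeats in particular make off-target matches more common than this estimate. The human genome size used is a rough round figure.

```python
def expected_random_matches(genome_length: int, probe_length: int,
                            both_strands: bool = True) -> float:
    """Expected chance matches of a fixed probe in a uniformly random genome."""
    positions = max(0, genome_length - probe_length + 1)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** probe_length


HUMAN_GENOME_BP = 3_100_000_000  # roughly 3.1 billion base-pairs (approximate)
for n in (9, 12, 18, 24):
    print(n, expected_random_matches(HUMAN_GENOME_BP, n))
```

A probe recognizing only 9 nucleotides would be expected to hit thousands of places by chance, while one recognizing 18 or more would, on this idealized model, be expected to occur by chance less than once.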
A CRISPR VIEW OF DE-EXTINCTION
Around the same time as our Harvard meeting, a new molecular tool appeared in the genome-editing toolbox. This new tool, called CRISPR/Cas9, was first discovered for its role in providing immunity to bacteria by learning a pathogen’s DNA sequence and later targeting and destroying that sequence. Harnessing this system for genome editing provides two key advantages over ZFNs and TALENs. First, the programming is much faster—there is no longer a need to link fingers or TALEs together by trial and error. Second, much longer sequences can be used, which provides a tremendous increase in specificity. The relative ease and simplicity with which genome editing can be achieved using this system hints that another revolution in biology—similar to the one that came about when PCR was first developed—may be just around the corner.
Here is how it works. When a pathogen invades a bacterial or archaeal cell, the pathogen genome is recognized and chopped into small pieces. Some of these pieces become incorporated as “spacers” into a region of the host genome called a CRISPR, or clustered regularly interspaced short palindromic repeats. In this manner, bits of pathogen DNA are stored in the host genome for future use. To defend itself against invading pathogens, the cell transcribes the CRISPR and chops it up at the repeats, releasing the spacers, which, remember, are sequences of pathogen DNA. The transcribed spacers are taken up by Cas9 proteins, which then scan the cell for DNA that matches the sequence of the spacer as a means to find and destroy invading pathogens.
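The storage-and-release cycle can be sketched with strings in Python. This is a simplified illustration: the repeat sequence and spacers are hypothetical, and the details of how the array is processed are reduced to a single `split`. New spacers are added at the front of the array together with a fresh copy of the repeat. Cutting the transcribed array at every repeat releases the spacers that sit between them.

```python
def acquire_spacer(array: str, pathogen_fragment: str, repeat: str) -> str:
    """Store a pathogen fragment as a new spacer at the front of the array."""
    return repeat + pathogen_fragment + array


def spacers_from_array(transcript: str, repeat: str) -> list[str]:
    """Cut a transcribed CRISPR array at every repeat and keep the spacers."""
    return [piece for piece in transcript.split(repeat) if piece]


REPEAT = "GTTTTAGAGC"  # hypothetical short repeat
array = REPEAT + "ACGTACGTTGCA" + REPEAT + "TTGACCATGGAT" + REPEAT

# A new invader is encountered and a fragment of its DNA is stored.
array = acquire_spacer(array, "CCGATTAACCTA", REPEAT)
print(spacers_from_array(array, REPEAT))  # newest spacer first
```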
To translate the CRISPR/Cas9 system into one that is useful for genome editing, imagine that, rather than grabbing bits of pathogen DNA and using these sequences as probes to search for potentially invading pathogens, the Cas9 molecules bind to a sequence that we design and use it to search for the part of the genome we wish to edit (figure 11). This becomes a highly efficient, highly precise way to locate specific parts of the genome. We design and synthesize CRISPR-RNAs, which are analogous to linked-together zinc fingers or TALEs, to find a precise part of the genome. When the CRISPR-RNAs find that part of the genome, Cas9, which is analogous to the nuclease in ZFNs and TALENs, makes the cut. After this, the standard DNA repair processes take place and, we hope, our edits are incorporated into the genome sequence.
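The search that a designed CRISPR-RNA directs can be sketched in Python. This sketch adds one detail not spelled out above. The widely used Cas9 from *Streptococcus pyogenes* cuts only where the matched sequence is immediately followed by a short "NGG" motif (any base, then two Gs), and the function checks for it. Both DNA strands are searched. The guide and genome sequences are hypothetical, and real guide design also has to weigh near-matches, which this exact-match sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite DNA strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_targets(genome: str, guide: str) -> list[tuple[int, str]]:
    """Return (start, strand) for each exact guide match followed by an NGG motif.

    `start` is the forward-strand coordinate of the leftmost base of the match.
    """
    n = len(guide)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - n - 2):
            motif = seq[i + n:i + n + 3]
            if seq[i:i + n] == guide and motif[1:] == "GG":
                start = i if strand == "+" else len(genome) - i - n
                hits.append((start, strand))
    return hits


guide = "GACCTTAGGCAATCGTACGA"  # hypothetical 20-nt guide
print(find_cas9_targets("TTT" + guide + "TGG" + "AAAA", guide))
print(find_cas9_targets("AA" + reverse_complement(guide + "AGG") + "TT", guide))
```

Unlike a chain of zinc fingers or TALEs, changing the target here means changing a string, not re-engineering a protein.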
In addition to gains in both speed and specificity, the CRISPR/Cas9 system also provides an increase in efficiency when we want to make multiple changes at the same time. Cas9 and the synthesized CRISPR-RNAs are not physically linked, which means that many different CRISPR-RNAs can be delivered into the cell at once. Each of these will be captured by Cas9 and used to find (and cut) different parts of the genome.
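Because the guide is separate from Cas9, one search routine can serve any number of guides at once. The minimal Python sketch below illustrates that design. The guides and genome are hypothetical, only the forward strand is searched to keep it short, and the "NGG" motif check again assumes the commonly used *S. pyogenes* Cas9. Adding another target means adding another string to the list, which mirrors delivering another CRISPR-RNA into the cell.

```python
def multiplex_targets(genome: str, guides: list[str]) -> dict[str, list[int]]:
    """Search one genome with many guides (forward strand only).

    Returns, for each guide, the start positions of exact matches that are
    followed by an NGG motif.
    """
    hits: dict[str, list[int]] = {}
    for guide in guides:
        n = len(guide)
        positions = []
        start = genome.find(guide)
        while start != -1:
            motif = genome[start + n:start + n + 3]
            if len(motif) == 3 and motif[1:] == "GG":
                positions.append(start)
            start = genome.find(guide, start + 1)
        hits[guide] = positions
    return hits


guide_a = "GACCTTAGGCAATCGTACGA"  # hypothetical guides
guide_b = "TTGCAGCTAGCTAACGGATC"
guide_c = "CATCATCATCATCATCATCA"  # absent from the toy genome
genome = guide_a + "TGG" + "CCCC" + guide_b + "AGG" + "CC"
print(multiplex_targets(genome, [guide_a, guide_b, guide_c]))
```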
George Church’s team at the Wyss Institute is among the research groups that are leading the development of the CRISPR/Cas9 system for genome engineering. Most people in his lab are thinking about applications of CRISPR in personalized medicine, or about refining the technology so that it is possible both to insert longer fragments of DNA and to perform multiple edits of different parts of the genome at the same time. But tucked away, in a dark corner of his lab (this is how it is arranged in my imagination, anyway), is a small team of postdocs with a mammoth-sized goal: the self-named mammoth revivalists. Every month, labs involved with Revive & Restore connect via teleconference to catch up on progress in our active de-extinction projects. The mammoth revivalists consistently put the rest of us to shame. We are still assembling the passenger pigeon genome and trying to figure out what we might want to change. They’ve decided not to wait for the mammoth genome to be finished before proceeding at full bore. Beginning with a few mutations that we’re aware of—the differences between mammoth and elephant hemoglobin—and a few good guesses, they are cutting and pasting their way to a mammoth.
How to Clone a Mammoth Page 13