When in Doubt, Outsource
In 1989 I met Craig Venter, who had invited me to the first genome sequencing meeting at the Wolf Trap conference center in Vienna, Virginia, just outside of Washington, DC. By then Craig had a brilliantly simple strategy for sequencing DNA. He was stuck at NIH, where researchers had loads of money but very little space, and great limitations on hiring people. So he bought human cDNA molecules (complementary DNAs useful for a variety of molecular-biological lab tasks) in plasmid libraries from one company (Clontech), sent them to another company to grow the plasmids and purify the DNA, and then paid ABI (Applied Biosystems) to set up and maintain instruments that would sequence the pure plasmids. Finally he would send the data to GenBank to store in professional databases run by David Lipman and Jim Ostell, leaders in the development of cutting-edge search tools.
Thus Craig’s little NIH team accomplished what other labs could not because they were struggling to develop ways to do all of these steps on their own and to minimize costs. Perhaps they felt that their peers would not fund or renew their grants if they outsourced nearly every aspect of their research, but Craig didn’t buy into that penny-wise, pound-foolish dichotomy. The success of his approach surprised its numerous critics. When the project transitioned to TIGR-HGS, and NIH became concerned, I claimed that NIH shouldn’t worry, since completing 95 percent of the cDNA molecules was probably harder than 95 percent of the genome since the two molecules varied wildly in abundance and, unlike the genome, cDNAs had far less evident measures of completeness. Craig responded “Thanks—I think.” Around this time I was working for CRI as a consultant. I hatched a youthful prank to slip synthetic codes into the TIGR plasmid prep pipeline that might be decoded much later. Something similar seemed to plague a later effort to sequence DNA from pristine Atlantic Ocean samples, which instead found what looked more like Atlantic City, with genome messages in bottles resembling human sewage.
Follow Your Dream
A fanatic is a person who, upon losing sight of his goal, redoubles his efforts. This is the story of EngeneOS, Codon Devices, and Gen9. The dream was to do for biology what Intel had done for electronics. In 2001, Joseph Jacobson, Eric Lander, Daniel Wang, Stephen Benkovic, and I formed the founding scientific advisory board of EngeneOS (Engineered Genetic Operating System).
The company was one of the first to be financed by the Newcogen Group (later Flagship Ventures), a venture capital firm that had been the brainchild of Noubar Afeyan. Noubar had made some big money on an invention in the field of separation science, and in his time has founded or cofounded more than twenty life sciences and technology start-ups. One of them was Celera Genomics, where Craig Venter did some of his first work on sequencing the human genome.
The EngeneOS website claimed that its “technology platform starts with the ‘source code’ of Nature’s operating system, embodied in the genomic sequences of various organisms. The company is combining this information with modern molecular biology techniques, engineering and design principles to develop Engineered Genomic Operating Systems. These systems will consist of component device modules supported by modeling and design tools.”
In other words, EngeneOS expected to build a library of proprietary modular components, including engineered cells and proteins, as well as hybrid devices composed of biological and nonbiological materials. These modular elements would contribute to the design and fabrication of programmable biomolecular machines with novel form and function. It was an ambitious program: start with nature’s operating system, reprogram it, and collect your output in the form of fabulous new engineered organisms.
So what happened?
Nothing. EngeneOS spun off various parts of itself before it ever really got started. The final spin-off was in 2004, when I, together with Drew Endy, Jay Keasling, and others, cofounded Codon Devices, of Cambridge, Massachusetts, to develop tools for large-scale gene synthesis, CAD, and synthetic biology applications. Codon was very much a redo of EngeneOS. Our vision was not to compete with existing custom synthetic DNA firms, but to focus on the new opportunities in complex biological systems and miniaturization of processes, analogous to VLSI (very large-scale integration) in computer chips. We started with Samir Kaul as acting CEO. He was an inspiring leader in part because he had experience in the science of genomics and had worked with Craig Venter and his team on the first genome sequence of a plant (the mustard relative, Arabidopsis). We cornered the market on intellectual property related to DNA error correction as well as the thought leaders in the new field, the BioFab 9 mentioned earlier: David Baker (the undisputed king of protein prediction and design), myself, Jim Collins (genetic switches), Drew Endy (T7 refactoring, iGEM, the biobrick parts registry), Joe Jacobson, Jay Keasling (metabolic engineer), Paul Modrich (mismatch repair), Cristina Smolke (riboregulators), and Ron Weiss (genetic circuits for spatiotemporal patterning).
Unfortunately, despite all this intellectual horsepower, Codon Devices too fizzled and restructured. The CEO replacing Samir really liked the idea of a short-term payoff, as did several board members, and so they emphasized the idea of competing with existing companies that were doing DNA sequencing and synthesis rather than creating a market for something that didn’t exist.
Later, Codon undercut the prices of other companies, plus they had a great sales force. But the sales force was so good at bringing in orders and so good at undercutting everybody else’s prices that it started operating at a loss per customer. But even though the losses were very small per customer, they started adding up.
Codon Devices was disassembled in 2009. Nevertheless, we rescued parts of it in the form of yet a third start-up try at the original concept, this time called Gen9bio Inc. The idea was to start with DNA microarray chips on which gene-size (500 to 1,000 base pair) strands of DNA could be assembled. This sort of synthetic gene-building capacity would be used to produce both a large set of enzymes that are useful in making pharmaceuticals, and a set of constructs for optimizing overproduction of proteins in industrial-scale mammalian culture.
The company is so new that it just recently established its own web page. But it does have a few million dollars in initial financing and, of course, high hopes.
Follow Your Dream but Be Nimble
In 2001 Genomatica began as a metabolic engineering software company. Bernhard Palsson, Christophe Schilling, and others at UCSD had pursued a particularly useful brand of systems biology that combined tools from economic optimization with the detailed pathways of metabolism. The cell is like an industry, having a few choices of input materials, various ways to convert them into intermediate chemicals and, finally, end products. Given constraints on various transport and manufacturing speeds, one can adjust the entire network to maximize production of one particular product. In 2006 Genomatica morphed into a full-fledged synthetic biology company including wet lab experiments and a scale-up strategy.
The company aims to use its proprietary engineered E. coli bacteria to produce sustainable, green chemicals, at lower cost and with a smaller footprint than the products of conventional chemical companies. The initial chemical, Bio-BDO (1,4-butanediol), is used to make spandex, automotive plastics, running shoes, and insulation, among other things, and has a $4 billion market worldwide. The company has produced Bio-BDO at the pilot scale, and in 2011 was ramping up to demonstration-scale production. Genomatica has at least $84 million in financing and by all appearances looks to be one of the emerging leaders in the sustainable chemicals industry.
And now for a genuine and detailed synthetic genomics success story. In 2004 Jay Keasling, from the University of California–Berkeley, and one of the BioFab 9, received a grant from the Bill and Melinda Gates Foundation to make E. coli bacteria (and later baker’s yeast) produce a precursor drug for the antimalaria drug artemisinin. Starting at zero, a few foreign genes not found in yeast had to be found and introduced, resulting in nonzero but still minuscule amounts of a useful intermediate product. The gene was resynthesized to achieve better codon usage, and indeed this improved yield 142-fold. Next, the yeast mevalonate pathway was optimized, resulting in a 90-fold greater output. Debugging yielded another 50-fold increase. Then optimization of the methods of fermentation (not the genome) gave another 25-fold boost. Finally a very clever idea of tethering some key enzymes via a scaffold normally used for very different purposes gave a shockingly high additional 75-fold improvement. Multiplying these factors to get to a billion-fold improvement in yield says more about how low we started than about how far we have come. For the latter, the better measure is how close to the theoretical maximum we are.
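The "billion-fold" figure can be checked by multiplying the five quoted factors. The short sketch below does only that arithmetic; the dictionary labels are my own shorthand for the steps described above, not terms from the project itself.

```python
from math import prod

# Fold improvements quoted in the text, in the order they are described.
# The labels are illustrative shorthand, not official step names.
fold_improvements = {
    "gene resynthesis for better codon usage": 142,
    "mevalonate pathway optimization": 90,
    "debugging": 50,
    "fermentation method optimization": 25,
    "enzyme tethering via a scaffold": 75,
}

# Independent multiplicative gains combine as a product.
total_fold = prod(fold_improvements.values())

print(f"Combined improvement: {total_fold:,}-fold")
```

The product comes to roughly 1.2 billion, consistent with the rounded "billion-fold" in the text.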
A general rule of thumb for maximum yields (as seen for Dupont’s PDO process; see Prologue) is 100 grams per liter and 3 grams per liter per hour. Artemisinin precursors are typically made at 1 gram per liter, so theoretically there is some room for improvement here. In 2008 Jay’s company, Amyris Inc., granted a royalty-free license to this technology to pharmaceutical giant Sanofi-Aventis for the manufacture and commercialization of artemisinin-based drugs, with a goal of having them on the market by 2012.
In part due to Jay’s success in bringing this previously uneconomic drug to a cost point suitable for developing nations, in 2007 British Petroleum committed $500 million for synthetic biology research on biofuels at Berkeley. (Chapter 4 is devoted to the biofuel applications of genome engineering.)
In a Gold Rush, Sell Shovels
The million-fold plummeting costs of sequencing (Figure 7.1) could have meant the end of profit if customers hadn’t thought of something to do with the sequences. One might call it a “shovel-rush.” ABI had carefully groomed its monopoly for years and then a bunch of yahoos came in without a plan. Remarkably, the field responded by increasing demand by more than a million-fold, as if the auto industry started selling cars for $0.03 instead of $30,000 and people responded by ordering 2 million cars per household. And begging for more! For reading DNA, some of us had confidence (since 1977) that there would be a market of between 1 and 6 billion people with 6 billion base pairs each (over 10^19 bp). The response to an analogous drop in costs of writing DNA is less well articulated, until now.
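For readers who like to check the back-of-the-envelope numbers, here is a minimal sketch of the two calculations in this paragraph: the million-fold price drop in the car analogy and the upper market estimate of 6 billion people with 6 billion base pairs each. The variable names are illustrative, and prices are held in whole cents only to keep the arithmetic exact.

```python
# Car analogy: $30,000 falling to $0.03, expressed in whole cents.
price_before_cents = 3_000_000  # $30,000
price_after_cents = 3           # $0.03
cost_drop = price_before_cents // price_after_cents

# Upper market estimate from the text: 6 billion people,
# each with a 6-billion-base-pair (diploid) genome.
people = 6_000_000_000
base_pairs_per_person = 6_000_000_000
total_base_pairs = people * base_pairs_per_person

print(f"Price drop: {cost_drop:,}-fold")
print(f"Market size: {total_base_pairs:.1e} base pairs")
```

The second figure is 3.6 x 10^19, which is where "over 10^19 bp" comes from.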
Let’s zoom back to Figure 7.1 for synthesis. The synthesis of gene libraries and genomes begins with a source of short bits of DNA. Har Gobind Khorana created the first RNA oligomers in the early 1960s to help crack the genetic code (depicted in Figure 3.2), winning a share of the Nobel Prize in 1968 for this. Then he led his team to synthesize the first gene, which happened to encode the molecule at the core of the code and at the core of ancient and current life (our old friend tRNA). By the mid 1980s, Marvin Caruthers and team made a better chemistry that BioSearch and ABI automated so that labs could make one to four oligos just by typing in the sequence. In 1996 Blanchard, Kaiser, and Hood and later Rosetta Inpharmatics and Agilent adapted ink-jet printers to print A, C, G, and T onto flat glass slides. Xiaolian Gao at Xeotron and another group affiliated with Nimblegen came up with ways to make custom oligo arrays on the fly using spatially patterned light. Both groups teamed up with Jingdong Tian in my group in 2004 to show that the DNA on those arrays didn’t have to stay on the arrays to be useful. Kosuri and coworkers published in 2010 a way to make subpools from the oligo arrays. And the freefall in cost of gene and genome synthesis suddenly seemed as inevitable as what had just happened for genome reading.
By 2012 the combined throughput of MycroArray, Combimatrix, LC-science, and Agilent could be around 300 billion base pairs per day, slightly behind the global genome sequencing capacity. Most of this is used for disposable arrays of oligos used for RNA quantitation or purification of genome subsets for sequencing, not for synthesis of genes or genomes. Clearly a market existed for sequencing 10^19 base pairs of genomes (a billion people with 6 billion base pairs each, plus repeat customers due to microbial, immune, and cancer genomics).
What might the markets be that will drive similar levels of consumption of DNA from chips? Here is our wish list (or bucket list, to go with the shovels):
Antibodies and fusion proteins
Binding proteins for DNA and RNA
Cell circuits: enhancer/splicing cis elements
DNA nano structures: smart drug delivery, nuclear magnetic resonance rods
Enzymes: every type and metagenomic
Foreign DNA: de novo or ancient reconstructions
Genomes: new codes, new amino acids, virus resistance
Homologous recombination: integrases
Isolation and safety chassis
Joined sensor-select: +/-allosteric regulators
Knowledge, media, data storage, steganography
Ligand engineering
Metagenomic access
Nanopore sequencing and sensors
Opto-electronics and scaffolding
etc. . . .
The point is that the applications of large-scale DNA writing are even more up for grabs than DNA reading. Predicting the future development of this field would be like trying to guess what applications of the personal computer would have been in the early 1970s: Electronic recipe books? Ping-pong? Balancing your checkbook?
Synthetic biology is mostly about developing and applying basic engineering principles—the practical matters that help transform something academic, ivory-towerish, pure, and sometimes self-indulgent or abstract into something that has an impact on society and possibly even transforms it. Systems biology moves us from massive numbers of observations to theories. But when you try to build something—even an academic something—you really acid-test those ideas—often finding out how little you need to understand or how much you do need to understand but don’t. When you try to build something for society, it’s even tougher since many things that work in your lab don’t work in other labs, much less in the hands of regular folks. And even if they do work, there is no guarantee that they will be financially successful.
The business of synthetic biology may now be transitioning to making a living by synthesizing genomes. What has been largely missing is an articulation of why we should engineer whole genomes rather than just the important parts. In answer, I have described a project to change the genetic translation code genome-wide for safety, to create new amino acids, and to engineer resistance to all viruses.
We are already in the business of making a safer microbial “chassis.” In 2011, DARPA put out a request for proposals for ways to “watermark” pathogens being actively studied in laboratories so that we could more easily trace accidental or intentional releases. This reflects our 2001 DARPA proposal to use DNA as a storage medium, and as we will see in Chapter 8, embedding English, encrypted messages, or even images in DNA has a history going back at least to 1984. The new challenge is to make these messages stable across time. Another DARPA challenge is to make the pathogens being studied able to survive only under specific laboratory conditions and not in the wild—without at the same time altering them so drastically that researchers can’t study their pathogenicity. All of these measures also apply to nonpathogenic, synthetic organisms, that although considered safe, would likely be more widely (and wildly) distributed (because of industrial utility) and hence more worthy of possessing tracking and safety features.
In Chapter 2 we examined ancient human texts and compared their longevity to the texts of life. In the tradition of encoding art in DNA discussed above, the world’s first so-called synthetic organism (Craig Venter’s M. mycoides) was accompanied by bits of human text embedded in code (the four letters of DNA). One bit of text read: “To live, to err, to fall, to triumph, to recreate life out of life.” This sentence, which is a quote from A Portrait of the Artist as a Young Man, by James Joyce, prompted the Joyce estate to send Venter a cease-and-desist letter. This beautifully captured the moment at so many levels. Was quoting the Joyce text “to err, to fall” (i.e., was it an error on Venter’s part?) or, to the contrary, was including the text in a historic hunk of DNA “to live, to triumph” (i.e., to glorify Joyce)?
In addition to the unfortunate Joyce quote, the genome of M. mycoides JCVIsyn 1.0 also incorporated a misquote of what Richard Feynman wrote on his last blackboard. As Venter’s team had it, “What I cannot build, I cannot understand.” They were quoting from a secondary source (a risky business), because what Feynman actually wrote was: “What I cannot create, I do not understand.” That too elicited a corrective note from the authorities, in this case Caltech, where Feynman taught, together with a picture of the blackboard in question. (The JCVI researchers further compounded their errors by expressing the Feynman words using a code that allowed only uppercase letters, which netizens interpret as shouting. But FEYNMAN WAS NOT SHOUTING! Nor, for that matter, did he write his parting message in caps.) Finally, we note that “What we can create, we don’t necessarily understand.”
I noted that small blooper when I assessed the JCVI manuscript for Science prior to its publication in that journal. To lighten up the rest of my critique, I playfully submitted my review to Science encrypted entirely in DNA. I heard later that Clyde Hutchison, an investigator at JCVI, was the one who figured it out. For the first few sentences, I used their code, but then I switched to a code that allows the encoding of anything digital, including lowercase letters, images, audio, and even web pages, and is easier to recall than sixty-four codons. In alphabetical order A = 00, C = 01, G = 10, T = 11. Even if the code is easy to recall, will the encoded messages endure, perdure, or neither?
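The two-bit code is simple enough to sketch in a few lines. What follows is only an illustration under assumptions the text does not specify: the message is first turned into UTF-8 bytes, each byte is read most significant bit first, and every byte therefore becomes exactly four bases. The function names `encode` and `decode` are hypothetical.

```python
# Two-bit code from the text: A = 00, C = 01, G = 10, T = 11.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn arbitrary bytes into DNA letters, four bases per byte.

    Assumption: bits are read most significant bit first.
    """
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(); expects only A, C, G, T and whole bytes."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4 bases (one byte)")
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Lowercase survives the round trip, unlike an uppercase-only code.
message = "Feynman was not shouting"
dna = encode(message.encode("utf-8"))
print(dna)
print(decode(dna).decode("utf-8"))
```

Because the code works on raw bytes rather than letters, the same two functions would carry an image or an audio file just as well as a sentence.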