by Pääbo, Svante
However, by this point it was impossible to conceive of a way to write just one paper, given the use of two completely different methods, the tremendous difference in the amounts of data generated, and the disagreement with Eddy about the viability of the bacterial-library approach. So we decided to write two papers. One was to be written by Eddy with us as co-authors, the other by us and Michael Egholm, Jonathan Rothberg, and the others at 454. Eddy’s paper stated: “The low coverage in library NE1 is more likely due to the quality of this particular library rather than being a general feature of ancient DNA,” suggesting that if one assembled more libraries, better results would be achieved. Given that the earlier cave-bear libraries had been just as inefficient, I disagreed with the assessment, but we stayed civil. Eddy submitted the paper to Science in June, and it was accepted in August. Because we had much more data to analyze for the 454 paper, we couldn’t submit our paper to Nature until July. Eddy graciously arranged with Science to delay publication of the cloning paper until the paper with 454 Life Sciences had been reviewed and accepted in Nature so that the two papers could appear in the same week.
While this was going on, we began to prepare for what we hoped would be the production of large amounts of Neanderthal sequences. The first thing I did was to arrange production of 454 sequencing libraries in our clean room in Leipzig so that the precious, contamination-prone DNA extracts would not have to leave our laboratory. I also used a chunk of the new money to order a 454 sequencing machine so that we could test the libraries. Then Michael Egholm and I worked out a plan. We would make DNA extracts from bones, produce 454 sequencing libraries in our clean room, and use our new 454 sequencing machine to test the libraries. When we identified promising libraries, we would send them to Branford for production sequencing. The sequencing would be done in stages, and we would pay in installments once a certain amount of Neanderthal nucleotides had been sequenced. The latter was my suggestion, and I was amazed that 454 agreed to it, given that our earlier work together had shown that the best library so far had contained only 4 percent Neanderthal DNA and 96 percent assorted unwanted DNA of bacterial, fungal, and unknown origin. We did not yet know what percentage of Neanderthal DNA would be in the libraries we would produce. If it turned out to be 1 percent instead of 4 percent, then 454 would have to sequence four times as much to get its money, since the contract stipulated the number of Neanderthal nucleotides sequenced, not the total number of nucleotides (which would include all the bacterial ones). Neither the scientists at 454 nor their attorneys who looked at the contract before it was signed appeared to take any notice of this. In a sense, it didn’t matter, since there was a clause that allowed either party to get out of the collaboration at any time. We were obviously not going to be able to force 454 to sequence forever against their will. But it still seemed a much better contract than one that stipulated that the company would sequence a certain amount of raw nucleotides for us, irrespective of whether these were microbial or Neanderthal in origin.
I felt very good about the collaboration with 454. We complemented each other’s strengths excellently, and the people at the company were fun and easy to talk to. However, one difference between us was that 454 was under great pressure to establish itself in an emerging market for high-throughput sequencing technologies that was clearly going to become very competitive. Already, two other big companies had announced their intention to start selling high-throughput sequencing machines. 454 therefore wanted positive publicity about their involvement in the Neanderthal project, and they wanted this publicity not in two or three years, when the Neanderthal genome would presumably be sequenced and published, but as soon as possible. Just as Michael Egholm took our concerns and priorities into account, I wanted to take their priorities seriously. So when the contract was signed with 454, we allowed them to arrange a press conference in our institute in Leipzig on July 20, 2006, shortly after we had submitted our joint paper to Nature. Michael and another senior executive from 454 flew in for the event. We also invited Ralf Schmitz, the curator of the Neanderthal type specimen who had given us samples from the Bonn museum in 1997. He brought along a copy of the Neanderthal bone from which we had determined the first Neanderthal mtDNA sequences. We wrote a press release pointing out that we were combining the methods for ancient DNA analysis that our group had developed over many years of painstaking work with 454 Life Sciences’ novel high-throughput sequencing technology to analyze the Neanderthal genome. We also mentioned that, by coincidence, we were making this announcement almost exactly 150 years after the first Neanderthal fossil was discovered in the Neander Valley.
The press conference was an electrifying event. The room was full of journalists, and media from across the globe followed it via the Internet. We declared that we would determine about 3 billion Neanderthal nucleotides within two years. It was wonderful to contemplate that what I had started secretly in the lab in Uppsala more than twenty years earlier, afraid that my PhD supervisor would find out what I was doing, had developed into this. It was a heady time.
It was also a time of great scientific and emotional ups and downs. About a month after the press conference came a definitive down. The two papers from Eddy Rubin’s group and ours were not yet out, but we had already shared our 454 Neanderthal data with Jonathan Pritchard, a young and brilliant population geneticist at the University of Chicago who had helped Eddy analyze his smaller data set of cloned Neanderthal DNA fragments. We received an e-mail from two postdocs in Pritchard’s group, Graham Coop and Sridhar Kudaravalli. They were worried about patterns they saw in the 454 data: in particular, there were higher numbers of differences from the human reference genome in the shorter DNA fragments than in the longer DNA fragments. Ed Green in our group quickly confirmed that they were right. This was worrying. It could mean that some of the longer fragments were not from the Neanderthal genome but represented modern human contamination. I e-mailed Eddy, telling him that we saw some worrying patterns in the 454 test data. We agreed to send our data to Eddy’s group in exchange for their data. After the exchange of data, Jim Noonan in Eddy’s group quickly e-mailed back and said that he saw what we and the Chicago postdocs had already seen in the 454 data.
It seemed that we might have to rewrite or withdraw our Nature paper, which was already in press, and I e-mailed Eddy, saying that we would try to figure out what was going on as fast as we could in order not to hold up his paper. Back when I was a postdoc in Allan Wilson’s lab, we had once withdrawn a paper that Nature had already accepted because we had found that we had made a mistake in the analysis that changed the main conclusions we presented. I worried that we would have to do this again.
There was now frantic activity in our group. It was not unreasonable to assume that the patterns Jonathan’s group saw were due to some level of contamination, but it was not straightforward to come up with an estimate of how much contamination there might be. It would have been an error to simply assume contamination was the problem, however. We were acutely aware that we did not understand many aspects of how the short, damaged ancient DNA sequences behaved in comparison with the human reference genome. Perhaps factors other than contamination were at play? Unfortunately, we needed to act fast, as our paper was already in press and Eddy was eager to publish his paper.
Ed had noticed that the shorter Neanderthal fragments in our 454 data contained more G and C nucleotides than the long ones. G and C nucleotides tend to mutate more often than A and T nucleotides, so this could lead to more differences between present-day humans and Neanderthals in the short (and GC-rich) sequences than in the long (and AT-rich) sequences. To test this, Ed matched up short and long Neanderthal fragments to the corresponding sequences in the human reference genome and compared those sequences in the reference human genome to those from other present-day humans. Although those comparisons did not include any Neanderthal sequences at all, they nonetheless showed that the human sequences corresponding to the shorter Neanderthal sequences had more differences from other human sequences than the longer ones. This observation suggested that the GC-rich sequences simply mutated faster, so maybe it would account for the higher number of differences seen in the shorter sequences. Before we could be certain, however, other factors also needed to be considered, especially the way in which we mapped Neanderthal sequences to the human reference genome sequence. Ed noticed that longer fragments of Neanderthal DNA had a better chance of being matched in the correct position in the human genome than shorter fragments, simply because they contained more sequence information. Therefore, a higher percentage of the short fragments might actually be bacterial DNA fragments that just happened to be similar to some part of the human reference genome. This, then, also might contribute to the observation that the shorter fragments contained more differences from the human reference genome. Such a phenomenon might have been overlooked in other ancient data sets, such as the mammoth data, where fragments were on average longer. But I felt very uneasy. It seemed that every day we uncovered new things about how short and long DNA fragments differed in terms of how they behaved in our analyses. Obviously, we did not understand everything that was going on. What’s more, we still hadn’t excluded the possibility that our samples were contaminated by modern human DNA.
We had, of course, considered the possibility of contamination from the outset. In the extracts we sent to Eddy and to 454, we had assayed the level of contamination based on mtDNA and found it to be low. We knew that contamination could have entered the extracts once they had left our laboratory; we had even put a caveat about this in our Nature manuscript. I felt strongly that the only solid assay for contamination we had was the one based on assessing the observed mtDNA fragments, since the mtDNA was the only part of the genome where we knew about differences between Neanderthals and modern humans. Everything else was influenced by imponderables, such as differences in GC content, differences in mismapped bacterial DNA fragments, and other unknown factors. So I argued that we should look again at the mitochondrial DNA in the sequences that had been determined by 454.
In 2004, we had sequenced a part of the mtDNA from the very same Neanderthal bone, Vi-80, from which we had prepared test extracts for 454 and Eddy’s group. I suggested that we should look among the sequences we had gotten from 454. Surely some of those must overlap nucleotide positions that differed between this particular Neanderthal individual and present-day humans. This would tell us which fragments were unambiguously of Neanderthal origin and which were of modern human origin and would enable us to estimate directly the level of contamination in the actual final 454 data set. Frustratingly, Ed found that we did not have enough data in hand to do this. The sequences done by 454 contained only forty-one mtDNA fragments and none of them came from the part of the mtDNA genome that we had determined earlier from this or other Neanderthals. We checked the Berkeley data, but they were so scant that not even a single mtDNA fragment had been observed.
Happily, there was a solution: we had so much library left that we could simply sequence more DNA fragments. This should then yield fragments that could tell us whether we had contamination in the library or not. I contacted 454 and convinced the people there to quickly do more sequencing. They did six more runs at record speed, and as soon as the data were transferred to our server, Ed found six fragments that overlapped positions in the variable part of the mtDNA we had sequenced in 2004.
All six fragments matched the Neanderthal mtDNA and differed from present-day human mtDNA! These were direct data suggesting that we had very little contamination in our sequences. Interestingly, these molecules, although clearly ancient, were not particularly short; four of them were 80 or more nucleotides long. This suggested that truly ancient DNA fragments were present also among the longer DNA fragments. Thus it was likely that the differences seen between short and long molecules were due to factors other than contamination. Ed was so elated that he ended the e-mail to the group describing these results with “I could kiss every one of you.”
We decided to go ahead with the Nature paper. Susan Ptak, a population geneticist in our group, sent a long technical e-mail to Eddy and Jim Noonan explaining why we felt that comparisons between long and short sequences were influenced by too many factors, both known and unknown, to represent strong evidence of contamination, and why we trusted the direct mtDNA evidence more. She wrote: “Although there is indirect evidence which suggests some level of contamination, we now have a direct measure of the contamination rate in the final data set, which still suggests it is low.” We received no reply to this e-mail. Given the rather tense relationship that had developed between our groups, we did not find this too surprising.
This was a tremendously stressful incident. Ironically, as it turned out, both Eddy and we were right. The future would show that the data generated at 454 did contain contamination, but also that the indirect ways of detecting contamination via comparisons of long and short fragments were largely inadequate.
The two papers were published in Nature and Science on the 16th and 17th of November.{48} There was the predictable excitement in the press, which I had by now gotten used to. In fact, I was much more preoccupied than excited. We had promised the world that we would sequence 3 billion base pairs of the Neanderthal genome within two years. Our paper ended with an estimate of what this would require—namely, about twenty grams of bone and six thousand runs on the 454 sequencing platform. We said that this was a daunting task, but added that technical improvements that would make the retrieval of DNA sequences on the order of ten times more efficient could “easily be envisioned.” The improvements we had in mind involved losing less material when making libraries for sequencing and taking advantage of secret future improvements to the 454 machines that Michael had revealed to us.
Things were looking up, but a major challenge still remained: finding good Neanderthal bones. The truth was that we did not have anywhere near twenty grams of Neanderthal bone of the quality of Vi-80, the bone we used in the test runs for the two papers. In fact, the piece we had left from Vi-80 weighed less than half a gram. I optimistically told myself that since one of the first Vindija bones we tried contained almost 4 percent Neanderthal DNA, surely we would find others that were equally good. Perhaps we would even find some that were better. I had to turn my full attention to this problem as soon as possible. First, however, I had to undertake a more unpleasant task: ending the collaboration with Eddy Rubin.
Terminating a scientific collaboration is often difficult, and it is even more so when a collaborator has become a personal friend. I had stayed with Eddy’s family in Berkeley; we had biked up the hills to his lab together; we had gone together to the theater in New York during Cold Spring Harbor meetings. I had always enjoyed his company. So I long pondered my e-mail to Eddy and wrote several drafts of it. I explained how I differed with him on bacterial cloning’s usefulness, and said that I felt that our communication, particularly on this point, had not been productive. I also noted that it now seemed that his group was trying to do the same things our group was trying to do, rather than working in a complementary way. For example, in our phone conferences, they had suggested that we send them our DNA extracts and the PTB reagent we had synthesized so that they could treat our extracts with our PTB. Neither I nor my group had been thrilled by this notion. I hoped I had expressed my reasons for not working together in a way that wasn’t hurtful or insulting, but it was still with some trepidation that I sent the e-mail. Eddy answered that he saw my points but that he continued to believe in the future potential for improvements and utility of bacterial libraries. I was relieved that he had taken my letter graciously, but we were now, clearly, competitors rather than collaborators.
The competition became apparent almost as soon as I turned my attention to the procurement of Neanderthal bones. Eddy was trying to obtain them too, I discovered, and from many of the same people we had worked with for years. In fact, I found out that already back in July, Wired magazine had published an article about Eddy’s Neanderthal efforts. The Wired piece ended with a quote from Eddy: “I need to get more bone. I’ll go to Russia with a pillowcase and an envelope full of euros and meet with guys who have big shoulder pads. Whatever it takes.”
Chapter 12
Hard Bones
_________________
Even before our Nature paper came out, Johannes Krause had begun preparing extracts from Neanderthal bones we had collected from Croatia and elsewhere in Europe over the years, hoping to find a bone that might contain as much Neanderthal DNA as Vi-80, or more. Johannes was tall and blond, not so dissimilar to the German stereotype. He was also very intelligent. He had been born and grown up in Leinefelde, the very same town where in 1803 Johann Carl Fuhlrott had been born. Fuhlrott was the naturalist who in 1857, two years before Darwin published The Origin of Species, had suggested that the bones found in Neanderthal derived from a prehistoric form of humans. This was the first time anyone had suggested that other forms of humans had existed before present-day humans, and Fuhlrott was widely ridiculed for his idea, but he would be proven correct when additional Neanderthals were unearthed. Fuhlrott became a professor at the University of Tübingen, where today, appropriately, Johannes is a professor.