by Kirk, Edwin
The earliest such maps were made in the early 20th century, for fruit flies. By 1922, genes for 50 different characteristics had been mapped to the four fly chromosomes. These were all physical differences in the fly that a researcher could directly observe. Flies would be examined for multiple different characteristics and mated with flies that had also been carefully examined, and the resulting offspring would be examined in turn. It was exacting, difficult work, but it taught us a great deal about the fundamentals of genetics, and it gave us tools that were used all the way through the 20th century and were essential to the success of the Human Genome Project.
For the fruit fly’s X chromosome, for example, there was an early map that looked like this:
y..........w.......................................................................v....m
In this map, y stood for yellow body, w for white eyes, v for vermilion eyes, and m for miniature wings. The map means that yellow body and white eyes are closely linked — they are more likely to be inherited together — whereas the miniature wings variant is more likely to be inherited along with vermilion eyes than with white eyes. This particular map was the work of Alfred Sturtevant, another nearly forgotten genius, and was put together in 1913, when Sturtevant was only 21. At the time, he was working under the supervision of the great geneticist Thomas Hunt Morgan. Sturtevant seems to have been a child prodigy: by age 21, he already had a long track record of studying inheritance. Morgan had been impressed by Sturtevant when, still in his teens, he wrote a paper about how horses inherit coat colours — based on observations made as a child, on his father’s farm! The paper was published in a scientific journal, Morgan offered Sturtevant a position in his lab — and the rest is history.
That’s some school project.
Sturtevant went on to have a long and distinguished career in science, pausing only to marry Phoebe Curtis Reed, a technician who also worked in the fly lab. They had three children, who must have grown up with quite unusual ideas about what constitutes a normal topic for dinner table conversation.
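To make the X-chromosome map shown earlier a little more concrete, here is a minimal Python sketch that reads the printed map as data. It assumes, purely for illustration, that each character in the printed line counts as one arbitrary unit of distance; the names `MAP`, `positions`, and `distance` are invented for this example and are not Sturtevant's notation or units.

```python
# The map line as printed in the text. Treating one character as one unit of
# map distance is an assumption made for illustration only.
MAP = "y..........w.......................................................................v....m"


def positions(map_text):
    """Return {marker: index} for every non-dot character in the map."""
    return {ch: i for i, ch in enumerate(map_text) if ch != "."}


def distance(pos, a, b):
    """Distance between two markers, in characters along the printed map."""
    return abs(pos[a] - pos[b])


pos = positions(MAP)

# Markers that sit closer together are more likely to be inherited together.
for a, b in [("y", "w"), ("w", "v"), ("v", "m"), ("w", "m")]:
    print(f"{a}-{b}: {distance(pos, a, b)} units apart")
```

On this reading, y and w are close neighbours, and m is much nearer to v than to w, which is the pattern of linkage described above.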
From the point of view of genetic mappers, most of the 20th century was a hard slog. From mapping physical characteristics that you could see, things moved on to biochemical and other laboratory markers — in yeast, and eventually in humans. It wasn’t until 1987, 17 years after Sturtevant died, that the first genetic map spanning the whole human genome was published — 407 variable bits of DNA spread across the 23 chromosomes. If the Human Genome Project was the moon landing of genetics, Sturtevant’s early maps were our Wright brothers’ flight.
So, by the late 1980s, we had that outline map, like those early explorers’ maps of the world. We knew the outlines of our 23 land masses, and there were definite signposts spread along them (those 407 genetic markers), but, apart from a few important ports — chunks of known DNA sequence centred around disease genes — there was precious little detail to be found on our maps.
Going from a 407-marker sketch to a richly detailed map, and then to a completed genome sequence, needed some special tools. One of the most important of those was Sanger sequencing.
You’ve surely heard of Marie Skłodowska Curie, who won two Nobel prizes, in physics and chemistry. There’s a good chance you’ve heard of Linus Pauling (Nobels for chemistry and peace). I have to confess I had never heard of John Bardeen until I started writing this chapter. This is embarrassing, since it seems we all owe him an enormous debt for the work that won him two physics Nobels. Bardeen was co-inventor of the transistor, and developer of the theory of superconductivity — your phone only works because of Bardeen’s discoveries. The same goes for the computer on which I am typing this.
But for our purposes, there can be no doubt that Fred Sanger was the greatest of the quartet of double Nobel winners. Sanger, an Englishman, was a chemist who worked out how to determine the sequence of amino acids that make up a protein, for which he received his first Nobel. Sanger was a Quaker who received official conscientious-objector status during World War II, and a good thing, too — it would have been an appalling loss to the world if he had died during the war.
The protein Sanger chose to study first was insulin — the sugar-regulating hormone that is lacking (or ineffective) in people with diabetes. Insulin had been successfully used to treat diabetes since the early 1920s, and was one of very few proteins available in pure form in the early 1950s. The story of how that came to be is a special part of science history in itself.
Media stories of medical breakthroughs tend to be evenly divided between breathless reports about small improvements that happened several years ago and research in animals that might never translate into something relevant to humans. My PhD supervisor, Richard Harvey, and I were once interviewed on national TV news about research we hadn’t even done yet. We’d been awarded a grant from the National Institutes of Health in the US, and our institute’s public-affairs department had somehow sold this as a big news story to one of the TV networks. Years later, when we had done the research and had the results published, we couldn’t even get a local newspaper to give us a mention.
But there are some genuine stories of miracle cures in the history of medicine. The introduction of penicillin is one: deadly, incurable infectious diseases suddenly became curable. But for a real wonder drug, nothing beats the story of insulin.[7]
[7 Okay, there is one that’s better. Anaesthesia beats insulin. And I am definitely not saying that because my wife is an anaesthetist.]
Diabetes comes in two main flavours — and I use the term advisedly. Diabetes mellitus, by far the most common type, gets its name from a Latin word meaning ‘sweetened with honey’ — because the urine of sufferers tastes sweet. If you were to taste the urine of someone with diabetes insipidus, by comparison, you’d find it insipid: flavourless. The urine sommelier at your favourite restaurant would certainly not recommend it.
In turn, diabetes mellitus is divided into two broad groups, based on how well treatment with insulin works. If you have a deficiency of insulin, because your pancreas has stopped making it, then what you need is a replacement, which you can get in the form of an injection. This is the type we’ll be focusing on here. On the other hand, if your body makes insulin fine, but no longer responds normally to its effects, you have non-insulin-dependent diabetes, quite a different problem. As you might expect, there are all sorts of subtypes beyond this broad division. One rare type that affects newborn babies will make a cameo appearance later in the book.
The main job of insulin in the body is to give your cells the go-ahead to take up and use glucose from the bloodstream. If there’s no insulin around, it’s as if your cells are sugar-blind — they just can’t tell that the glucose is there, and they can’t do anything with it. Since the sugar isn’t being used, it builds up in the blood and spills into the urine, making it sweet — and dragging water with it so that you pee too much and get dehydrated. Meantime, your body is starving in the midst of plenty, because your cells can’t use the glucose that’s there.
By the early 20th century, it was already known that if you removed a dog’s pancreas, the animal would develop diabetes and would die within a fortnight. Unfortunately, much the same was true for people — diabetes, which mainly came on during childhood, was a death sentence. Affected people might linger on for a few weeks or months, but eventually they would slip into a coma and die.
There are several heroes in this story, all Canadians. Frederick Banting was a surgeon who had an idea about how to get an extract from the pancreas of a dog that might be used to treat diabetes. People had tried to do this previously, but nobody could get it to work.
As well as making insulin, the pancreas makes digestive enzymes, and Banting thought that, when people mashed up pancreatic tissue to try to extract insulin, those enzymes were coming into contact with the insulin in the mash and digesting it, so that there was none left to be extracted. His idea was to tie off the ducts that carry the digestive juices from the pancreas to the gut, causing the cells in the pancreas that made those enzymes to wither away. He hoped that, when this happened, you could mash up the remaining tissue, which would mainly be the insulin-producing cells without the enzymes, making it possible to get a pure preparation of the stuff they were after. He went to a leading diabetes researcher at the University of Toronto, John Macleod, who took some persuading but eventually gave him the resources he needed to give the idea a go, including ten dogs and an assistant, a medical student called Charles Best. Interestingly, it’s still true that, in the medical hierarchy, medical students rank slightly above domestic animals.
The story goes that Best was the winner of the luckiest coin toss in medical history. There were originally two students who could have been assigned to the project. Best and the other student, his friend Clark Noble, tossed a coin to see who would work with Banting, and Best won. Initially, they were going to swap halfway through the summer, but by then Best was entrenched in the work (and technically skilled at doing it), and they agreed that he should stay on. A share of scientific glory, on the toss of a coin.
It turned out that Banting’s idea was right. The initial work in dogs was so encouraging that, by January 1922, it was possible to start trials in humans. Another figure enters the story here: James Bertram Collip was the one who worked out how to purify the pancreas extract so that it could be safely injected into people. He was called in after the first recipient suffered from a severe allergic reaction when given the imperfectly purified extract of dog pancreas. There are stories from this time of extraordinarily dramatic recoveries. In particular, it’s said that once they had a pure preparation, Banting’s team went from bed to bed in a room full of comatose, dying children, giving injections. By the time they reached the last one, the first had already awoken from his coma.
If this story is true, you certainly can’t tell from the first scientific report of treatment with insulin, published in the Canadian Medical Association Journal. This paper is breathtakingly, gloriously dull. It’s not until halfway through the second page (in a paper only a little over five pages long) that there is even a mention of treatment in humans. The conclusions are cautious and measured: in essence, ‘we can measure some differences in the blood of patients, and they seem to feel better’.
Even if Banting and co. were reluctant to blow their own trumpets, word of such a discovery was bound to get out, and the news raced round the world. The very next year, Banting and Macleod were awarded the Nobel Prize in Physiology or Medicine. Banting shared his prize money with Best; Macleod shared his with Collip. We can only imagine what it must have been like for families of newly diagnosed diabetics, after the news was out, but before large-scale insulin production was possible. There must have been many people who died despite the existence of a treatment, and others who were saved in the nick of time.
Be that as it may, by the time Fred Sanger needed a purified protein to study, 30 years later, all he had to do was stroll down to the local pharmacy and buy a bottle. Sanger’s approach was to simplify the problem — instead of trying to read the sequence of the entire protein, he would smash it into shorter bits that were easier to handle. He developed chemical methods to work out the sequence of amino acids in those short sections, then pieced together those overlapping short sequences to work out the overall sequence of the protein. It was an idea that had a long scientific reach. As we shall see, Craig Venter’s company, Celera, would use essentially the same approach to sequence the human genome, nearly 50 years later, and it remains an important technique in genetics. In 2018, the koala genome was sequenced this way.
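The idea of piecing overlapping fragments back together can be sketched in a few lines of Python. This is a toy greedy assembler, not Sanger's chemistry or Celera's software: the function names are invented for the example, the 22-letter "protein" is made up rather than the real insulin sequence, and the minimum overlap of three letters is an arbitrary assumption.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments that share the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Made-up fragments of a made-up sequence, deliberately given out of order.
fragments = ["QISFVKSHF", "MKTAYIAKQ", "VKSHFSRQ", "YIAKQRQISF"]
result = assemble(fragments)
print(result)
```

A greedy strategy like this can be fooled when the same stretch of letters occurs more than once in a sequence, which is one reason real assembly is far harder than the sketch suggests.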
Sanger’s discovery was not just ‘this is the sequence of insulin’. Important though that would have been, its impact was trivial compared with the discovery that proteins have a set sequence, on which their structure and function depends. Many later discoveries, including our understanding of the way that DNA codes for proteins, would have been impossible without this basic understanding of what a protein actually is — a chain of amino acid molecules.
For his next trick (and next Nobel Prize), Sanger figured out a way to read the sequence of DNA. His method, published in 1977, is still known as Sanger sequencing, and was the foundation stone on which the Human Genome Project rested. At first, Sanger sequencing involved a certain amount of messing around with radioactive isotopes, but later improvements involved tagging the DNA bases with different coloured fluorescent markers — safer, and also possible to massively scale up. And scaled up it was.
We still use Sanger sequencing in diagnostic labs. Here’s an example of what it looks like:
[Figure: two Sanger sequencing traces, one above the other, with the sequence of letters printed above the peaks.]
In this picture, the top section is from a carrier of a genetic condition and the bottom section is from someone who isn’t a carrier. You can’t tell from a black and white image, but the convention is that A is green, C is blue, G is black, and T is red. Each of the different coloured peaks represents one base of DNA, so you can use those colours to read the sequence from the peaks … G, G, T, A, C, T, and so on. Or you could just read the sequence of letters helpfully placed above the peaks, I suppose — but you see how it works.
In the middle of this stretch of DNA, between the two vertical lines, the normal sequence has a C; at the same place in the sequence above, if the image were in colour, you would be able to see that there’s a red peak there, a T — but if you looked very closely, you would see that the normal blue peak, the C, is there as well. This means that one of the two copies of the gene has the usual sequence, and the other has a change. Both copies, the one with and the one without the change, are in the sample and undergo the sequencing reaction. For the bases that aren’t changed, there’s no difference between the two, and the resulting trace looks the same as the normal sample. For the altered base, half of the DNA in the tube has a C and half has a T, and the two overlap — that’s why the peak at that place is about half the height of the C in the normal sample.
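The logic of reading a carrier's trace can be sketched in Python. Everything here is invented for illustration: the peak heights, the background level, the position of the change within the six bases, and the rule that a second peak counts if it reaches 30 per cent of the tallest one. Diagnostic software is considerably more careful than this.

```python
def call_position(peaks, ratio=0.3):
    """Return the base or bases whose peak is at least `ratio` of the tallest."""
    top = max(peaks.values())
    if top <= 0:
        return ""  # no signal at this position
    return "".join(sorted(b for b, h in peaks.items() if h >= ratio * top))


def single_peak(base, height=1000, background=20):
    """One clean peak for `base`, with a little background for the others."""
    return {b: (height if b == base else background) for b in "ACGT"}


# Hypothetical traces: six positions reading G, G, T, A, C, T.
normal = [single_peak(b) for b in "GGTACT"]
carrier = [single_peak(b) for b in "GGTACT"]

# In the carrier, one copy of the gene has C and the other has T at position 5,
# so each of the two peaks is roughly half the usual height.
carrier[4] = {"A": 20, "C": 520, "G": 20, "T": 480}

normal_calls = [call_position(p) for p in normal]
carrier_calls = [call_position(p) for p in carrier]
print(normal_calls)
print(carrier_calls)
```

Every position gives a single letter except the changed one in the carrier, where both C and T are reported.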
That’s easy enough to do now because we already know the sequence of the gene that we’re interested in. But it’s also possible to use Sanger sequencing to discover a DNA sequence where it wasn’t known before. You start from a known section and work your way along, discovering new territory as you go — until you meet up with someone coming the other way. This was the approach used by the Human Genome Project.
So, by the late 1980s, we had the tools that would be used to complete the job. Actually completing it, however, seemed a very long way away. In 1987, the US Department of Energy, an organisation not known to be daunted by large-scale ventures, launched an early version of the project. Their agenda was to find a way to protect the genome from the harmful effects of radiation — potentially important information at a time when nuclear power plants were an important and growing part of the country’s energy supply. By 1988, the National Institutes of Health had joined the DOE, and together they had received funding from the US Congress to attempt the task.
It would be a scant 12 years before this early vision would become reality. Yet for the first half of that time, from the perspective of an outside observer … almost nothing happened. In 1990, a goal of 2005 was set for completion of the project, but, given the expectation of ‘over time, over budget’ for government projects, not many outsiders took this very seriously. By 1994, the main achievement of the HGP was … a denser map. The new map of the genome had not 407 but 5,840 markers, densely spaced across the genome. To the outside observer, not that impressive, perhaps, but it was a critical step on the road to the genome. And in a sign of what was to come, it was delivered a year earlier than expected.
From very early on, the HGP was an international effort, with scientists from all over the world contributing. An Australian, Grant Sutherland, was president of the Human Genome Organisation (although not leader of the Human Genome Project) for part of the project’s life. James Watson (yes, that Watson) was the first leader of the HGP. In 1992, he resigned, and, after a brief interim, Francis Collins took charge, and would see the project all the way through to its end.
The actual sequencing was done by 20 institutes spread across six countries — the US, the United Kingdom, Japan, France, Germany, and China. Each was assigned a section to work on. The biggest single contributor outside the US was the Sanger Centre at Cambridge, named of course for Fred Sanger. The Sanger Centre, now the Wellcome Sanger Institute, sequenced almost a third of the human genome. Their assigned turf — some shared with other institutes — included chromosomes 1, 6, 9, 10, 11, 13, 20, 22, and X. The project seems to have gathered speed like the proverbial runaway locomotive. It wasn’t until 1999 that the first complete sequence of a human chromosome (chromosome 22) was published. In September 1999, more than a decade after the project started, a press release trumpeted the completion of 821 million bases of DNA sequence. Half of that was still in ‘draft’ form, and more than two billion bases still needed to be sequenced before the project reached its goal. But by June the next year, the project was near enough done for President Clinton and Prime Minister Blair to claim success.