The Boy Who Wasn't Short


by Edwin Kirk


  Now it’s poisonous. Do not drink.

  Allow to stand, and wait. DNA will rise into the alcohol layer, in a gloopy white mass.

  At this point, if you like, you can take a wooden skewer and lift the DNA out of the glass. It has some interesting properties. Dab it against the side of the glass a few times — the clump on the end of your skewer will get smaller. Lift it carefully enough out of the liquid and you may be able to get a long, thin strand to rise up from the surface. DNA is sticky, and can be loosely or tightly coiled. When it shrinks, that’s loose DNA coiling more tightly as it sticks to itself. The string of DNA you can pull up from the surface of the liquid is a series of individual strands, sticking to each other as they are lifted out of the container.

  I strongly urge you to give this a go. It is deeply satisfying to know that you are holding the stuff of life in your hands. Even if it does look and feel rather like snot.2

  [2 No, I don’t know what it tastes like.]

  *

  Francis Collins was the focus of attention that night at the DNA Dinner, not because of his guitar playing and singing (fine though they were), but because of his leadership of the Human Genome Project. Three years previously, on 26 June 2000, there had been a ceremony at the White House, announcing that the human genome had been sequenced. President Bill Clinton hosted the event, and British prime minister Tony Blair joined by satellite. Collins and the publicly funded Human Genome Project shared the limelight that day with a private company, Celera. A remarkable effort, led by Celera’s CEO, Craig Venter, had turned the sequencing of the human genome into a race, ending in an effective dead heat between the public and private efforts.

  Some might say that the ceremony jumped the gun a bit, because there were still so many gaps in the sequence — no fewer than 150,000 of them — and because at least 10 per cent of the sequence was still missing. In fact, there was another announcement on 14 April 2003 that the project was really finished, but, even then, there were still gaps. By 2004, things had improved considerably, yet there were still 341 gaps … and even today, the job is not quite finished.

  Nonetheless, at the time of the announcement in 2000, there was a good working draft — and to be fair, that was what was announced: completion of a draft sequence. Most researchers, most of the time, could consult the data with the expectation that there would be detailed information available about the region they were interested in. It was an exciting time, but it still wasn’t entirely clear to those of us in the clinical world exactly what we were going to use the genome sequence for.

  One day in late 2001, a package arrived in our department that was to thoroughly prove this point. It was the human genome on disk — sent to us, free of charge, by Celera. Excitedly, we opened it up, loaded it onto a computer, and started exploring the contents. And quickly stopped. We had no idea how to interpret the information we had been sent, and no way to connect it with our patients. As it turned out, it would be more than a decade before genomic data would become a routine part of practice in clinical and diagnostic-laboratory genetics. Nowadays, I consult the genome browser run by the University of California, Santa Cruz, numerous times during the course of a working week. I could not do my job without it.

  So — what’s in the genome? What exactly am I browsing, thanks to UCSC?3

  [3 I’m afraid you might find my browser history a little dull.]

  That white gloopy stuff you extracted from the strawberries is made up of four different chemical building blocks: adenine, cytosine, guanine, and thymine. Their initials are A, C, G, and T, and they are called nucleobases, or bases. The human genome is composed of about three billion bases altogether. Most of the time, they come in base pairs, because DNA usually takes the form of a double helix. That double helix is made of two individual strands, which complement each other. A on one strand sticks to T on the other, and C on one strand sticks to G on the other, so that the double helix looks like this:

  G A T T A C A

  | | | | | | |

  C T A A T G T

  The two strands run in opposite directions — there is a direction to DNA, related to the way it is copied and translated to make proteins. So the strand that complements ‘GATTACA’ would be read as ‘TGTAATC’ by the cell’s machinery, not ‘CTAATGT’.
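  The pairing rule and the opposite reading direction can be captured in a few lines. This is a minimal Python sketch; the function names `complement` and `reverse_complement` are illustrative choices, not taken from any particular library, and the code assumes an uppercase sequence containing only A, C, G, and T.

```python
# Base-pairing rule: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right layout."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand as the cell would read it (opposite direction)."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

  The first result is the bottom row of the diagram; reversing it gives the order in which the cell's machinery actually reads that strand.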

  Three billion bases of DNA is an awful lot. To put that in perspective, here’s a chunk of human genetic code:

  GAGGGGTGACAATAGGAATATTTGCTTTTCATCCCTCATGATCAT

  CACCCTTCCCTTCTTCCACTCTTACATTTTTATTTCCCAAAGT

  GGGCTGCAGTACTGGTGGTGAAAAGCAGTCATTCTAGAACT

  CACTAGCTTGGTCAGGAAACACTGGGCATGCTACTCCGTGGGAATC

  CCAGGTAATTTTATACAGTGATGATGGGTCATCCCTACAGCCTG

  CCTGGAGTTTCTGAGCTTAACTTTTCCTGAGTCAAATCCAAGTGAC

  CATTTGGTTATGCTGTTCTTTCCAGATCTTTCCTTGTTGAACAGT

  GTCAGCTGTCATTCACAATTCTCTTTACCTCCAGAGCAATTTGT

  GGAGAAGTCGTCCTGTGCCCAGCCCCTGGGTGAGCTGACCAG

  CCTGGATGCTCATGGGGAGTTTGGTGGAGGCAGTGGCAGCAGC

  CCGTCCTCCTCCTCTCTGTGCACTGAGCCACTGATCCCCACCACCCCCA

  TCATCCCCAGTGAGGAAATGGCCAAAATTGCCTGCAGCCTGGAGA

  CCAAGGAGCTTTGGGACAAATTCCATGAGCTGGGCACCGAGATGATCAT

  CACCAAGTCGGGCAGGTAGTTGGGACTTGGGGGTTGGGGGTTGGGG

  GTGGAGAGTCAGGACACTCCCTGGGTAGTTGAGGGTGCTTCCAG

  GAACTAGATGAGAGCTGGCTGGTCATGGAGCGGAGAGACAGCTTG

  GCTCCAGGGCAGCTGCTTTCCACCAGCTTGCATTAGGAGCTACAG

  GATGTCTAGTCATTTGCGTTCTCAGGATTTGGTCATGGGAAG

  CCCCACCCTGGCTTTGTTGAGAAGGGCACAGGGACCAGGGAG

  ACACACTAACCCCGAAGGGTGTGGTCTGCTTTCCCTGGAGCTG

  GAGAAGGTTTGGCGGGTGGAGGGTCGGGATCTGGAAGGAGGAG

  GAATTTGTGCCTGGGTGCCTGGTGAGCTGCTGGGTGCTTCTAGG

  TAGGTGAGTAGCTTCCCTTTTATCAGCCTCAATTTGCAAAAGCTG

  CCAGCTCCCATTAAAAACTAAAATTAAAACCTGGGCGGAAGAAT

  GAAATTTGAAACGATAAAATTCCCTGTAGGAAGGAGCACTGCTC

  GGGGCCTCTTGGCGCCAGAGCCGGGCGGGCTTTGGCCAGGCAG

  GAAGCTGCAGGGCTGCAGGGAGGTTGGGATGGGGCAGAGGCTGG

  CAAAACTTGGTGGCTCTAGCTCTTGGGACTACAGAAAATACCT

  GCAGGGCATCTGAGAAATCCTTCCCAGAAACCTCTGCTTTTG

  GCTTTTATTTTGCAAGAGCAGAGTTTTCTGGCTGGGATGCGGGT

  GAGTTGTGTGACTGGGTCAGCTCCAGGGACTTCGGGTCCTGGGA

  CACTTAATGTGCTTGATCGTTAAAATGCATGGGATTTTCCCTA

  ATCACAGACCTTCTGGAGTTAACACATACCCCCACCCCCAC

  CCCCACCTTTTCACCTAGCAATTAACACCTGCTTAAAGGTGA

  CACTTAAAATTATCTAGGCTTGGAAGAAAACCCTGTCTCTGTAT

  TCACTTCTCTGAGGCTTTAAACAAAACAAAAGAGGGGTTTGTG

  GACCGGATAGAGAGGGGAGTCAGACCCTTTCCCTCCTTCCCTC

  CCTCCCTTCCTCTCTTCCTAATTCAGGTCAGTTTATTAGGCAG

  CATAAACAGGGCCCATTCTCTCTCTCTCTCTCTGTCAGGAGGAT

  GTTTCCAACCATCCGGGTGTCCTTTTCGGGGGTGGATCCTGAGGCCAAG

  TACATAGTCCTGATGGACATCGTCCCTGTGGACAACAAGAGGTACCGC

  TACGCCTACCACCGGTCCTCCTGGCTGGTGGCTGGCAAGGCCGACCCGC

  CGTTGCCAGCCAGGTTCGTGCCTCCAGATTTTTCACTGAGAAAACT

  GTTAGTGCATCTGTCAGAATGTTTCTGGCTTGTGTGAATTTTAAG

  CAAGTGTATTTTTAAAGCAGCGGGCTCTGGCAAGAGAGCATTC

  CAAGCCTGGACACTCCAGGATTGACTACACAAAACATGGGCTAG

  GCTCTGAGAAAGGTAGTTTGTGCATAGAGAAAACACTGTCTTTA

  AGTTTATGTTTCGTTAGGCAGTAATTCATTTCAAAGTTTTTCTTA

  AATTTCAATTTGAGTATTCATTAGAAATGTGGACCCATTTTGTATA

  AATATAAATATAGACATCCTCTCTAATTGCTGCTTAAAACCAGAGTGAA

  This is one of my favourite bits of the genome — it’s part of the gene TBX20, which had a starring role in my PhD. If the whole human genome were printed at the same density on A4 paper (single-sided), you’d need 781,250 sheets. If each sheet of paper is 0.1 mm thick, you would need a stack of paper just over 78 metres tall (about 256 feet) — halfway in height between the Sydney Opera House and the Statue of Liberty. Without a key, of course, this would be just a stack of meaningless letters. With the key, that stack of paper contains untold scientific wealth.
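  The paper-stack arithmetic can be checked directly. A rough sketch, assuming the rounded figure of three billion bases and the 0.1 mm sheet thickness used above; the implied density of about 3,840 bases per page follows from those two numbers rather than from a count of the printed chunk.

```python
GENOME_BASES = 3_000_000_000  # "about three billion bases" (rounded)
SHEETS = 781_250              # number of A4 sheets quoted in the text
SHEET_THICKNESS_MM = 0.1      # assumed thickness of one sheet
MM_PER_FOOT = 304.8           # exact by definition

bases_per_sheet = GENOME_BASES / SHEETS
stack_mm = SHEETS * SHEET_THICKNESS_MM
stack_m = stack_mm / 1000
stack_ft = stack_mm / MM_PER_FOOT

print(f"{bases_per_sheet:.0f} bases per sheet")   # 3840 bases per sheet
print(f"{stack_m:.3f} m, about {stack_ft:.0f} ft")  # 78.125 m, about 256 ft
```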

  What is the key? And what’s in the genome? It turns out that it’s more a matter of a set of keys, rather than just one. DNA tells many stories, if you can read them.

  As we saw in the previous chapter, we have pairs of chromosomes4 — pairs, because you get half of your genetic information from your mum, and half from your dad. In turn, you pass on half to each child you have. So — one copy of chromosome 1 from mum, one from dad, and so on. Chromosome 1 is the largest. It’s about a quarter of a billion bases long, and is home to more than 2,000 genes. The smallest, chromosome 21, is less than 50 million bases long and holds only a couple of hundred genes. The humble Y chromosome is just a little longer than chromosome 21, but has only about 50 genes.

  [4 Feel free to go back and admire my chromosomes again at this point.]

  There is also some DNA outside the nucleus of the cell: we have a second genome, a tiny one (only 16,569 bases and 37 genes). It lives in structures called mitochondria — more about them later.

  Speaking of genes — you’ve doubtless heard of them, because they are the most famous things in the genome. As I explained, their job is to act as a blueprint that tells the cells how to make proteins, which in turn do the many complex tasks a cell needs to do in order to be alive and contribute something useful to your body. The parts of the genes that get translated to make proteins only account for around 1–2 per cent of the genome.

  There’s still quite a bit of controversy about how much of the rest of the genome actually does anything. Some of the non-gene bits are definitely useful and important. For example, the centromere, at the waist of the chromosome, is essential for making sure the chromosome copies go where they are supposed to when cells divide. Mess that up and the consequences are not good. The ends of the chromosomes have structures called telomeres, which form a protective cap. You may be familiar with the Bernard Bresslaw song about feet:

  You need feet to keep your socks on

  And stop your legs from fraying at the ends

  Chromosomes are not known for wearing socks — but like your legs, it’s bad news if they fray at the ends. As you get older, the telomeres themselves do tend to fray a bit, gradually getting shorter with each cell division. During the development of many cancers, they wind up a lot shorter than they should be, or disappear altogether, leaving the ends of the chromosomes exposed and vulnerable to damage. Paradoxically, what happens next is a restoration of the telomeres — as cells are undergoing transformation to malignancy, their chromosomes reacquire robust telomeres. This is part of what makes a cancer cell ‘immortal’.

  Although the parts of genes that code for proteins only account for about 1–2 per cent of the genome, genes spread across about a quarter of the genome. The reason for that difference is that most genes are a mix of two types of sequence — introns and exons. The exons code for protein, i.e. their sequence specifies which amino acids to include in the protein, as well as when to start and stop. By contrast, the introns don’t code for anything, and, while they undoubtedly serve a purpose, we have only a fairly limited understanding of what that is.5 Introns can be truly enormous — many thousands of bases long. Sometimes, they are so big that there’s room for an entire gene to sit inside an intron of another gene, usually running in the opposite direction, on the other strand of DNA. The double helix is a two-way street.

  [5 One well-understood function of introns is to enable the use of the same gene to make different versions of its protein, sometimes versions that have quite varied functions. This is done by ‘alternative splicing’ — some exons don’t always get used — so there is sequence that could be either exon or intron. Plenty of genes don’t do this at all, but there are some proteins that come in numerous flavours depending on the way that splicing happens. Another function of introns is to help control when and where a gene is switched on — a regulatory function.]

  You can see exons and introns in that chunk of TBX20. The bold sections are exons; the bits in between are introns. You can even see some of the genome’s operating instructions, written right there in the DNA sequence. At the start of each intron are the bases GT; at the end of each intron are the bases AG. Together, GT and AG form a key part of a message to the cell’s machinery that says, ‘There’s an intron here. Not needed for protein — please cut out.’6
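  To make the GT…AG rule concrete, here is a toy Python sketch that removes introns from an invented gene-like sequence. Everything in it is hypothetical: the sequence is made up (it is not TBX20), the intron positions are supplied by hand rather than detected, and the check looks only at the GT and AG bases, ignoring the surrounding sequence that real splice-site recognition also depends on.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Cut out introns given as non-overlapping (start, end) half-open ranges."""
    pieces = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Simplified signal check: only the GT...AG ends, nothing around them.
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not run GT...AG")
        pieces.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end
    pieces.append(pre_mrna[cursor:])  # keep the final exon
    return "".join(pieces)


# Invented toy sequences, for illustration only.
exons = ["ATGGCC", "TTTGAA", "CTGTAA"]
introns = ["GTAAGTATTTCAG", "GTGAGCCCTAG"]

# Assemble exon-intron-exon-intron-exon and record where each intron sits.
gene = ""
intron_coords = []
for i, exon in enumerate(exons):
    gene += exon
    if i < len(introns):
        start = len(gene)
        gene += introns[i]
        intron_coords.append((start, len(gene)))

print(splice(gene, intron_coords))  # ATGGCCTTTGAACTGTAA
```

  Joined up, the three exons form one continuous coding stretch; change an intron's opening GT to anything else and the function refuses to cut it.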

  [6 This is something of a simplification. The GT and AG are key parts of the signal that says, ‘I’m a splice site,’ but the bases that surround them are important, too. There’s a more detailed description of the relationship between gene and protein in the notes section, if you’d like to read more about this.]

  What percentage of the human genome actually does something? Well, we still don’t know. In September 2012, the results of a major follow-up to the Human Genome Project, called the ENCODE project, were released in 30 scientific papers, all in one hit. Getting that many scientists to cooperate so that 30 papers were released into the wild simultaneously was perhaps as impressive an achievement as the actual science in the papers. The ENCODE Consortium reckoned they had found a function for 80 per cent of the genome. Most of it was claimed to be busy controlling the function of other bits — a rather bureaucratic vision of cell biology. There was a lot of criticism of this announcement at the time, and the debate rolls on. Recently, a paper was published that argued that only 8 per cent of the genome is functional. That’s quite a gap. I have no idea what the right answer is, but I doubt it’s as low as 8 per cent or as high as 80 per cent.

  There’s an awful lot of the genome which looks like genetic wreckage — genes and other elements that have lost their function over the course of evolution. For instance, there are loads of smell receptor genes that are broken and don’t do anything — earlier in evolution, our ancestors needed a keen sense of smell to survive, but for a long time we’ve been able to get by just fine with a comparatively poor sense of smell. So when those genes acquired mutations, it didn’t cause a problem, and the broken form was just passed on to future generations. You inherited hundreds of broken genes from your parents, and in turn you’ll pass them on — or have already — faithfully copied, and still not doing anything.

  There are also lots of repetitive sequences that don’t seem to be up to much. Sometimes, viruses copy themselves into a host’s DNA, and there’s quite a lot of what look like old virus sequences scattered about the place. There are chunks of DNA that have been copied in what’s called duplication events. If you have spare copies of something, it doesn’t matter much if one of them loses function, so you often wind up with two versions of a gene, one that works and one that doesn’t, a pseudogene. And there are bits of DNA that seem to be there simply as a side effect of DNA’s drive to be copied: long, long strings of sequence that looks like nothing much (ATATATATATATATATAT …).
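  Simple repeats of the ATATAT… kind are easy to spot by pattern matching. A minimal sketch, assuming that a "repeat" means a two-base unit occurring at least four times in a row (an arbitrary threshold chosen for illustration); the function name `find_dinucleotide_repeats` is hypothetical. Runs of a single base, such as AAAAAAAA, are deliberately skipped.

```python
import re

# A two-base unit followed by at least three more copies of itself.
DINUCLEOTIDE_RUN = re.compile(r"([ACGT]{2})\1{3,}")


def find_dinucleotide_repeats(seq: str) -> list[tuple[int, str, int]]:
    """Return (start position, repeat unit, number of copies) for each run."""
    hits = []
    for match in DINUCLEOTIDE_RUN.finditer(seq):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # single-base run such as AAAAAAAA, not a two-base repeat
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


print(find_dinucleotide_repeats("GGC" + "AT" * 6 + "GG"))  # [(3, 'AT', 6)]
```

  Run on the line of the TBX20 chunk that contains CTCTCTCT…, it should flag that stretch as well.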

  On the whole, this doesn’t seem to cause us any bother. There’s no apparent pressure on the human genome to get more efficient, or maybe whatever pressure exists is outweighed by DNA’s tendency to copy itself, and by the various mechanisms which introduce new bits of DNA into the sequence. There are plenty of other organisms that get by fine with much bigger genomes than ours, with proportionally even more freeloading DNA. There’s an amoeba, Polychaos dubium, that reportedly has a genome more than 200 times the size of ours. The humble onion’s genome is five times bigger than ours, and, on the whole, you are more likely to eat an onion (or extract its DNA) than the other way around. On the other hand, the pufferfish, fugu, has a genome only about an eighth the size of ours … and pufferfish are quite a bit more complex than onions.

  It does seem like there might be a price to pay for an oversized genome, at least when times are hard. There’s a plant called teosinte, which is thought to be the ancestor of maize. In 2017, a paper was published that compared the size of the genomes of different species of teosinte, living at different altitudes. Many plants have enormous genomes — but in teosinte, at least, the higher the altitude, the smaller the genome. If you’re living in a tough neighbourhood, high on a mountain, you can’t afford to waste energy copying DNA if it isn’t doing a valuable job for you.

  It’s just about possible that the human genome happens to be exactly the perfect size, so that everything in it has an important role. But that would seem like quite a fluke. More likely, the human genome does indeed carry around its fair share of true ‘junk’ DNA.

  That’s not to say there isn’t anything impressive and interesting about the human genome. When I started working in genetics, we used to confidently tell people that the human genome contained about 100,000 genes — because we’re such important, special creatures, there had to be a lot, right? Then that estimate started going down … and down … and down. By the time the Human Genome Project was completed, the number had gone down to a bit over 20,000 genes. Part of the explanation for this is that our genes have quite complex structures, and an awful lot of them do more than one job. Sometimes, that means doing a similar task in slightly different ways, like a muscle protein that forms a bit differently depending on whether it is working in heart muscle or in regular muscle. Sometimes, though, it means the same protein can be used to do wildly different jobs. This is called ‘moonlighting’. For instance, there’s an enzyme — a protein that makes chemical reactions happen — that also plays an important role in making the lens of the eye transparent.

  But you can say similar things about the genomes of a lot of organisms, and all of it about the chimp genome. Chimpanzees, especially the bonobo (or pygmy chimpanzee), are so close to us genetically that a Martian would probably view us as just different varieties of the same animal. We are closer to chimps than African elephants are to Asian elephants, so you could hardly blame our extraterrestrial visitor for being confused.

  How do we know all this stuff? It comes back to the Human Genome Project.

  At the time of its conception, the HGP was an enormously ambitious idea. Only a small fraction of the genome had been sequenced. Mostly, what we had was a kind of outline sketch — a map, in fact. You often hear the expression ‘mapping the genome’, and indeed that was the first step. But we never map a person’s genome now, because the job has been done already — just as you don’t have to map someone’s whole neighbourhood in order to find their house. A genetic map is not like a street map, because it only really has one dimension, not two — it’s all about what lies where along the string of DNA that makes up a chromosome. To make such a map, you need a series of markers — genetic signposts — with a known relationship to each other. Those signposts consist of bits of DNA with some way of uniquely identifying them. Say we have three such markers, A, B, and C. If we make a genetic map that includes A, B, and C, it would consist — at the least — of the information that they sit along the chromosome in that order — A-B-C — rather than A-C-B or any of the other possibilities. A better version would say that A, B, and C are all on chromosome 1, and not on any of the other chromosomes. And the most useful type of map would also tell us how far apart they are.
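  The three levels of detail described above (order, chromosome, distance) fit naturally into a small data structure. A hypothetical sketch: the marker names, chromosome labels, and positions are invented, and positions are given in bases purely for illustration (classic genetic maps often measured distance by how frequently markers are separated by recombination, in centimorgans). The code records what a finished map says; it does not show how the order is worked out experimentally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    chromosome: str
    position: int  # hypothetical coordinate along the chromosome


def build_map(markers: list[Marker]) -> dict[str, tuple[list[str], list[int]]]:
    """Group markers by chromosome, put them in order, and measure the gaps."""
    by_chromosome: dict[str, list[Marker]] = {}
    for marker in markers:
        by_chromosome.setdefault(marker.chromosome, []).append(marker)

    genetic_map = {}
    for chromosome, ms in by_chromosome.items():
        ms.sort(key=lambda m: m.position)  # order along the string of DNA
        order = [m.name for m in ms]
        gaps = [b.position - a.position for a, b in zip(ms, ms[1:])]
        genetic_map[chromosome] = (order, gaps)
    return genetic_map


# Invented markers, deliberately listed out of order.
markers = [
    Marker("B", "1", 5_000_000),
    Marker("C", "1", 12_000_000),
    Marker("A", "1", 1_000_000),
]
print(build_map(markers))  # {'1': (['A', 'B', 'C'], [4000000, 7000000])}
```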

 
