The Mysterious World of the Human Genome


by Frank Ryan


  There is a second, utterly different reading from the entirety of the genome. This reading is not concerned with normal boundaries of genes or regulatory sequences. It can code from any stretch, whether remaining confined to, say, an exon, a group of exons, a promoter region, or a combination of promoter and exons, or the regulatory LTR of a virus, or all of these in a single sequence. And the transcripts are all non-coding RNA molecules.

  This is the explanation for the unknown 50 percent of the genome.

  I gaze at your puzzled face, aware that we are still aboard our magical mystery train, on our way back into the normal world of these pages.

  “The puzzle was an artifact caused by how they actually derived their genomic sequences. The 2001 reading of the draft full genome was based on compiling all the messenger RNA sequences found in the human cell, through a technique called ‘expressed sequence tags,’ or ESTs. The messenger RNA is reversed to its complementary DNA, known as cDNA. So the 2001 pie chart was based not on the DNA of the human genome but the sum coding of all the messenger RNA that is expressed from that DNA.”

  You are still shaking your head.

  “The whole genome, or most of, is actually translated twice—in two utterly different ways when it comes to genetic sequences…”

  “Ah, the whole genome—it's copied twice.”

  “Exactly. That was why the black hole was roughly 50 percent of the genome. It was the missing second translation into non-coding RNAs.”

  We see now how that older attitude to the genome, with its excessive emphasis on protein-coding DNA genes, blinkered our vision to the bigger picture. This more comprehensive understanding is still being further evaluated.

  The nomenclature of non-coding RNAs is simple and predictable; they are named after the sequence within the genome that encodes them. So a sequence based on a single exon, or intron, is called an “exonic” or “intronic” lncRNA. A sequence based on a gene is a “genic” lncRNA, and so on. The lncRNA can be derived from the sense strand of DNA; from a regulatory region; from a promoter sequence; from an entire gene, including all the exons and introns; or even from the intervening sequences between different genes that include upstream regulatory regions. It can also be coded in much the same way along the anti-sense DNA strand. Some are coded in both directions, being known as “bidirectional transcripts.” There are even mitochondrial lncRNAs and virus or LINE- or SINE-associated RNAs, known collectively as repeat-associated lncRNAs. Their purpose is epigenetic control of the genome, so that, as this extraordinary repertoire would suggest, lncRNAs control a great variety of different genomic functions.

  One such function is to focus on what are known as regulatory proteins: proteins that switch genes on or off. The lncRNA will fix to the DNA thread at an appropriate point, grab hold of the regulatory protein, and direct it to where it needs to be to influence the appropriate gene. Although the research is still ongoing, we already know that lncRNAs are involved in the epigenetic, genetic, and whole genomic regulation of many different and sometimes very complicated biological processes. They appear to be important during the embryonic stage of development, where they play a pivotal role in embryonic stem cells, the pluripotent cells that make up the very early embryo. Here the lncRNAs are involved in the differentiation of these stem cells into the cells that will form the different tissues and organs. As we have seen with the Prader–Willi and Angelman syndromes, they appear to play important roles in the hereditary aspects of some of the inherited disorders of metabolism. They are also thought to play some role in many different cancers affecting breast, bladder, colon, prostate, lung, bone, and brain, as well as in melanomas and leukemias. In addition, it is thought they may contribute to the autoimmune disorders, coronary artery disease, neurological disorders such as spinocerebellar ataxia, fragile X syndrome, Alzheimer's disease, and, possibly, the ageing process.

  We can now fill in the blank space to create a new pie chart of the genome:

  Breakdown of the Human Genome, 2012

  How truly extraordinary is the complexity of this structure that lies at the very core of our being! These different genetic entities are not parceled up neatly in different parts of the genome as we see in the pie chart; they are all jumbled up: virus and vertebrate gene, and the DNA that translates to long non-coding RNA, ignoring the supposed function of other coding stretches and cutting into them or straddling several in one stretch. The motley assemblage sits cheek by jowl or piled on top of one another throughout the chromosomes. And hidden within this remarkable and messy reservoir of the heredity of each and every one of us is a secret narrative of our human history, from the most distant ancestors from long before the human stage of our existence right up to the immediate present.

  It is to this new mystery I would now like to direct our journey.

  This science appeals to us very differently from physics. It directly informs our understanding of ourselves. Its mysteries once deemed dangerous and forbidden: its consequences promise to be practical, personal, urgent.

  HORACE FREELAND JUDSON, THE EIGHTH DAY OF CREATION

  On February 13, 2014, the journal Nature published an article with the title, “The Genome of a Late Pleistocene Human from a Clovis Burial Site in Western Montana.” The Clovis culture is a prehistoric American culture named after distinct stone tools found at sites near Clovis, New Mexico, in the 1920s and 1930s. Dating back to the end of the last Ice Age, roughly 13,000 to 12,600 years ago, the Clovis people are believed by many American paleontologists to be the ancestors of the Native Americans of North and South America. At the time of publication, the origins of the Clovis people were still being debated, with most anthropologists believing they came from Asia although some proposed an alternative route from southwestern Europe, following the margins of the ice sheets across the Atlantic Ocean. The Montana burial site was already of historic importance. First discovered in 1968 on land owned by the Anzick family at the foothills of the Rocky Mountains near Wilsall, it contained the skull and other skeletal remains of a male infant, roughly 12–18 months old, now known as Anzick-1. It was prized as the only known Clovis burial that also included a considerable assemblage of stone tools and bone-tool fragments.

  The child's remains had already been carbon dated to roughly 12,700 years old, making this the oldest known burial to date in North America. This, together with the characteristic tools, suggested that he belonged to the earliest phase of Clovis immigration. So the sequencing of his genome might provide invaluable information on the ethnic and geographic origins of the earliest Native Americans. The sequencing work was undertaken by a team of Danish evolutionary biologists together with experts based at the Natural History Museum and University of Copenhagen.

  What then were the scientists really looking for in the genome of this child who had died during the last great Ice Age?

  The answer is that they hoped to learn something about our human origins and migrations back in a time when all of humanity was still dependent on hunting and gathering for survival, when the only tools and weapons were made of wood, bone, and stone—a time when there were no national boundaries, no empires, no cities, no agriculture.

  To get a clearer grasp of what these paleogeneticists were looking for we need to understand what they mean by SNPs, an acronym for “single nucleotide polymorphisms.” It sounds complicated but, as we shall see, it is simplicity itself if we hop aboard that now-familiar magical steam train to make a new exploration along the railway track of DNA. In this case, we choose to steam along a stretch of DNA in a germ cell during the formation of the sperm or the ovum, when I suggest we take a closer look at what sometimes happens when DNA replicates. I hardly need to remind you that the sleepers are made up of complementary nucleotides; C always joining in the middle, through that weak cement of the hydrogen bond, with G; A with T; and vice versa. Now, as we watch the cycle of replication take place, the two sides of the rail separate at the hydrogen bond linkages in the sleepers and the long strands disentangle to begin the process of copying. At my suggestion, we follow this happening along the lowermost rail, the so-called “anti-sense” strand. We now head eastward, over thousands of sleepers before I stop the engine. We get down off the train to take a closer look at a single sleeper.

  “I should explain that this is a section of DNA in a so-called non-coding part of the genome. So it is not part of a protein-coding gene.”

  “What are we looking for?”

  “A mistake in the copying.”

  Just as when we looked for mutations in protein-coding genes earlier, you spot the error. Where a G in the incomplete sleeper should have attracted its complementary C before the sleepers rejoined to form the complete track, there has been a mistake. A thymine, or T, has taken the place that should have a cytosine, or C. This is another point mutation. Clearly this one nucleotide will not quite fit the G, so the sleeper is buckled. But in subsequent replication cycles, the T mistake will now attract a matching adenine, or A, in copying to a new sense strand. This change in the DNA sequence will be passed on to the germ cells, to be inherited by the offspring of the individual, and so on to future generations. This is what is referred to as a single nucleotide polymorphism—or “SNP,” nicknamed a “Snip.”
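  For readers who like to see the bookkeeping, the miscopy just described can be traced in a few lines of Python. This is only an illustrative sketch: the six-base stretch, the position of the error, and the function names are all invented for the example, and strand orientation is ignored to keep the pairing rule in plain view.

```python
# Base-pairing rule from the text: C joins with G, A joins with T.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Build the partner strand base by base (orientation ignored for simplicity)."""
    return "".join(PAIR[base] for base in strand)


def copy_with_error(template: str, position: int, wrong_base: str) -> str:
    """Copy a template strand, but place wrong_base opposite template[position]."""
    new_strand = list(complement(template))
    new_strand[position] = wrong_base
    return "".join(new_strand)


# Hypothetical six-base stretch of the template; index 2 is the G of the story.
template = "ATGCCA"

faithful = complement(template)                 # what a correct copy would be
miscopied = copy_with_error(template, 2, "T")   # a T sits where a C belongs

# In the next replication cycle the miscopied strand acts as the template,
# so its T attracts an A and the change becomes a settled part of the sequence.
next_generation = complement(miscopied)

print("original template :", template)
print("faithful copy     :", faithful)
print("miscopied strand  :", miscopied)
print("next generation   :", next_generation)
```

  Comparing the first and last lines of the output shows the single changed letter, the G that has become an A, which is the Snip that descendants would inherit.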

  Since the mutation is in a non-coding sequence, it won't affect the health of the offspring. Snips like this are ignored by natural selection—or to use the jargon, they are “selectively” neutral. The fact that they are selectively neutral means that they are inherited without bias or favor throughout all future generations. Over time, more and more of these Snips accumulate within an interbreeding species population, creating genetic “markers” in specific places within chromosomes that identify that particular genetic lineage from that time onward.

  There are millions of Snips in every human genome. And they show significant variation from one individual to another, and a great deal more variation between different human populations. Particular Snips gather as identifiable clusters in a specific region of a chromosome, where they tend to be inherited together as a group, or “haplotype,” even remaining undisturbed during the swapping of bits of matching chromosomes during the sexual recombination that takes place during the formation of the germ cells. I should also explain, in passing, that the original definition of haplotype referred to clusters of genes that tended to be inherited as a closely linked collection, but this definition had to be modified when we discovered that most of the human genome is not actually made up of genes. If you are male, your Y-chromosome haplotype should be the same as your father's and the generations of males going back through time in your paternal lineage. The same rules would apply to mitochondrial haplotypes and maternal lineages, though both males and females inherit their mitochondrial lineages exclusively through the maternal line.
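  The idea of Snips as markers, and of a haplotype as the set of Snips carried together, can be sketched in code. The example below is a toy under stated assumptions: the four ten-base sequences are invented, they are assumed to be already aligned, and real studies work with millions of positions and far more careful statistics.

```python
def snp_positions(sequences):
    """Return the positions where aligned, equal-length sequences do not all agree."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in sequences}) > 1]


def haplotype(sequence, positions):
    """Read off the bases one individual carries at the variable positions."""
    return "".join(sequence[i] for i in positions)


# Hypothetical reads of the same non-coding stretch from four people.
individuals = {
    "person_1": "ATGCCATTGA",
    "person_2": "ATACCATTGA",
    "person_3": "ATACCATCGA",
    "person_4": "ATGCCATTGA",
}

positions = snp_positions(list(individuals.values()))
haplotypes = {name: haplotype(seq, positions) for name, seq in individuals.items()}

print("variable positions:", positions)
for name, hap in haplotypes.items():
    print(name, hap)
```

  In this toy data, person_1 and person_4 carry the same pair of letters at the variable positions, so they would be grouped into one lineage, while person_2 and person_3 each carry a different combination.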

  Geneticists also employ another grouping, called a “haplogroup,” which some use to group haplotypes into more distant common groups that share an overall common ancestor. However, I should voice caution here because some geneticists ignore this distinction, using “haplotype” or “haplogroup” to mean much the same thing. For example, Celtic males, such as the original Irish, Welsh, and Basques, share a Y-chromosome haplogroup, as do males of Germano-Nordic origins. But if we go further back, most European males, or females, coalesce into a common haplogroup of still earlier origins when compared with, say, males or females of east Asian origins. Thus haplotypes tend to be used for more closely related family trees and haplogroups for more distant historic and archeological population genetic studies.

  A haplogroup (or haplotype) begins with a “root” or “founder” mutation, which has been located by a combination of archeological and paleogenetic studies to a specific historic human population. This is then added to by subsidiary selectively neutral mutations within the same region of the chromosome, creating recognizable genetic subgroups over time. The root or founder mutations are usually given a capital letter, and the subsequent mutations, which come about through additional Snips, are given numbers or lowercase letters. Thus the lineages appear like the branches of a tree—it begins with a trunk, then major branches, and then finer and finer branches, representing different subgroups radiating from the founder group over thousands or tens of thousands, or even hundreds of thousands of years.

  One such ancient haplogroup, found exclusively in mitochondrial DNA, is called the “D” root or “clade.” This originated as a founder Snip in a population living in northeast Asia, including present-day Siberia, roughly 48,000 years ago. Over time, additional Snips arose in the mitochondrial DNA of the descendants of the D population, leading to four divergent clades or branches, called D1 to D4, and additional mutations within the still migrating clades gave rise to further sub-branching over time. Each new branch, or sub-branch, would correspond to a geographic location and timings of population movement, which can be cross-referenced to archeological findings, such as carbon 14 dating, so population geneticists can plot the historic movements and interactions of people within these clades all over Asia, Europe, and, in due course, North and South America.

  Returning to the child Anzick-1, we recall that the skeleton was carbon dated to 13,000 to 12,600 years ago, which places him very early in the human colonization of the Americas. His mitochondrial haplogroup was found to be D4h3a, which is one of the rare lineages specific to Native Americans. Given the date and the haplogroup, the researchers concluded that Anzick-1 belonged to an ethnic group that must be close to the founder of the D4h3a sub-lineage, and thus his people were thought to be directly ancestral to 80 percent of all Native American peoples and close cousins to the remaining 20 percent. The study of Anzick-1's genome also turned up some distant commonalities with European haplotypes.
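  The naming scheme described earlier, a capital letter for the root followed by numbers and lowercase letters for each later branch, means that a label such as D4h3a spells out its own family tree. The Python sketch below unpacks such a label. It assumes the simple alternating pattern only; real haplogroup nomenclature has exceptions this does not handle, and the function names and the comparison label D4b are chosen purely for illustration.

```python
import re
from typing import List, Optional


def lineage(haplogroup: str) -> List[str]:
    """Split a label such as 'D4h3a' into its chain of ancestors, root first.

    Assumes a capital-letter root followed by runs of digits and single
    lowercase letters, each marking one further branch of the tree.
    """
    match = re.fullmatch(r"([A-Z])((?:\d+|[a-z])*)", haplogroup)
    if match is None:
        raise ValueError(f"unrecognized haplogroup label: {haplogroup!r}")
    root, rest = match.groups()
    chain = [root]
    for step in re.findall(r"\d+|[a-z]", rest):
        chain.append(chain[-1] + step)
    return chain


def common_ancestor(a: str, b: str) -> Optional[str]:
    """Return the deepest branch shared by two labels, or None if the roots differ."""
    shared = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        shared = x
    return shared


print(lineage("D4h3a"))
print(common_ancestor("D4h3a", "D1"))
print(common_ancestor("D4h3a", "D4b"))
```

  Read this way, the label places Anzick-1 four branchings below the D root, and two labels can be compared simply by finding where their chains part company.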

  In the same journal, a paper by the same group of genetic and archeological investigators described a young boy's remains, dating back 24,000 years, from an upper Paleolithic burial site in Siberia. His is the oldest burial of any modern human discovered to date, and study of his haplotype revealed that he belonged to an even older mitochondrial haplogroup than Anzick-1: a basal lineage of haplogroup R. Today this lineage is found in people living in western Eurasia, south Asia, and the Altai region of southern Siberia. Sister lineages of haplogroup R form haplogroup Q, which is the most common haplogroup in Native Americans, and in Eurasia the Q haplogroup lineages closest to those of the Native Americans are also found in the Altai region of southern Siberia. In the opinion of Danish paleontologist Eske Willerslev, who led the sequencing of both sets of remains, “At some point in the past a branch of east Asians and a branch of western Eurasians met each other and they widely interbred.” Their descendants headed east, across the land bridge between Asia and North America, discovering two huge and bounteous continents that had never been populated by humans before. They gave rise to the majority of Native Americans we see today, including Anzick-1. While not everybody agrees with Willerslev, the combination of the two infant boy discoveries would explain how Native Americans share 14 to 38 percent of their genomes with western Eurasians.

  Snips, haplotypes, and haplogroups are not exclusive to our nuclear genome. I referred to mitochondrial DNA in both these genetic explorations, and now we can hop back aboard our train for another trip into that mysterious ultramicroscopic world to probe yet another mystery. But our destination on this trip is no longer the landscape of the nuclear genome; this time we are heading into the territory that lies outside the nuclear membrane, into the equally intriguing landscape of the cytoplasm, moving carefully through an incredibly congested and frenetically busy space that might be compared to an industrial landscape, in which fresh proteins are being manufactured and ageing proteins are being broken down for recycling. It is a space where huge machines, resembling free-floating sausage-shaped juggernauts, are extracting energy from gaseous oxygen and packaging it in storable form so it can be used in every living cell. These are our vitally important human mitochondria.

  As we watch, a mitochondrion develops a constriction about its center, and before our startled gazes it buds, the parent mitochondrion cleaving itself into two clones. As our steam engine carries us deeper into the ultramicroscopic world of the genes, we find ourselves passing through the outer wall of one of the sausage-shaped juggernauts, which, as we magically contract in size, has now grown comparatively gigantic, fully to the size of a city. Within its cavernous spaces we come upon another railway track, with its gleaming rails of deoxyribose sugar and stiffening phosphates, and the familiar sleepers, with their interlocking nucleotide bases, heading away into the blurry distance. We have entered the world of the mitochondrial genome, with its very different evolutionary origin from that of the nuclear chromosomes, an enigma within the greater mystery.

  “I know you talked about mitochondria—about where they came from…”

  “A genetic union between what was once a free-living parasitic bacterium and the single-celled forerunner of all complex life on Earth. The mitochondria still retain quite a lot of their bacterial origins. They retain enough of their genome to reproduce by themselves, which is what that mitochondrion was doing. While each of us inherits half of our nuclear genome from our father, we inherit more than that from our mother. In addition to half her nuclear genome, we also inherit the physical structure of a cell—the ovum—which includes the mitochondria.”

 
