FIGURE 1.2. Human chromosomal DNA in the cell nucleus. Clockwise from the top: the cell and its nucleus, one of the 46 chromosomes (23 pairs), and the DNA molecule. Source: National Institute of General Medical Sciences, National Institutes of Health.
In the human genome, which comprises about 3 billion base pairs of DNA, the number of genes (defined as functional segments of DNA that encode proteins) is still uncertain. The International Human Genome Sequencing Consortium placed the number of protein-coding genes at 20,000 to 25,000; other groups predict larger numbers. An estimated 97 to 98 percent of our DNA consists of noncoding regions.3 Some of these sequences may have other functions. For example, promoters lie near genes and initiate gene expression, beginning with transcription (the copying of a gene’s DNA into an RNA message, which the cell then reads to synthesize a protein in a process known as translation). Other DNA sequences, called enhancers, can increase the amount of gene product made. There is also noncoding DNA, sometimes referred to as “junk DNA” or “evolutionary debris,” that supposedly has no function, or at least none that is known.
More recently, through a project called ENCODE (the Encyclopedia of DNA Elements), scientists have found that much of the DNA previously considered useless is essential to regulatory processes in the so-called functional part of the genome, where gene transcription takes place.4 In the words of geneticist Francis Collins, formerly director of the U.S. National Human Genome Research Institute, “Transcription appears to be far more interconnected across the genome than anyone had thought.”5 Some of the so-called noncoding regions probably encode small RNA molecules.
The Typing of DNA for Forensic Identification
Sequencing DNA means reading its code: the series of four letters (A, G, C, and T) that stand for the bases of the DNA molecule. As discussed earlier, the human genome contains about 3 billion nucleotides spread across one set of the 23 chromosomes. The sequence of nucleotides is 99.9 percent identical for all people. In other words, human genetic variation is accounted for by only 0.1 percent of DNA, or about 3 million bases.6 It is this variation in particular segments of DNA that allows forensic scientists to determine whether two DNA samples could have come from the same individual.
The process of DNA analysis always begins with a sample of biological material: it could be a strand of hair, blood, sperm, tissue, skin cells, or saliva. To be useful for the analysis, the sample must have intact cells or DNA that has been removed from the cell. Unless the samples are very carefully handled, they can become contaminated by elements from the environment, such as DNA from plants, animals, insects, bacteria, or other human beings.7
If the biological sample contains cells, the first task is to break open the cells (lysing the cell membrane) or to dissolve the matrix surrounding the DNA (as in a hair shaft) to retrieve the DNA. The separation of the DNA from the biological sample is called extraction; the separation of sperm-cell DNA from non-sperm-cell DNA is called differential extraction. The cell membrane can be broken to release the DNA by exposing the cells to detergents and other chemicals; different chemical recipes are effective for different cell types. Once the DNA has been removed from the cell, it has to be broken into manageable pieces so that it can be identified by its unique sequence of bases. One method of doing this uses restriction enzymes, proteins isolated from bacteria (such as EcoRI, shown in figure 1.3) that cut DNA wherever a specific short sequence of bases occurs, leaving DNA fragments that vary in length and whose ends are defined by the presence of a restriction site.
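The cutting step can be sketched in a few lines of Python. This is a simplification under stated assumptions: it follows only the top strand of a linear molecule and uses EcoRI’s recognition sequence, GAATTC, which the enzyme cuts between the G and the first A. The function name `digest` and the example sequence are hypothetical, chosen only for illustration.

```python
# Sketch of a restriction digest on the top strand of a linear DNA molecule.
# EcoRI recognizes GAATTC and cuts between G and A (G^AATTC). The bottom strand
# is cut at the mirror position, leaving 4-base "sticky" overhangs; this sketch
# ignores the bottom strand and reports top-strand fragments only.

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # the cut falls after the first base (G) of the site


def digest(sequence: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut `sequence` at every occurrence of `site` and return the fragments."""
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + offset)
        start = sequence.find(site, start + 1)

    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


if __name__ == "__main__":
    # Hypothetical 20-base sequence containing two EcoRI sites.
    pieces = digest("AAGAATTCTTTTGAATTCCC")
    print(pieces)                    # ['AAG', 'AATTCTTTTG', 'AATTCCC']
    print([len(p) for p in pieces])  # [3, 10, 7]
```

Where the sites fall determines the fragment lengths, which is why a change that creates or destroys a site alters the pattern of fragment sizes.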
To identify specific fragments carrying the genetic variation of interest, scientists use probes: short, single-stranded fragments of DNA that are synthesized in a laboratory and labeled radioactively or with some other detectable molecule. A probe will seek out and bind to its complementary sequence (if it is present) to form a double-stranded segment of DNA. In DNA, base A always pairs with base T, and G always pairs with C. Thus a probe with the nucleotide sequence AGTTAGC is complementary to the strand TCAATCG.
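The pairing rule can be checked mechanically. The sketch below writes the complement base by base, as the text does, and also gives the reverse complement, since paired strands actually run in opposite directions. The helper `probe_binds` is hypothetical and uses the text’s simplified same-direction convention.

```python
# Base-pairing rules from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sequence: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in sequence.upper())


def reverse_complement(sequence: str) -> str:
    """Complement read in the opposite direction. Paired DNA strands run
    antiparallel, so this is how the partner strand reads 5' to 3'."""
    return complement(sequence)[::-1]


def probe_binds(probe: str, target: str) -> bool:
    """Hypothetical helper: True if the probe's complement occurs in the target
    (using the text's simplified, same-direction convention)."""
    return complement(probe) in target.upper()


if __name__ == "__main__":
    print(complement("AGTTAGC"))                  # TCAATCG
    print(reverse_complement("AGTTAGC"))          # GCTAACT
    print(probe_binds("AGTTAGC", "GGTCAATCGAA"))  # True
```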
FIGURE 1.3. A restriction enzyme EcoRI is used to cut double-stranded DNA at a specific site, leaving the ends available for reattachment to complementary pairs. Source: From Molecular Cell Biology, 5th ed. by Harvey Lodish, Arnold Berk, Paul Matsudaira, Chris A. Kaiser, Monty Krieger, Mathew P. Scott, S. Lawrence Zipursky, and James Darnell. © 2004 by W.H. Freeman and Company. Used with permission.
Early Method of DNA Analysis
The first widely used method of forensic DNA analysis was based on restriction enzyme digestion and was called restriction fragment length polymorphism (RFLP) analysis. Introduced in 1988, RFLP involves cutting relatively large segments of the DNA molecule into smaller fragments and sorting those fragments by size using a process called gel electrophoresis (see figure 1.4). RFLP analysis requires a relatively large quantity of DNA, in the range of 50 to 500 nanograms (ng; 1 nanogram = 1 billionth of a gram, or 10⁻⁹ g). This corresponds to a bloodstain about the size of a dime to a quarter.
Restriction enzymes break the long DNA strand at specific sites. Because DNA from two different individuals may yield fragments of different sizes when cut at a specific place, or locus (the term “locus” designates the position of a DNA sequence on the genome), measuring the size of the fragments can distinguish DNA samples from different individuals. The difference in fragment size at a locus between individuals is due to the presence or absence of a restriction site and/or the number of short repeating noncoding sequences contained within the fragment.

Once the DNA taken from a sample has been cut into discrete segments by restriction enzymes, the segments are separated by size (length). The DNA is placed in small wells at one end of a flat slab of gel (typically agarose) and then exposed to an electric field. Because DNA is negatively charged, it moves toward the positive electrode. Smaller fragments move more quickly than larger ones, which meet more resistance as they migrate through the sieving gel, so the fragments become separated according to size. After electrophoresis, the fragments are transferred to a membrane, exposed to radioactively labeled probes that bind to the locus of interest, and then exposed to x-ray film. The result is a series of dark bands resembling the bar code on consumer products. The x-ray image showing the positions of the DNA segments is called an autoradiograph, or autorad (figure 1.5). When two profiles of dark bands (each band representing the size of a DNA segment) match, the DNA samples may have come from the same individual. If the two profiles differ, the two samples could not have originated from the same individual.
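The comparison of two band patterns can be sketched as follows. Band positions read off a gel are never exact, so the sketch treats two bands as matching when they fall within a relative tolerance of each other; the 2.5 percent figure and the band sizes are assumptions chosen for illustration, not values from the text. A “match” here means only that the person cannot be excluded at this locus.

```python
# Sketch of comparing two single-locus RFLP band patterns. Band sizes are in
# base pairs. The 2.5 percent tolerance is an illustrative assumption standing
# in for the measurement uncertainty of reading band positions off a gel.


def bands_match(size_a: float, size_b: float, tolerance: float = 0.025) -> bool:
    """Two bands are indistinguishable if they differ by no more than
    `tolerance` (a fraction) of their average size."""
    average = (size_a + size_b) / 2
    return abs(size_a - size_b) <= tolerance * average


def profiles_match(profile_a: list[float], profile_b: list[float], tolerance: float = 0.025) -> bool:
    """Pair bands in size order (as their positions on the gel would) and
    require every pair to match; a different band count is a non-match."""
    if len(profile_a) != len(profile_b):
        return False
    return all(
        bands_match(a, b, tolerance)
        for a, b in zip(sorted(profile_a), sorted(profile_b))
    )


if __name__ == "__main__":
    evidence = [4200.0, 2900.0]  # hypothetical band sizes
    suspect_1 = [4150.0, 2930.0]
    suspect_2 = [5100.0, 2900.0]
    print(profiles_match(evidence, suspect_1))  # True
    print(profiles_match(evidence, suspect_2))  # False
```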
FIGURE 1.4. A rendering of a gel electrophoresis device in which DNA fragments exposed to an electric field are separated by size as they migrate through a gel. Source: From Genetics: Principles and Analysis, 4th ed. by Daniel T. Hartl and Elizabeth Jones, 1998, Jones & Bartlett Publishers.
The RFLP analysis shown in figure 1.5 is of DNA from a sexual assault case. The columns show the DNA segments of the victim (column 4), suspect 1 (column 5), suspect 2 (column 6), and the crime-scene sperm DNA (column 8). At this locus, the profile of the crime-scene sperm DNA matches that of suspect 1. The other columns are controls with known DNA patterns that validate the process. The columns marked as ladders contain DNA fragments of known size that are used to estimate the sizes of the bands in the profiles.
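How a ladder turns a band position into a size can be sketched as interpolation. A common approximation is that the logarithm of fragment size falls roughly linearly with migration distance, at least between neighboring ladder bands; the sketch assumes that relationship, and the ladder values are hypothetical.

```python
import math


def estimate_size(distance_mm: float, ladder: list[tuple[float, float]]) -> float:
    """Estimate a fragment's size in base pairs from its migration distance.

    `ladder` holds (distance_mm, size_bp) pairs for fragments of known size.
    Assumes log10(size) changes roughly linearly with distance between two
    neighboring ladder bands, and interpolates between the bands that
    bracket the unknown one.
    """
    points = sorted(ladder)  # order by distance; bigger fragments travel less
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


if __name__ == "__main__":
    # Hypothetical ladder: bands of known size and how far each traveled.
    ladder = [(10.0, 10000.0), (20.0, 1000.0), (30.0, 100.0)]
    print(round(estimate_size(15.0, ladder)))  # 3162
```

Interpolating on a logarithmic scale reflects the fact that a band halfway between two ladder bands is not halfway between their sizes.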
FIGURE 1.5. An example of an autoradiograph of radio-labeled DNA segments in a sexual assault case, developed from RFLP analysis. It includes the DNA typing of two suspects and a victim with results for control samples and ladders that are used to estimate the size of the DNA fragments. Source: Genelex.Com, http://www.healthanddna.com/genelex/about.html (accessed May 23, 2010).
PCR Method for Copying and Amplifying DNA
RFLP has been replaced by a far more sensitive method of DNA analysis based on a technique called the polymerase chain reaction (PCR). PCR replicates, by a form of chemical copying, tiny defined segments of an individual’s DNA (see figure 1.6). The PCR technique has revolutionized forensic DNA analysis because it is far more efficient and can be applied to very small DNA samples. Whereas the RFLP method could take as long as six to eight weeks, PCR can complete a DNA analysis in one to two days. More important, RFLP requires a relatively large DNA sample, while typical sample sizes for PCR analysis range from 0.5 to 2.0 ng. Thus PCR can analyze DNA samples roughly 500 times smaller than RFLP requires. Because it is so much more sensitive, PCR is more useful than RFLP for analyzing degraded samples of DNA from blood, saliva, hair, semen, and other sources. On the other hand, because of its high sensitivity, PCR is also more vulnerable to contamination (see the introduction and chapter 16).
The basic idea behind PCR is that copies of a small segment of DNA are made, and then copies of the copies are made, in a repeating cycle until enough DNA has been obtained for analysis. A simple analogy illustrates the amplification process. Imagine an all-school dance at which the numbers of boys and girls are equal. The event begins with one pair of dancers (one boy and one girl) in the center of the room. Around the perimeter of the room stand the other girls and boys in no special order. At some point in the dance, prompted by a change in the music, the pair separates (analogous to the splitting of a double-stranded DNA molecule into two single strands). The boy goes to the perimeter to select a new partner, his new complement, and the girl does the same. From one dance pair come two, then four, then eight, and so on. The process can be repeated until the dance floor is filled to capacity and no one is left on the periphery (every complementary pairing has been made).
FIGURE 1.6. A representation of the DNA amplification process using the polymerase chain reaction (PCR). Source: Andy Vierstraete.
In the PCR process, the DNA extracted for analysis is mixed with a group of chemicals, including enzymes and primers (sequences that identify the specific DNA targets to copy and initiate DNA replication). The primers are short single-stranded DNA sequences synthesized in the laboratory to bind to specific target sites in the DNA sample. The extracted DNA and these chemicals are then placed in a machine called a thermal cycler, which runs the sample through a series of heating and cooling cycles. Each cycle consists of three steps. In step 1 the sample is heated to 95°C, the temperature at which double-stranded DNA “unzips” into two single strands; this is called denaturing the DNA. In step 2 the sample is cooled to 55–60°C, the temperature range best suited for the primers to attach to the single strands. In step 3 the temperature is raised to 72°C, and an enzyme called Taq DNA polymerase synthesizes a strand complementary to each single-stranded template, creating a complete double-stranded DNA molecule. This is called the synthesis step. Each cycle takes a few minutes or less and ideally doubles the number of copies of the target DNA. The three-step cycle is usually repeated two to three dozen times (using commercially available kits). By the 28th cycle, the process can produce billions of copies of a particular DNA segment that may have been present only a few times in the original sample. Once the DNA segments from two or more samples have been amplified, they must be measured and compared, as described in the following section.
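The doubling arithmetic behind “billions of copies” can be checked directly. The sketch below records the three-step program from the text (picking 58°C from the stated 55–60°C annealing range) and computes copy numbers under ideal doubling. The `efficiency` parameter is a simplifying assumption of this sketch: real reactions fall short of perfect doubling and eventually level off.

```python
# Thermal-cycling program described in the text. The annealing temperature
# picks one point (58 C) from the 55-60 C range the text gives.
CYCLE_STEPS = [
    ("denature", 95),  # double-stranded DNA separates into single strands
    ("anneal", 58),    # primers attach to the single-stranded templates
    ("extend", 72),    # Taq polymerase builds the complementary strands
]


def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies of the target after `cycles` rounds of PCR.

    efficiency = 1.0 means every copy is duplicated in every cycle, giving
    starting_copies * 2**cycles. A constant efficiency below 1.0 is a
    simplifying assumption for reactions that do not double perfectly.
    """
    return starting_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    for name, celsius in CYCLE_STEPS:
        print(f"{name}: {celsius} C")
    print(f"{copies_after(28):,.0f}")                      # 268,435,456
    print(f"{copies_after(28, starting_copies=10):,.0f}")  # 2,684,354,560
```

A single starting molecule yields about 270 million copies after 28 ideal cycles; ten starting molecules yield about 2.7 billion.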
Polymorphisms in DNA Sequences
Polymorphisms are variations in the DNA sequence at a particular position (locus) on the human genome, where the less common variant occurs in at least 1 percent of the population. The variation can take multiple forms. It can be a change in a single base. For example, the DNA sequences AGACCTAG and AGACCTAC differ only in the last base: G in the first sequence is replaced by C in the second. Because the sequences differ by one base, this is called a point (or site) mutation or a single-nucleotide polymorphism (SNP, pronounced “snip”). A mutation within the DNA sequence recognized by a restriction enzyme may prevent the enzyme from cutting the DNA at that site. Thus, if an enzyme cuts a DNA sequence every time it encounters the base sequence TAG, it will cut the first sequence but not the second, because the SNP has changed TAG to TAC.
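The example above can be reproduced in a short sketch: locate the differing base between two aligned sequences, then check whether the recognition sequence TAG (the text’s hypothetical enzyme) is still present. Real analyses first align sequences; here the two are assumed to be aligned and of equal length.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def has_site(sequence: str, site: str) -> bool:
    """True if the enzyme's recognition sequence occurs in the sequence."""
    return site in sequence


if __name__ == "__main__":
    first, second = "AGACCTAG", "AGACCTAC"
    print(snp_positions(first, second))  # [7]  (the last base)
    print(has_site(first, "TAG"))        # True: the enzyme would cut
    print(has_site(second, "TAG"))       # False: the SNP removed the site
```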
A second type of polymorphism consists of short sequences repeated adjacent to one another along the DNA strand (tandemly repeated). For example, of the two DNA sequences listed below, one has three repeats of AGTCA and the other has five repeats of the same base sequence at a particular locus of the genome. This is called a “length polymorphism.” A polymorphism may or may not affect the physical characteristics of an individual; those used in forensic identification are not known to have such an effect. Highly polymorphic (variable) loci made up of short tandem repeats (STRs) are especially useful for identifying individuals because they are more likely to differ between two randomly selected people. Thus the more variable the locus, the better the chance of excluding a person wrongly associated with a forensic evidentiary item.
AGTCAAGTCAAGTCA (three repeats of AGTCA)
AGTCAAGTCAAGTCAAGTCAAGTCA (five repeats of AGTCA)
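Counting the repeats in the two sequences above amounts to finding the longest run of back-to-back copies of the repeat unit. A minimal sketch, with a hypothetical function name:

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Length of the longest run of back-to-back copies of `unit`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        position = start
        while sequence.startswith(unit, position):
            count += 1
            position += len(unit)
        best = max(best, count)
    return best


if __name__ == "__main__":
    print(count_tandem_repeats("AGTCAAGTCAAGTCA", "AGTCA"))            # 3
    print(count_tandem_repeats("AGTCAAGTCAAGTCAAGTCAAGTCA", "AGTCA"))  # 5
```

In forensic practice the repeat count is not read off the sequence this way; it is inferred from fragment length, as described below.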
Repeats of short contiguous segments of DNA (i.e., STRs) are used in most forensic DNA laboratories to ascertain the source (the identity of the contributor) of a biological sample. For the U.S. forensic DNA database system, 13 highly variable STR loci were chosen. The names and chromosomal locations of these 13 core STR loci, as well as the loci on the X and Y chromosomes that are used to determine the gender of the DNA donor, are shown in figure 1.7.8
Scientists in the genetics community name these loci with alphanumeric terms such as D3S1358 or FGA. The nomenclature is not fully systematic, but the letters and numbers do carry meaning: they indicate the chromosome on which the sequence is found (the 3 in D3S1358 denotes chromosome 3), whether the sequence is part of an intron (a noncoding region within a gene) or lies outside a functional gene, and when the sequence was discovered. The automated DNA analyzer determines the number of repeats in each of the two alleles (one on each chromosome of the pair) at a designated locus. In table 1.1 the evidence sample shows 13 and 15 repeats for the STR at locus A and 12 and 14 repeats at locus B. These match the numbers of repeats at the same loci for suspect 2. Thus suspect 2 cannot be excluded as a potential source of the evidence sample, while suspect 1 can be excluded.
FIGURE 1.7. The alphanumeric names of the DNA sites used by U.S. criminal justice authorities for forensic DNA analysis. Source: National Institute of Standards and Technology, “13 Core CODIS STR Loci with Chromosomal Positions,” http://www.cstl.nist.gov/biotech/strbase/images/codis.jpg (accessed May 23, 2010).
TABLE 1.1 Short Tandem Repeats for Two Alleles
STRs with more repeats are longer. The DNA analyzer compares the lengths of the STR fragments by capillary electrophoresis, a process similar to the gel electrophoresis described earlier, except that the DNA segments are driven by an electric field through a narrow capillary tube instead of a flat gel. As in gel electrophoresis, shorter segments move more quickly, so the analyzer can determine the length of each segment.
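Converting a measured length into a repeat count is simple arithmetic if one assumes that each amplified fragment consists of a fixed stretch of flanking DNA (set by where the primers bind) plus some number of repeat units. The 100-base flank and 4-base repeat unit below are hypothetical values for illustration; in practice they differ from locus to locus, laboratories calibrate against reference ladders, and some alleles contain partial repeats that simple rounding would miss.

```python
def repeats_from_length(fragment_bp: float, flank_bp: float, unit_bp: int = 4) -> float:
    """Convert a measured fragment length into a repeat count.

    Assumes fragment = fixed flanking DNA (flank_bp, set by where the primers
    bind) + n copies of a unit_bp-long repeat. Both numbers are hypothetical
    here; in practice they depend on the locus and the kit.
    """
    return (fragment_bp - flank_bp) / unit_bp


def call_allele(fragment_bp: float, flank_bp: float, unit_bp: int = 4) -> int:
    """Round to the nearest whole repeat (ignores partial-repeat variants)."""
    return round(repeats_from_length(fragment_bp, flank_bp, unit_bp))


if __name__ == "__main__":
    # Hypothetical locus: 100 bp of flanking DNA, 4-base repeat unit.
    print(call_allele(152.0, 100))  # 13
    print(call_allele(159.7, 100))  # 15
```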
In table 1.1 suspect 1’s DNA does not match the evidence sample at two loci. A single mismatch in STR length at any one locus is enough to conclude that the samples are not from the same individual, assuming that no errors were made in the analysis. Suspect 2 matches the evidence sample at both loci. If the STRs continued to match across all 13 loci, suspect 2 would not be excluded as the source of the evidence sample, and the probability that a randomly chosen, unrelated person would carry the same DNA profile by chance would be extremely low.
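The exclusion rule can be written as a short function. The evidence and suspect 2 values follow the text; suspect 1’s repeat counts are not given in the text, so the ones below are hypothetical. Alleles are compared as unordered pairs because which chromosome carries which repeat count is not known.

```python
Profile = dict[str, tuple[int, int]]  # locus name -> repeat counts of both alleles


def compare(evidence: Profile, reference: Profile) -> str:
    """Return "excluded" if any locus typed in both profiles differs,
    otherwise "not excluded". Assumes, as the text does, no analytical errors."""
    shared = evidence.keys() & reference.keys()
    if not shared:
        raise ValueError("profiles share no typed loci")
    for locus in sorted(shared):
        # The two alleles are compared as an unordered pair.
        if sorted(evidence[locus]) != sorted(reference[locus]):
            return "excluded"
    return "not excluded"


if __name__ == "__main__":
    evidence = {"A": (13, 15), "B": (12, 14)}
    suspect_2 = {"A": (15, 13), "B": (12, 14)}
    suspect_1 = {"A": (11, 15), "B": (12, 16)}  # hypothetical values
    print(compare(evidence, suspect_1))  # excluded
    print(compare(evidence, suspect_2))  # not excluded
```

The asymmetry in the text shows up in the code: one differing locus settles an exclusion, while agreement at every locus only leaves the person “not excluded.”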
To be 100 percent certain of a match, we would have to compare all 3 billion base pairs of the sample’s and the suspect’s DNA. That would be very expensive and time consuming and would provide more scientific evidence than is needed to make a strong case. Instead, a very small portion of the genome is analyzed (26 alleles: two at each of 13 loci) and, in cases of a “match,” an estimate is provided of the chance that the match occurred by coincidence. This is called the “random-match probability.”
Random-Match Probability
On November 13–14, 1997, at a meeting of representatives of 21 laboratories throughout the United States, forensic scientists and the FBI reached an agreement on using 13 STR loci for submitting profiles to the national forensic DNA database, otherwise known as the Combined DNA Index System (CODIS). What level of certainty does this system achieve, and on what grounds?
Suppose that we are comparing a crime-scene sample with the DNA of a suspect at 13 loci (see table 1.2). For each STR locus, forensic technicians determine the number of repeats in each of the two alleles. For example, suppose that at locus A (table 1.2) both the suspect and the crime-scene sample have 10 and 6 repeats. The next logical question is: how many people in the suspect’s population (racial, ethnic, and/or ancestry group) have 10 and 6 repeats at that locus? At locus B we find 12 and 8 repeats in both the crime-scene sample and the suspect. Again, how many people in the suspect’s reference population have 12 and 8 repeats at locus B? The estimate of how likely a random person in the population is to have the same repeats (the random-match probability) is based on how frequently each number of repeats occurs at a locus in various reference populations. This information is critical for establishing the likelihood that two random people have the same numbers of repeats at all 13 loci. The more closely two people are related genetically, the greater the chance that they have identical repeats. Close family members are more likely than unrelated people to have the same numbers of repeats at many (but not all) loci.
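The text does not spell out the calculation, but a common textbook approach can be sketched as follows, assuming Hardy-Weinberg equilibrium within each locus and independence between loci (the “product rule”). The repeat counts come from the example above; the allele frequencies are hypothetical. The sketch treats the person as unrelated and omits the adjustments for population substructure and relatives that forensic statisticians may apply.

```python
AlleleFrequencies = dict[int, float]  # repeat count -> frequency in a population


def genotype_frequency(allele_1: int, allele_2: int, freqs: AlleleFrequencies) -> float:
    """Hardy-Weinberg genotype frequency: 2pq for two different alleles,
    p**2 when both alleles have the same number of repeats."""
    p, q = freqs[allele_1], freqs[allele_2]
    return p * p if allele_1 == allele_2 else 2 * p * q


def random_match_probability(profile: dict[str, tuple[int, int]],
                             population: dict[str, AlleleFrequencies]) -> float:
    """Multiply single-locus genotype frequencies (the product rule),
    treating loci as independent and the person as unrelated."""
    probability = 1.0
    for locus, (allele_1, allele_2) in profile.items():
        probability *= genotype_frequency(allele_1, allele_2, population[locus])
    return probability


if __name__ == "__main__":
    profile = {"A": (10, 6), "B": (12, 8)}  # repeat counts from the text
    population = {                           # hypothetical frequencies
        "A": {6: 0.20, 10: 0.10},
        "B": {8: 0.15, 12: 0.10},
    }
    print(f"{random_match_probability(profile, population):.4f}")  # 0.0012
```

With only two loci the estimate is about 1 in 833; multiplying in eleven more loci drives it far lower, which is why a full 13-locus match carries so much weight.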
Genetic Justice Page 3