by Mario Livio
Let’s examine one of the fascinating examples discussed by Pascal in a letter dated July 29, 1654. Imagine two noblemen engaged in a game involving the roll of a single die. Each player has put on the table thirty-two pistoles of gold. The first player chose the number 1, and the second chose the number 5. Each time the chosen number of one of the players turns up, that player gets one point. The winner is the first one to have three points. Suppose, however, that after the game has been played for some time, the number 1 has turned up twice (so that the player who had chosen that number has two points), while the number 5 has turned up only once (so the opponent has only one point). If, for whatever reason, the game has to be interrupted at that point, how should the sixty-four pistoles on the table be divided between the two players? Pascal and Fermat found the mathematically logical answer. If the player with two points were to win the next roll, the sixty-four pistoles would belong to him. If the other player were to win the next roll, each player would have had two points, and so each would have gotten thirty-two pistoles. Therefore, if the players separate without playing the next roll, the first player could correctly argue: “I am certain of thirty-two pistoles even if I lose this roll, and as for the other thirty-two pistoles perhaps I shall have them and perhaps you will have them; the chances are equal. Let us then divide these thirty-two pistoles equally and give me also the thirty-two pistoles of which I am certain.” In other words, the first player should get forty-eight pistoles and the other sixteen pistoles. Unbelievable, isn’t it, that a new, deep mathematical discipline could have emerged from this type of apparently trivial discussion? This is, however, precisely the reason why the effectiveness of mathematics is as “unreasonable” and mysterious as it is.
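Pascal's division rule is, in modern terms, an expected-value calculation, and it is easy to verify by machine. Below is a minimal Python sketch (the function name and recursive structure are mine, not Pascal's) that assumes each remaining round is equally likely to go to either player:

```python
from fractions import Fraction

def share(points_a, points_b, target=3):
    """Fraction of the stake owed to player A at the given score,
    assuming each remaining round is a fair fifty-fifty chance."""
    if points_a == target:
        return Fraction(1)  # A has already won the whole stake
    if points_b == target:
        return Fraction(0)  # B has already won
    # Average the two equally likely continuations of the game.
    return (share(points_a + 1, points_b, target) +
            share(points_a, points_b + 1, target)) / 2

stake = 64  # the two players' pistoles combined
print(share(2, 1) * stake)  # 48, leaving 16 for the opponent
```

At a score of two points to one, the recursion reproduces Pascal's 48:16 split exactly.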
The essence of probability theory can be gleaned from the following simple facts. No one can predict with certainty which face a fair coin tossed into the air will show once it lands. Even if the coin has just come up heads ten times in a row, this does not improve our ability to predict with certainty the next toss by one iota. Yet we can predict with certainty that if you toss that coin ten million times, very close to half the tosses will show heads and very close to half will show tails. In fact, at the end of the nineteenth century, the statistician Karl Pearson had the patience to toss a coin 24,000 times. He obtained heads in 12,012 of the tosses. This is, in some sense, what probability theory is really all about. Probability theory provides us with accurate information about the collection of the results of a large number of experiments; it can never predict the result of any specific experiment. If an experiment can produce n possible outcomes, each one having the same chance of occurring, then the probability for each outcome is 1/n. If you roll a fair die, the probability of obtaining the number 4 is 1/6, because the die has six faces, and each face is an equally likely outcome. Suppose you rolled the die seven times in a row and each time you got a 4; what would be the probability of getting a 4 on the next throw? Probability theory gives a crystal-clear answer: The probability would still be 1/6—the die has no memory, and any notions of a “hot hand” or of the next roll making up for the previous imbalance are only myths. What is true is that if you were to roll the die a million times, the results would average out and 4 would appear very close to one-sixth of the time.
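Both halves of this claim (the uselessness of history for predicting the next roll, and the reliability of the long run) can be seen in a quick simulation; the sketch below assumes a fair six-sided die, with an arbitrary seed and sample size:

```python
import random

random.seed(1)
rolls = [random.randint(1, 6) for _ in range(1_000_000)]
print(f"overall frequency of 4: {rolls.count(4) / len(rolls):.4f}")

# The die has no memory: among rolls that immediately follow a 4,
# the frequency of 4 is still very close to 1/6.
after_4 = [b for a, b in zip(rolls, rolls[1:]) if a == 4]
print(f"frequency of 4 right after a 4: {after_4.count(4) / len(after_4):.4f}")
```

Both printed frequencies come out close to 0.1667, just as the theory promises.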
Let’s examine a slightly more complex situation. Suppose you simultaneously toss three coins. What is the probability of getting two tails and one head? We can find the answer simply by listing all the possible outcomes. If we denote heads by “H” and tails by “T,” then there are eight possible outcomes: TTT, TTH, THT, THH, HTT, HTH, HHT, HHH. Of these, you can check that three are favorable to the event “two tails and one head.” Therefore, the probability for this event is 3/8. Or more generally, if out of n outcomes of equal chances, m are favorable to the event you are interested in, then the probability for that event to happen is m/n. Note that this means that the probability always takes a value between zero and one. If the event you are interested in is in fact impossible, then m = 0 (no outcome is favorable) and the probability would be zero. If, on the other hand, the event is absolutely certain, that means that all n events are favorable (m = n) and the probability is then simply n/n = 1. The results of the three coin tosses demonstrate yet another important result of probability theory—if you have several events that are entirely independent of each other, then the probability of all of them happening is the product of the individual probabilities. For instance, the probability of obtaining three heads is 1/8, which is the product of the three probabilities of obtaining heads in each of the three coins: 1/2 × 1/2 × 1/2 = 1/8.
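The counting argument lends itself to brute-force enumeration. Here is a short sketch (the identifiers are mine) that lists the eight equally likely outcomes and checks both results, the 3/8 probability and the product rule:

```python
from itertools import product

outcomes = list(product("HT", repeat=3))  # all 8 equally likely outcomes
favorable = [o for o in outcomes if o.count("T") == 2]
print(len(favorable), "/", len(outcomes))  # 3 / 8

# Independence: P(three heads) is the product of the individual probabilities.
print(outcomes.count(("H", "H", "H")) / len(outcomes))  # 0.125 = (1/2) ** 3
```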
OK, you may think, but other than in casino games and other gambling activities, what additional uses can we make of these very basic probability concepts? Believe it or not, these seemingly insignificant probability laws are at the heart of the modern study of genetics—the science of the inheritance of biological characteristics.
The person who brought probability into genetics was a Moravian priest. Gregor Mendel (1822–84) was born in a village near the border between Moravia and Silesia (today Hynčice in the Czech Republic). After entering the Augustinian Abbey of St. Thomas in Brno, he studied zoology, botany, physics, and chemistry at the University of Vienna. Upon returning to Brno, he began active experimentation with pea plants, with strong support from the abbot of the monastery. Mendel focused his research on pea plants because they were easy to grow, and also because they have both male and female reproductive organs. Consequently, pea plants can either self-pollinate or be cross-pollinated with another plant. By cross-pollinating plants that produce only green seeds with plants that produce only yellow seeds, Mendel obtained results that at first glance appeared to be very puzzling (figure 34). The first offspring generation had only yellow seeds. However, the following generation consistently had a 3:1 ratio of yellow to green seeds! From these surprising findings, Mendel was able to distill three conclusions that became important milestones in genetics:
1. The inheritance of a characteristic involves the transmittance of certain “factors” (what we call genes today) from parents to offspring.
2. Every offspring inherits one such “factor” from each parent (for any given trait).
3. A given characteristic may not manifest itself in an offspring, but it can still be passed on to the following generation.
But how can one explain the quantitative results in Mendel’s experiment? Mendel argued that each of the parent plants must have had two identical “factors” (what we would call alleles, varieties of a gene), either two yellow or two green (as in figure 35). When the two were mated, each offspring inherited two different alleles, one from each parent (according to rule 2 above). That is, each offspring seed contained a yellow allele and a green allele. Why then were the peas of this generation all yellow? Because, Mendel explained, yellow was the dominant color and it masked the presence of the green allele in this generation (rule 3 above). However (still according to rule 3), the dominant yellow did not prevent the recessive green from being passed on to the next generation. In the next mating round, each plant containing one yellow allele and one green allele was pollinated with another plant containing the same combination of alleles. Since the offspring contain one allele from each parent, the seeds of the next generation may contain one of the following combinations (figure 35): green-green, green-yellow, yellow-green, or yellow-yellow. All the seeds with a yellow allele become yellow peas, because yellow is dominant. Therefore, since all the allele combinations are equally likely, the ratio of yellow to green peas should be 3:1.
Figure 34
Figure 35
You may have noticed that the entire Mendel exercise is essentially identical to the experiment of tossing two coins. Assigning heads to green and tails to yellow and asking what fraction of the peas would be yellow (given that yellow is dominant in determining the color) is precisely the same as asking what is the probability of obtaining at least one tails in tossing two coins. Clearly that is 3/4, since three of the four possible outcomes (tails-tails, tails-heads, heads-tails, heads-heads) contain a tails. This means that the ratio of the number of tosses that do contain at least one tails to the number of tosses that do not should be (in the long run) 3:1, just as in Mendel’s experiments.
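The same enumeration works for the cross itself. A minimal sketch (the allele labels Y and g are my own shorthand for the dominant yellow and recessive green) that recovers Mendel's ratio:

```python
from itertools import product

# Each parent carries one dominant yellow (Y) and one recessive green (g)
# allele; an offspring draws one allele from each parent.
offspring = list(product("Yg", repeat=2))  # YY, Yg, gY, gg
yellow = sum("Y" in pair for pair in offspring)  # dominant Y masks g
print(f"yellow : green = {yellow} : {len(offspring) - yellow}")  # 3 : 1
```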
In spite of the fact that Mendel published his paper “Experiments on Plant Hybridization” in 1865 (and he also presented the results at two scientific meetings), his work went largely unnoticed until it was rediscovered at the beginning of the twentieth century. While some questions related to the accuracy of his results have been raised, he is still regarded as the first to have laid the mathematical foundations of modern genetics. Following in the path cleared by Mendel, the influential British statistician Ronald Aylmer Fisher (1890–1962) established the field of population genetics—the mathematical branch that centers on modeling the distribution of genes within a population and on calculating how gene frequencies change over time. Today’s geneticists can use statistical samplings in combination with DNA studies to forecast probable characteristics of unborn offspring. But still, how exactly are probability and statistics related?
Facts and Forecasts
Scientists who try to decipher the evolution of the universe usually try to attack the problem from both ends. There are those who start from the tiniest fluctuations in the cosmic fabric in the primordial universe, and there are those who study every detail in the current state of the universe. The former use large computer simulations in an attempt to evolve the universe forward. The latter engage in the detective-style work of trying to deduce the universe’s past from a multitude of facts about its present state. Probability theory and statistics are related in a similar fashion. In probability theory the variables and the initial state are known, and the goal is to predict the most likely end result. In statistics the outcome is known, but the past causes are uncertain.
Let’s examine a simple example of how the two fields supplement each other and meet, so to speak, in the middle. We can start from the fact that statistical studies show that the measurements of a large variety of physical quantities and even of many human characteristics are distributed according to the normal frequency curve. More precisely, the normal curve is not a single curve, but rather a family of curves, all describable by the same general function, and all being fully characterized by just two mathematical quantities. The first of these quantities—the mean—is the central value about which the distribution is symmetric. The actual value of the mean depends, of course, on the type of variable being measured (e.g., weight, height, or IQ). Even for the same variable, the mean may be different for different populations. For instance, the mean of the heights of men in Sweden is probably different from the mean of the heights of men in Peru. The second quantity that defines the normal curve is known as the standard deviation. This is a measure of how closely the data are clustered around the mean value. In figure 36, the normal curve (a) has the largest standard deviation, because the values are more widely dispersed. Here, however, comes an interesting fact. By using integral calculus to calculate areas under the curve, one can prove mathematically that irrespective of the values of the mean or the standard deviation, 68.2 percent of the data lie within the values encompassed by one standard deviation on either side of the mean (as in figure 37). In other words, if the mean IQ of a certain (large) population is 100, and the standard deviation is 15, then 68.2 percent of the people in that population have IQ values between 85 and 115. Furthermore, for all the normal frequency curves, 95.4 percent of all the cases lie within two standard deviations of the mean, and 99.7 percent of the data lie within three standard deviations on either side of the mean (figure 37). This implies that in the above example, 95.4 percent of the population have IQ values between 70 and 130, and 99.7 percent have values between 55 and 145.
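These percentages do not have to be taken on faith from a table: the area within k standard deviations of the mean equals erf(k/√2), regardless of the mean and the standard deviation. A short check in standard-library Python:

```python
from math import erf, sqrt

# Fraction of any normal distribution lying within k standard
# deviations of the mean; note that neither parameter appears.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {erf(k / sqrt(2)):.4f}")
# prints 0.6827, 0.9545, 0.9973, matching the percentages quoted above
```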
Figure 36
Suppose now that we want to predict what the probability would be for a person chosen at random from that population to have an IQ value between 85 and 100. Figure 37 tells us that the probability would be 0.341 (or 34.1 percent), since according to the laws of probability, the probability is simply the number of favorable outcomes divided by the total number of possibilities. Or we could be interested in finding out what the probability is for someone (chosen at random) to have an IQ value higher than 130 in that population. A glance at figure 37 reveals that the probability is only about 0.022, or 2.2 percent. Much in the same way, using the properties of the normal distribution and the tool of integral calculus (to calculate areas), one can calculate the probability of the IQ value being in any given range. In other words, probability theory and its complementary helpmate, statistics, combine to give us the answer.
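In code, the same two questions reduce to differences of the normal cumulative distribution, which can be built from the error function. A sketch using the illustrative mean of 100 and standard deviation of 15 from the text:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 100, 15
print(f"P(85 < IQ < 100) = {normal_cdf(100, mean, sd) - normal_cdf(85, mean, sd):.3f}")  # 0.341
print(f"P(IQ > 130)      = {1 - normal_cdf(130, mean, sd):.3f}")  # 0.023, about 2.2 percent
```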
Figure 37
Figure 38
As I have noted several times already, probability and statistics become meaningful when one deals with a large number of events—never individual events. This cardinal realization, known as the law of large numbers, is due to Jakob Bernoulli, who formulated it as a theorem in his book Ars Conjectandi (The Art of Conjecturing; figure 38 shows the frontispiece). In simple terms, the theorem states that if the probability of an event’s occurrence is p, then p is the most probable proportion of the event’s occurrences to the total number of trials. In addition, as the number of trials approaches infinity, the proportion of successes becomes p with certainty. Here is how Bernoulli introduced the law of large numbers in Ars Conjectandi: “What is still to be investigated is whether by increasing the number of observations we thereby also keep increasing the probability that the recorded proportion of favorable to unfavorable instances will approach the true ratio, so that this probability will finally exceed any desired degree of certainty.” He then proceeded to explain the concept with a specific example:
We have a jar containing 3000 small white pebbles and 2000 black ones, and we wish to determine empirically the ratio of white pebbles to the black—something we do not know—by drawing one pebble after another out of the jar, and recording how often a white pebble is drawn and how often a black. (I remind you that an important requirement of this process is that you put back each pebble, after noting the color, before drawing the next one, so that the number of pebbles in the urn remains constant.) Now we ask, is it possible by indefinitely extending the trials to make it 10, 100, 1000, etc., times more probable (and ultimately “morally certain”) that the ratio of the number of drawings of a white pebble to the number of drawings of a black pebble will take on the same value (3:2) as the actual ratio of white to black pebbles in the urn, than that the ratio of the drawings will take on a different value? If the answer is no, then I admit that we are likely to fail in the attempt to ascertain the number of instances of each case (i.e., the number of white and of black pebbles) by observation. But if it is true that we can finally attain moral certainty by this method [and Jakob Bernoulli proves this to be the case in the following chapter of Ars Conjectandi]…then we can determine the number of instances a posteriori with almost as great accuracy as if they were known to us a priori.
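Bernoulli's thought experiment is easy to act out by machine. Here is a sketch of the drawing-with-replacement procedure he describes, with 3,000 white and 2,000 black pebbles (the seed and trial counts are arbitrary):

```python
import random

random.seed(7)
jar = ["white"] * 3000 + ["black"] * 2000

for n in (100, 10_000, 1_000_000):
    draws = [random.choice(jar) for _ in range(n)]  # draw with replacement
    ratio = draws.count("white") / draws.count("black")
    print(f"{n:>9} draws: white/black ratio = {ratio:.3f}")  # tends to 3/2
```

As the number of drawings grows, the observed ratio settles ever more tightly around the true 3:2, which is exactly the content of the law of large numbers.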
Bernoulli devoted twenty years to the perfection of this theorem, which has since become one of the central pillars of statistics. He concluded with his belief in the ultimate existence of governing laws, even in those instances that appear to be a matter of chance:
If all events from now through eternity were continually observed (whereby probability would ultimately become certainty), it would be found that everything in the world occurs for definite reasons and in definite conformity with law, and that hence we are constrained, even for things that may seem quite accidental, to assume a certain necessity and, as it were, fatefulness. For all I know that is what Plato had in mind when, in the doctrine of the universal cycle, he maintained that after the passage of countless centuries everything would return to its original state.
The upshot of this tale of the science of uncertainty is very simple: Mathematics is applicable in some ways even in the less “scientific” areas of our lives—including those that appear to be governed by pure chance. So in attempting to explain the “unreasonable effectiveness” of mathematics we cannot limit our discussion only to the laws of physics. Rather, we will eventually have to somehow figure out what it is that makes mathematics so omnipresent.
The incredible powers of mathematics were not lost on the famous playwright and essayist George Bernard Shaw (1856–1950). Definitely not known for his mathematical talents, Shaw once wrote an insightful article about statistics and probability entitled “The Vice of Gambling and the Virtue of Insurance.” In this article, Shaw admits that to him insurance is “founded on facts that are inexplicable and risks that are calculable only by professional mathematicians.” Yet he offers the following perceptive observation:
Imagine then a business talk between a merchant greedy for foreign trade but desperately afraid of being shipwrecked or eaten by savages, and a skipper greedy for cargo and passengers. The captain answers the merchant that his goods will be perfectly safe, and himself equally so if he accompanies them. But the merchant, with his head full of the adventures of Jonah, St. Paul, Odysseus, and Robinson Crusoe, dares not venture. Their conversation will be like this:
Captain: Come! I will bet you umpteen pounds that if you sail with me you will be alive and well this day a year.