L.A. Math: Romance, Crime, and Mathematics in the City of Angels


by James D. Stein

In five-card stud, each player is initially dealt one card face down, which only the player can see, and one card face up, which everyone can see. A round of betting ensues. Then each player is dealt a card face up, and another round of betting ensues. This continues either until all players but one have “folded” (refused to match a bet made by another player) or until each remaining player has five cards, one face down and four face up.

Suppose that in a game of five-card stud, you are dealt the ace of clubs face down, and the ace of hearts and the four, five, and six of diamonds face up—a pair of aces. Doc, your opponent, has been dealt the five, seven, nine, and queen of spades face up. You quickly observe that Doc can only win if he has a spade face down to give him a flush, and you also quickly calculate that there are 43 unseen cards, of which 9 are spades. Based on only this information, Doc’s chances of winning are therefore 9/43, and yours are a healthy 1 − 9/43 = 34/43.

Doc has, however, made a serious mistake: he is sitting with his back to a large mirror, and as he glances at his down card you can see a flash in the mirror—not enough to know for certain what that card is, but enough to convey some information to you. Let’s look at three different cases.

Case 1: You catch a glimpse of black. Uh-oh. Doc’s card must now be one of 21 unseen black cards (you have seen your ace of clubs and Doc’s four spades of the 26 black cards in the deck). Since nine of them are spades, Doc’s chances of winning have increased to 9/21 = 3/7, and yours have decreased to 1 − 3/7 = 4/7.

Case 2: You see a flash of red. Doc’s card must now be one of 22 unseen red cards (you have seen your ace of hearts and four, five, and six of diamonds), and no red cards can be spades! Doc’s chances of winning are 0/22 = 0, so you are certain to win! Gleefully, you watch Doc finger his chips, hoping he’ll try to bluff you out of the pot.

Case 3: You see the markings indicating a face card. Doc’s card must now be one of 11 unseen face cards (you have seen Doc’s queen of spades), two of which are spade face cards (the jack or the king). Doc’s chances of winning are therefore 2/11, and yours are 1 − 2/11 = 9/11.

Each of these situations constitutes a problem in conditional probability. In the original situation, you had no information on the nature of Doc’s card. The sample space S for the experiment was therefore all forty-three unseen cards in the deck, and the event D (Doc has a winning hand) consisted of all nine unseen spades. Since this is a uniform probability space, P(D) = N(D)/N(S) = 9/43.

In each of the above three cases, information came to you that changed the sample space for the experiment. The new sample space was a subset of the original sample space S. The event that Doc wins was also altered by the outcomes available in the new sample space. In case 3, for instance, the new sample space was F, the set of all unseen face cards, and the event that Doc wins consisted of all spades that were also unseen face cards. The eight of spades, which was a winning card for Doc in the original situation, was no longer a winning card in light of the fact that it could not belong to F, the revised sample space.
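The three cases can be checked by brute force. The following is a minimal Python sketch, not part of the original text, that lists the unseen cards and shrinks the sample space for each glimpse in the mirror. The names `unseen` and `doc_wins_probability` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
FACE_RANKS = {"J", "Q", "K"}
BLACK_SUITS = {"clubs", "spades"}

# The full deck as (rank, suit) pairs.
deck = set(product(RANKS, SUITS))

# Cards you can see: your five cards and Doc's four face-up spades.
seen = {
    ("A", "clubs"), ("A", "hearts"),
    ("4", "diamonds"), ("5", "diamonds"), ("6", "diamonds"),
    ("5", "spades"), ("7", "spades"), ("9", "spades"), ("Q", "spades"),
}
unseen = deck - seen  # the original sample space S (43 cards)


def doc_wins_probability(sample_space):
    """Fraction of the sample space made up of spades (Doc's flush cards)."""
    winners = [card for card in sample_space if card[1] == "spades"]
    return Fraction(len(winners), len(sample_space))


no_info = doc_wins_probability(unseen)
black = doc_wins_probability({c for c in unseen if c[1] in BLACK_SUITS})
red = doc_wins_probability({c for c in unseen if c[1] not in BLACK_SUITS})
face = doc_wins_probability({c for c in unseen if c[0] in FACE_RANKS})

print(no_info, black, red, face)
```

Each call uses the same counting rule; only the sample space passed in changes, which is the point of the three cases.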

THE PROBABILITY OF A GIVEN B

Let’s look at this from a more general standpoint. Suppose that there are two events A and B in a sample space S. We now define P(A | B) to be the probability of the event A, given that the event B has already occurred (the symbol A | B is read “A given B”). The conditional probability P(A | B) is given by the following formula.

P(A | B) = P(A ∩ B)/P(B)

Notice that this definition gives the same results as the computations we have already done in the three cases discussed in the poker problem. For instance, in case 3, if we let D be the event that Doc wins (Doc’s unseen card is a spade) and B be the event that Doc’s unseen card is a face card, then D ∩ B is the event that Doc’s unseen card is a spade face card. We had already computed above that P(D | B) = 2/11. Since N(B) = 11 and N(D ∩ B) = 2, in the original sample space S (no information about the unseen card), we see that P(B) = 11/43 and P(D ∩ B) = 2/43. Therefore, the computational rule gives P(D | B) = P(D ∩ B)/P(B) = (2/43)/(11/43) = 2/11.
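As a hedged illustration, the computational rule can be written as a small Python function and applied to case 3. The function names `probability` and `conditional_probability` are assumptions made for this sketch. Events are represented as sets of cards inside the uniform sample space S.

```python
from fractions import Fraction

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# S: the 43 unseen cards of the poker problem.
seen = {
    ("A", "clubs"), ("A", "hearts"),
    ("4", "diamonds"), ("5", "diamonds"), ("6", "diamonds"),
    ("5", "spades"), ("7", "spades"), ("9", "spades"), ("Q", "spades"),
}
S = {(rank, suit) for rank in RANKS for suit in SUITS} - seen


def probability(event, sample_space):
    """P(E) = N(E)/N(S) in a uniform probability space."""
    return Fraction(len(event & sample_space), len(sample_space))


def conditional_probability(a, b, sample_space):
    """P(A | B) = P(A ∩ B)/P(B); undefined when P(B) is zero."""
    p_b = probability(b, sample_space)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return probability(a & b, sample_space) / p_b


D = {card for card in S if card[1] == "spades"}       # Doc wins
B = {card for card in S if card[0] in {"J", "Q", "K"}}  # face card

print(probability(B, S), probability(D & B, S), conditional_probability(D, B, S))
```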

APPENDIX 11

STATISTICS IN “THE GREAT BASKETBALL FIX”

LIES, DAMNED LIES, AND STATISTICS

You’re constantly exposed to statistics. On a typical day, you might receive statistical information from the stock market (the Dow Jones averages), the entertainment industry (the Nielsen ratings), sports (baseball batting averages), economic reports (cost of living indexes), and a multitude of other areas. Everybody uses statistics to make a point, generally the point they want to make.

As a result, statistics could probably use some good PR, because many people feel that statistics are used to promote a particular point of view and cover up the truth. That viewpoint is captured in the sentiment so brilliantly expressed by Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”

Statistics are frequently used to summarize data so that the data become more useful. In a world increasingly devoted to collecting and processing data, you can be overwhelmed by the data tsunami. It is not possible to understand a mass of data in raw form, not because it is intrinsically incomprehensible, but simply because there is so much of it. To understand the results of the 2012 U.S. presidential election, nobody wants to know how each of the more than 100 million voters, as individuals, voted, or even the vote totals for Obama and Romney. You want to know the percentage of voters who voted for each candidate.

These percentages constitute one of the fundamental tools of statistics: the probability distribution. Recall that probability had both an empirical and a predictive aspect, and the same can be said of statistics. The two basic problems of statistics are how to summarize data and how statistics can be used to make intelligent decisions.

RANDOM VARIABLES, DISTRIBUTIONS, AND GRAPHS

Throughout this section, S is the sample space of an experiment whose possible outcomes are O1, O2, …, On. During a basketball game one evening, LeBron James made eight two-point field goal attempts and missed seven, and made two three-point field goal attempts and missed three. The experiment here is an attempted field goal by LeBron James, and the possible outcomes in its sample space are

O1 = made a 2-point field goal

O2 = missed a 2-point field goal

O3 = made a 3-point field goal

O4 = missed a 3-point field goal

This information can be obtained from the box score in the paper the next day. It can be summarized in terms of a function X whose domain consists of the numbers 1, 2, 3, and 4 (the possible outcomes) and whose values are given by

X(1) = 8, X(2) = 7, X(3) = 2, X(4) = 3

The function X defined here, which is associated with the sample space S, is called a random variable. A random variable is simply a function whose domain (the allowable inputs) consists of outcomes from a sample space and whose range (the outputs of the function) consists of numbers. It is customary to use capital letters at the end of the alphabet, such as X, Y, and Z, to denote random variables.

When the range of a random variable consists of nonnegative whole numbers, the term frequency distribution is used to describe the random variable. The random variable X defined above is a frequency distribution. X(1) is the frequency of two-point field goals LeBron James made, X(2) is the frequency of two-point field goals LeBron James missed, etc.

In the basketball game above, X(1) + X(2) + X(3) + X(4) = 8 + 7 + 2 + 3 = 20, which is the number of field goals LeBron James attempted. Define Y(1) = X(1)/20 = 8/20 = 0.4; then a probabilistic interpretation of Y(1) is that, if you choose a LeBron James field goal attempt at random, that attempt would be a successful two-point field goal with probability 0.4. (The percentage interpretation would be that 40% of LeBron James’s field goal attempts were successful two-pointers.) Similarly, if Y(2) = X(2)/20 = 0.35, Y(3) = X(3)/20 = 0.1, and Y(4) = X(4)/20 = 0.15, then Y(1) + Y(2) + Y(3) + Y(4) = 0.4 + 0.35 + 0.1 + 0.15 = 1. The function Y is simultaneously a random variable and a probability function. A random variable that is also a probability function is called a probability distribution.
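The step from the frequency distribution X to the probability distribution Y is a division by the total count. Here is a minimal Python sketch of that step; the function name `to_probability_distribution` is an illustrative choice, and exact fractions are used so that the values sum to exactly 1.

```python
from fractions import Fraction

# Frequency distribution X from the box score: outcome number -> count.
frequency = {1: 8, 2: 7, 3: 2, 4: 3}


def to_probability_distribution(freq):
    """Divide each frequency by the total number of observations."""
    total = sum(freq.values())
    return {outcome: Fraction(count, total) for outcome, count in freq.items()}


distribution = to_probability_distribution(frequency)

for outcome, p in distribution.items():
    print(outcome, float(p))
```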

Let X be the random variable associated with LeBron James’s field goal attempts. By borrowing the letter P from probability, you can discuss the various probabilities associated with LeBron James’s field goal attempts without having to introduce another letter (such as Y). The notation

P(X = 1) = 0.4

is read “the probability that the random variable X assumes the value 1 is 0.4.” This notation is commonly used to describe probability distributions associated with frequency distributions.

Example 1: Rutabaga Biotech has twenty-seven employees whose salaries are less than $40,000 a year, sixteen employees whose salaries are between $40,000 and $80,000 a year, and seven employees whose salaries are more than $80,000 a year. Describe the sample space, frequency distribution, and probability distribution associated with this experiment.

Solution: O1 = an employee has a salary of less than $40,000 a year, O2 = an employee has a salary of between $40,000 and $80,000 a year, and O3 = an employee has a salary of more than $80,000 a year. The frequency distribution is the random variable X such that

X(1) = 27, X(2) = 16, X(3) = 7

Since 27 + 16 + 7 = 50, the probability distribution associated with this random variable is P(X = 1) = 27/50 = 0.54, P(X = 2) = 16/50 = 0.32, and P(X = 3) = 7/50 = 0.14. ■

In example 1, there’s a choice of how to describe the outcomes of the sample space. One possibility was to describe the possible outcomes as O1 = a salary of $1/year, O2 = a salary of $2/year, etc., up through the maximum salary that an employee at Rutabaga Biotech makes. This has the obvious inconvenience of having an experiment with more than 80,000 different outcomes. Alternatively, only the actual salaries could have been used as possible outcomes, which would limit the sample space to 50 outcomes if everyone had a different salary. The actual procedure selected, using a range of possible values as a particular outcome, is known as binning (shorthand for “to place in a bin”). In example 1, a judicious choice of bins has made it relatively clear what the salary structure at Rutabaga Biotech is, using a sample space with only three different outcomes.
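Binning is easy to sketch in code. The Python example below uses the three salary ranges of example 1. The individual salaries in the list are hypothetical, invented purely for illustration, since the text gives only the counts per bin. Placing salaries of exactly $40,000 and $80,000 in the middle bin is an assumption based on the wording of the example.

```python
from collections import Counter


def salary_bin(salary):
    """Return the outcome number (1, 2, or 3) for a yearly salary in dollars."""
    if salary < 40_000:
        return 1  # O1: less than $40,000
    if salary <= 80_000:
        return 2  # O2: between $40,000 and $80,000 (endpoints assumed included)
    return 3      # O3: more than $80,000


# Hypothetical salaries, not taken from the text.
salaries = [32_000, 38_500, 41_000, 55_000, 79_999, 80_000, 95_000, 120_000]

# The frequency distribution: how many salaries fall in each bin.
frequency = Counter(salary_bin(s) for s in salaries)

print(sorted(frequency.items()))
```

Eight distinct salaries collapse into three outcomes, which is the simplification that binning is meant to provide.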

MEASURES OF CENTRAL TENDENCY AND DISPERSION

During the height of a political campaign, one is deluged with “sound bites,” those little morsels that condense an extremely complex position into one memorable phrase or slogan. Because frequency distributions can also be complex, there is substantial interest in finding numbers that can serve as “data bites,” compressing much of the information of the distribution into a very few quantities. There are two basic types of “data bites”: measures of central tendency, which locate the middle of the distribution, and measures of dispersion, which tell how tightly packed the distribution is around its middle.

The Mean: A Measure of Central Tendency

The mean of a distribution is just our old friend, the average. If you are given numbers X1, …, Xn, the mean, denoted by μ (the Greek letter mu), is simply

μ = (X1 + X2 + … + Xn)/n

Example 2: In six consecutive home games, the Chicago Cubs pitching staff allowed 4, 8, 9, 6, 6, and 9 earned runs (the wind was blowing out in Wrigley Field). Calculate the mean of this distribution.

Solution: The mean μ = (4 + 8 + 9 + 6 + 6 + 9)/6 = 7. ■

The Standard Deviation: A Measure of Dispersion

The standard deviation, which is denoted by σ (the Greek lower-case sigma), is defined according to the equation

σ = √[((X1 − μ)² + (X2 − μ)² + … + (Xn − μ)²)/(n − 1)]

This rather imposing looking formula is calculated by the following procedure.

Step 1: Find the mean.

Step 2: Compute the sum of the squares of all the deviations (a deviation is a data point minus the mean).

Step 3: Divide the result of step 2 by one less than the number of data points.

Step 4: σ is the square root of the result of step 3.

Carrying out this procedure really isn’t a problem any longer, since computers can calculate standard deviations easily (the calculation is built into most spreadsheets, such as Excel), and it is not much bother with a hand calculator. In fact, many handheld calculators are programmed to find the standard deviation of a data set. You simply key in the data points, and the result appears.

Example 3: What is the standard deviation of the number of runs in example 2?

Solution: The mean was already computed to be 7, so the deviations are −3, −1 (twice), 1, and 2 (twice). The squares of the deviations are 9, 1 (three times), and 4 (twice). The sum of the squares of the deviations is 20. Since there are 6 data points in the sample, divide by 6 − 1 = 5, obtaining 4. Finally, take the square root of this number, obtaining σ = 2.00. And you didn’t even need a calculator for this one! ■
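For readers who would rather let a computer do the work, the four-step procedure can be sketched in Python as follows. The function name `sample_standard_deviation` is an illustrative choice. The standard library's `statistics.stdev` also divides by one less than the number of data points, so it serves as a cross-check.

```python
import math
import statistics

runs = [4, 8, 9, 6, 6, 9]  # earned runs from example 2


def sample_standard_deviation(data):
    """Follow steps 1 through 4, dividing by n - 1."""
    mean = sum(data) / len(data)                          # step 1
    sum_of_squares = sum((x - mean) ** 2 for x in data)   # step 2
    variance = sum_of_squares / (len(data) - 1)           # step 3
    return math.sqrt(variance)                            # step 4


mu = statistics.mean(runs)
sigma = sample_standard_deviation(runs)

print(mu, sigma, statistics.stdev(runs))
```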

Although the standard deviation appears unnaturally complicated, it has the great advantage that predictions can be made from it.

THE NORMAL DISTRIBUTION

When Pete sent Freddy to report on how many free throws Theresa Middlebury made out of 100, he knew that Freddy would report a whole number of free throws. She would not have made 72.1 or 86.437 free throws. The number of free throws Theresa made out of 100 is an example of a discrete random variable, a random variable that can only assume a fixed, finite number of values (in Theresa’s case, the whole numbers from 0 to 100).

In contrast, had Pete sent Freddy to measure Theresa’s height, any value would conceivably have been possible, such as 64.18 or 67.304 inches, assuming that Freddy had a sufficiently finely calibrated ruler. Admittedly, you normally measure height to within the nearest inch or half inch, but this is by choice—it is certainly within our capability to measure more accurately. A random variable that can assume any value (within a given range) is said to be continuous.

Many continuous random variables have a distribution shaped like the one in figure 11.1, which is a hypothetical distribution of the heights of basketball players. The curve is known as a probability density function. It has the property that the total area under the curve is 1. The probability that a randomly selected basketball player, from a distribution with a mean of 78 inches and a standard deviation of 4 inches, has a height between 74 and 81 inches is the total amount of area shaded in figure 11.1. This could also be interpreted as the fraction of basketball players in the distribution whose heights are between 74 and 81 inches.

The bell-shaped curve in figure 11.1 is a special type of distribution known as a normal distribution. Normal distributions are extremely important because not only are they characteristic of many continuous random variables, but they also provide excellent approximations to many discrete random variables (Pete talked about this in the story).

Figure 11.1

There can be many different distributions with mean μ and standard deviation σ. However, there is only one normal distribution with mean μ and standard deviation σ. Consequently, when you know the mean and standard deviation of a normal distribution, you know precisely which bell-shaped curve to draw, since there is only one. Once the curve is drawn, you can answer any question about the distribution.
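Because the mean and standard deviation pin down the normal curve, a question such as the shaded area between 74 and 81 inches can be answered numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). It assumes, as the figure does, that heights are normally distributed with mean 78 and standard deviation 4.

```python
from statistics import NormalDist

# Hypothetical height distribution from figure 11.1: mean 78 in., s.d. 4 in.
heights = NormalDist(mu=78, sigma=4)

# Area under the curve between 74 and 81 inches, computed as a difference
# of cumulative probabilities.
p_between = heights.cdf(81) - heights.cdf(74)

print(round(p_between, 4))
```

Under these assumptions the shaded area comes to roughly 0.61, so about 61% of the players in this distribution would stand between 74 and 81 inches.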

BINOMIAL DISTRIBUTIONS

(Binomial distribution continued from p. 102)

In the story in chapter 11, you learned that Theresa Middlebury was an 80% foul shooter (reasonably good, even by NBA standards!). What is the probability that she would make precisely one out of two free throws?

When analyzing this problem, it is necessary to assume that each free throw is independent—no matter whether she hits or misses the first, her probability of sinking the second will still be the same: 0.8. When two events are independent, recall that the probability of both events occurring is obtained by multiplying the probability that each event will occur. The probability that she will make the first shot and miss the second is 0.8 × 0.2 = 0.16. Similarly, the probability that she will miss the first shot and make the second is 0.2 × 0.8 = 0.16. So the probability that she will make precisely one out of two is 2 × 0.16 = 0.32.
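The same reasoning can be checked by listing every make/miss sequence and multiplying the per-shot probabilities. This is a minimal Python sketch that relies on the independence assumption stated above. The names `sequence_probability` and `probability_of_makes` are illustrative.

```python
from itertools import product
from math import prod

P_MAKE = 0.8  # Theresa's probability of making a single free throw


def sequence_probability(sequence, p_make=P_MAKE):
    """Multiply the per-shot probabilities, assuming independent shots."""
    return prod(p_make if made else 1 - p_make for made in sequence)


def probability_of_makes(shots, makes, p_make=P_MAKE):
    """Add the probabilities of every order that contains exactly `makes` makes."""
    return sum(
        sequence_probability(seq, p_make)
        for seq in product([True, False], repeat=shots)
        if sum(seq) == makes
    )


p_exactly_one = probability_of_makes(2, 1)
print(round(p_exactly_one, 2))
```

Listing sequences works for two shots but not for a hundred, since the number of sequences doubles with every additional shot.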

Now let’s consider the probability of Theresa making exactly eighty out of a hundred free throws. One way she could do this is to make the first eighty and miss the last twenty. Since these free throws are all independent, the probability is obtained by multiplying 0.8 × … × 0.8 × 0.2 × … × 0.2 = 0.8⁸⁰ × 0.2²⁰. Another way that she could make exactly eighty out of a hundred free throws would be to miss the first twenty and then make the remaining eighty. The probability of this happening is 0.2 × … × 0.2 × 0.8 × … × 0.8 = 0.2²⁰ × 0.8⁸⁰ = 0.8⁸⁰ × 0.2²⁰. A third way this could happen would be for her to miss the first ten, make eighty in a row, and then miss the last ten. The probability of this happening is 0.2 × … × 0.2 × 0.8 × … × 0.8 × 0.2 × … × 0.2 = 0.2¹⁰ × 0.8⁸⁰ × 0.2¹⁰ = 0.8⁸⁰ × 0.2²⁰. No matter in which specific order you arrange for Theresa to make eighty and miss twenty, the probability that she will shoot the free throws in that exact order is 0.8⁸⁰ × 0.2²⁰.

To compute the probability that Theresa will make precisely eighty out of a hundred free throws, you must therefore add 0.8⁸⁰ × 0.2²⁰ once for each of the different orders in which she could make eighty and miss twenty. How many different such orders are there? If you imagine that you have a hundred numbered balls in a jar, and that you choose eighty of them, you could simply decree that Theresa makes the free throws whose numbers are on the balls that have been chosen and misses the others. Therefore, if you choose the eighty balls numbered 11 through 90, Theresa would miss the first ten, make shots 11 through 90, and miss the last ten.

The number of ways of choosing eighty numbered balls from a hundred is 100!/(80! × 20!) [factorials work this way: n! = 1 × 2 × 3 × … × n]; the reasoning behind this calculation can be found at press.princeton.edu/titles/10559/html. Therefore, Theresa’s probability of making exactly eighty of a hundred free throws is (100!/(80! × 20!)) × 0.8⁸⁰ × 0.2²⁰, which is approximately 0.0993.

There are really only three essential numbers in the above formula, as the other numbers can be computed from knowing these three. The first number is 100, which represents the number of free throws Theresa shoots. The next number is 80, which represents the number of free throws Theresa makes. The third number is 0.8, which represents the probability of Theresa making a free throw. The number 0.2, which is the probability of Theresa missing a free throw, is 1 − 0.8, and the number 20, which is the number of free throws Theresa misses, is 100 − 80.
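Since only three numbers matter, the calculation fits naturally into a function of three arguments. The following Python sketch is one way to write it; the name `binomial_probability` is an illustrative choice, and `math.comb` (Python 3.8 or later) supplies the count of orders, 100!/(80! × 20!).

```python
from math import comb


def binomial_probability(n, k, p):
    """Probability of exactly k successes in n independent trials.

    n: number of free throws attempted
    k: number of free throws made
    p: probability of making a single free throw
    """
    orders = comb(n, k)                        # n!/(k! * (n - k)!)
    one_order = p ** k * (1 - p) ** (n - k)    # probability of one fixed order
    return orders * one_order


p_eighty = binomial_probability(100, 80, 0.8)
print(round(p_eighty, 4))
```

The same function reproduces the earlier two-shot result: `binomial_probability(2, 1, 0.8)` comes out to about 0.32.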
