Thus, it may look as though a hitter who knocks out a string of home runs and multihit games is in a hot streak, or that a hitter who strikes out repeatedly while taking a succession of oh-for-fours is in a slump, but how would it look if these stretches were simply the result of random, independent events and chance?
THE MOST COMMON approach to computing the probability of a hitting streak involves the application of two statistics: 1) the frequency with which a player gets a hit per time he comes to the plate and 2) the average number of times he comes to the plate each game. The studies that have relied on a player’s batting average to determine that first statistic, hit-frequency, are clearly in error. Batting average does not take into account plate appearances that aren’t official at bats, such as walks and sacrifices, even though each of those trips to the plate is in fact an opportunity to get a hit. In many real-life game situations a walk is as good as a hit, as Little League coaches like to say. In determining a player’s batting average a walk is as good as nothing. But in figuring the likelihood of getting a hit in a given game, a walk is as good as an out. It is a missed opportunity.
Joe DiMaggio’s career batting average was .325. He had 2,214 career hits in 6,821 official at bats. But when you divide those hits by his career plate appearances (7,671), you get .288. (That is, he got a hit 28.8% of the time he came to the plate.) I’ll call this his career streak average. More relevant to determining DiMaggio’s likelihood of putting together a long streak in 1941 is to look at his statistics from ’36 through the end of the ’40 season. In that five-year stretch his batting average was .343, his streak average .311.
The accounting for walks (and other unofficial plate appearances) is simple but crucial. Some fans wonder why DiMaggio’s peer Ted Williams, whose career .344 batting average is the seventh highest alltime—and much higher than DiMaggio’s—never put together a streak of even 25 games. Williams was actually a much weaker candidate for a hitting streak than DiMaggio was because Williams walked so frequently. With 2,654 hits in 9,791 career plate appearances, Williams’s streak average is just .271. Even in 1941 when Williams batted .406 to DiMaggio’s .357, Williams’s streak average was just .305 to DiMaggio’s .311. Scores of players are or were better hitting streak candidates than the immortal Williams, including lesser lights such as current Phillies infielder Placido Polanco with a career batting average of just .303, but a streak average of .277.
Let’s do this very briefly. Once you have established a player’s streak average and his average plate appearances per game (for DiMaggio that number was 4.54 from 1936 up to the start of the 1941 season) you can figure out the likelihood that he will get one or more hits in a game by using a basic formula of probability theory. In DiMaggio’s case that likelihood was a hair below 82%—extremely high although hardly unprecedented. To figure the likelihood that he will get a hit in two specific consecutive games, you simply multiply .82 by itself and get about .67 or 67%. Multiply this by .82 again and you get 55% as the probability of a hit in three specific consecutive games and so on. Working it out this way the chances for a 10-game streak are 13.74%, for a 20-gamer 1.89%, for a 30-gamer .26%, for a 40-gamer .0357% and for 56 straight .0015%.
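For readers who want to see that arithmetic spelled out, here is a minimal sketch in Python. The streak average and plate-appearance figures are the ones given above, and, as in the text, the per-game probability is rounded to .82 before the chaining.

```python
# A minimal sketch of the arithmetic described above.
streak_avg = 0.311      # DiMaggio's hits per plate appearance, 1936 through 1940
pa_per_game = 4.54      # his average plate appearances per game

# Chance of at least one hit in a game: 1 minus the chance that every
# plate appearance fails to produce a hit.
p_game = 1 - (1 - streak_avg) ** pa_per_game
print(f"hit in any one game: {p_game:.3f}")        # ~0.816, "a hair below 82%"

p = 0.82                # the rounded figure used in the chapter
for n in (2, 3, 10, 20, 30, 40, 56):
    print(f"hit in {n} specific straight games: {p ** n:.4%}")
```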
But that’s clearly not our answer. The above formula is for a specific 56-game stretch. What we want to know is the likelihood of a given streak at any time over the course of a season, that is, in any of the available 56-game stretches, not a particular one. A batter could hit in games 1 through 56, or games 2 through 57, or games 3 through 58 and so on. There are 99 possible ways to hit in 56 straight over a 154-game season. Many of those sequences are overlapping. Now, figuring the probability becomes more complicated. In a 2002 paper published in the Baseball Research Journal, Michael Freiman employed a recursive algebraic formula to resolve this issue and ultimately came up with .01%, or one in 10,000, as DiMaggio’s chances for hitting in 56 straight at some point in the 1941 season.1
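Freiman’s exact formula isn’t reproduced here, but the general idea, the chance of at least one run of 56 hit-games anywhere in a 154-game schedule, can be sketched with a standard recursion for runs in independent trials. This is my own illustration, not his formula, and it uses the rounded .82 per-game figure:

```python
def prob_streak(p, k, n):
    """Chance of at least one run of k straight hit-games in n games, treating
    every game as an independent trial with hit-probability p."""
    # no_run[m] = probability that the first m games contain no run of k
    no_run = [1.0] * (n + 1)
    no_run[k] = 1 - p ** k
    for m in range(k + 1, n + 1):
        # the first run of k can be completed at game m only if game m-k is
        # hitless, the next k games all have hits, and no run came earlier
        no_run[m] = no_run[m - 1] - (1 - p) * p ** k * no_run[m - k - 1]
    return 1 - no_run[n]

# With the rounded .82 figure this lands at a few hundredths of a percent,
# the same neighborhood as the simulation results quoted below.
print(f"{prob_streak(0.82, 56, 154):.4%}")
```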
Some other studies have used a method like Freiman’s, but lately what a lot of people do is simply plug the .82 probability into a computer program and have it run millions of simulated 154-game sequences. Using this approach my neighborhood math genius—the masterly Ben McGill, who by day does quantitative analysis for JPMorgan Chase—determined the likelihood of a 10-game streak as being a robust 99.36% for a batter as efficient as DiMaggio was in 1941. For 20 games it was 37.34%, for 30 games it was 5.28%, and for 40 games just .65%. To hit in 56 in a row? McGill came back with .0231%, or a little more than once every 5,000 seasons. That number will vary depending on the assumptions that you make at the outset of doing the calculations.2 As mentioned, highly skilled mathematicians have used slightly different numbers and gotten a variety of results. Again a fair analogy is to coin flipping. In effect, whether by algebraic formula or computer simulation, you are figuring out a player’s hitting-streak probability the same way that you would determine the potential for a certain run of heads in a 154-flip trial. Only in this case the coin is weighted to land on heads about 82% of the time.
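A simulation of the sort McGill ran is easy to sketch. What follows is my own bare-bones version of the idea (a weighted coin flip for every game), not his code, and the season count would need to be pushed into the millions, as his was, for a stable read on the rarest streaks:

```python
import random

def longest_streak(p, games=154):
    """One simulated season: each game is a weighted coin flip with
    hit-probability p; return the longest run of consecutive hit-games."""
    longest = current = 0
    for _ in range(games):
        if random.random() < p:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def chance_of_streak(p, target, seasons=100_000):
    """Estimate the chance of a streak of at least `target` games in a season."""
    return sum(longest_streak(p) >= target for _ in range(seasons)) / seasons

for target in (10, 20, 30, 40, 56):
    print(target, chance_of_streak(0.82, target))
```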
I won’t go any further in discussing probability theory and I won’t attempt to lay out any notation for a number of reasons, the main one being that I am likely to get it wrong. I am, simply put, out of my league. My limited understanding of probability theory—and by limited I mean limited—has been hard-won over the course of researching this book. You can learn the basics of this material without much math training but it is not easy and it requires a commitment to some dense reading. While I was groping my way through some probability formulas I sometimes thought of Jim Harrison, the marvelous writer and eater who in the name of journalistic enterprise once ate his way through a Rabelaisian feast that began with the consumption of an entire gross of oysters. Of the experience, Harrison said: “It is not recommended.” Neither is a crash course in probability theory.
I did undertake a series of simple probability tests during my learning. In the process I flipped a lot of coins and rolled a lot of dice. I produced a deck of cards and enlisted family members to help me test some popular theories of chance. The intention behind these exercises was not to break new ground, of course, but rather just to spend some time thinking about the kinds of things that people think about when they think about this stuff. As luck would have it, though, one experiment yielded a nice illustration of an important point about probability perception that connects to DiMaggio’s streak.
One evening, my six-year-old daughter and I sat down and took out the Scrabble tiles. We separated out one of each letter of the alphabet, then put those 26 tiles in a soft, black-and-white top hat of the kind that the Cat in the Hat might wear. I asked my daughter to pull out letters one at a time, three pulls in each trial. We did not replace each letter immediately upon looking at it, but after taking the three out we put them all back into the hat, shook it wildly around and began again. My daughter’s first three draws, in precise order, went like this:
E Q K
Y Q B
B A G
“Mommy, I got a word!” she shouted out to my wife. “I got a word on my third turn!” Indeed she had. The chances of drawing, in order, a specific three-letter combination in this scenario are one in 15,600 (that is, 1/26 × 1/25 × 1/24). If we assume that at that point in her development as a reader there were 100 three-letter English words that my daughter would have recognized (excluding those that repeat letters such as mom or boo), then the chances of her pulling one of those words on any turn are 1 in 156; doing it in just her third turn was surely beating the odds. It was not, however, more improbable than the fact that she pulled Q in the second position in each of the first two turns (the odds of that are 1 in 676) or that she pulled consecutive B’s at the end of the second turn and the start of the third (also 1 in 676). It’s just that those events didn’t mean anything to her. Seeing a word you know is exciting. Hauling out a couple of random Q’s, however unlikely that may be, is not.
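Anyone who wants to check those figures can do it with a few lines of code rather than a night of tile-pulling. This is a sketch of my own, with the million-trial count and the word BAG as arbitrary choices:

```python
import random
from string import ascii_uppercase   # the 26 tiles

def one_turn():
    """Three tiles drawn in order, without replacement, then put back."""
    return tuple(random.sample(ascii_uppercase, 3))

trials = 1_000_000
bag_in_order = sum(one_turn() == ("B", "A", "G") for _ in range(trials))
q_second_twice = sum(one_turn()[1] == "Q" and one_turn()[1] == "Q"
                     for _ in range(trials))

print("a specific word, in order:", bag_in_order / trials)         # about 1 in 15,600
print("Q second on two straight turns:", q_second_twice / trials)  # about 1 in 676
```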
How does this relate to DiMaggio’s hitting streak? Well, we perceive his hitting in 56 straight games as being the most unlikely thing that a batter could do over that span. But in fact whatever the final odds are of that streak, other 56-game combinations are even less probable. Remember that DiMaggio was considerably more likely to get a hit in any given game than not to. Hitting in, say, 49 out of 56 games is of course more probable than hitting in 56 straight if we allow those hitless games to fall anywhere in the series. But if we demand a specific sequence, such as: hit in 19 straight, go hitless for two, hit in 10 straight, go hitless for four, hit in six straight, go hitless for one, and finally hit in 14 more, then the probability becomes even more remote than hitting in 56 straight. That we are amazed by one sequence and not by the other is essentially a matter of perception, not a matter of probability.
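The arithmetic behind that comparison is short enough to show. Here it is as a sketch, using the rounded .82 per-game figure from earlier:

```python
p = 0.82   # rounded chance of a hit in any given game, as above

# hits in all 56 games
straight_56 = p ** 56

# the specific pattern described above: hits in 49 named games and no hits
# in the 7 named games (19 on, 2 off, 10 on, 4 off, 6 on, 1 off, 14 on)
specific_49_of_56 = p ** 49 * (1 - p) ** 7

print(f"56 straight:            {straight_56:.2e}")        # ~1.5e-05
print(f"that specific sequence: {specific_49_of_56:.2e}")  # far smaller
```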
My daughter and I did an additional 30 trials that night without getting another word that she knew, and still she regarded her efforts, quite correctly, as a success.
IN 2008 A Cornell graduate student named Sam Arbesman and his adviser, Steven Strogatz, an applied mathematics professor, wrote an op-ed piece for The New York Times titled A Journey to Baseball’s Alternate Universe. The two men were out to test just how improbable DiMaggio’s streak was and, “using a comprehensive collection of baseball statistics from 1871 to 2005,” they wrote, “we simulated the entire history of baseball 10,000 times in a computer.” The key statistic was each player’s probability of getting a hit in a game—such as the 82% for DiMaggio in 1941 that was noted earlier. The computer model was not much different in approach from the many models that preceded it (it too generated information by virtually flipping a series of weighted coins) but it was far more exhaustive. The published results were jarring. “Forty-two percent of the simulated baseball histories have a streak of DiMaggio’s length or longer,” Arbesman and Strogatz wrote. “In other words, streaks of 56 games or longer are not at all an unusual occurrence.”
What?
The piece was immediately set upon, its validity questioned by a range of mathematicians, stat geeks and other concerned baseball fans. A major recurring objection was the study’s failure (again, like most of those before it) to account for game-by-game variation. It relied on a distribution of probability that did not necessarily correspond to real events. Surely a hitter might have a better chance of getting a hit on one day, facing pitcher X under Y circumstances, than he does of getting a hit on another day, facing pitcher W under Z circumstances. How then, the naysayers complained, can you accept a study that assigns an unvarying percentage chance of a batter getting a hit?
Another objection to Arbesman and Strogatz’s findings, by an inquisitive blogger named Stuart Rojstaczer, pointed out that their model didn’t consider what really happened; it paid no mind to the roughly 130 years of data that’s readily available about major league hitting streaks. The computer simulation revealed its shortcomings by greatly overprojecting the frequency of 20- and 30-game hitting streaks as compared to the number of those streaks that have actually occurred. “An uncalibrated model,” Rojstaczer asserted, “is gibberish.” He charted a simple graph on an X/Y axis, using the real-life frequency of major league hitting streaks (30–34 game streaks occur about every four years; 35–39 gamers about every 12 years etc.) and by extending the graph into the 50s suggested that DiMaggio’s streak would come along once every 1,300 years.
Arbesman and Strogatz were inclined to agree with and appreciate the criticism (Rojstaczer’s work in particular was “brilliant in its simplicity,” the egoless Strogatz told me), so they set out to construct some new, better models. This time they ran simulations that added random variation to the probability of getting a hit in any given game. Rather than the probability being a constant, say, 81% (the number that they derived for DiMaggio), it was 71% in one game, 91% in another and so on. (Remember that Arbesman and Strogatz didn’t just do this for DiMaggio but for all batters, most of whom had a lower hit-probability than DiMaggio, some of whom had a higher one.) These new models produced a somewhat altered probability of a “DiMaggio-like streak”—one set of trials calculated an 18% chance of it having happened since 1905—but the suggested likelihood of long streaks still turned out to be far greater than actual results bear out.
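In code, the adjustment they describe is a small one, something like the sketch below, where the plus-or-minus ten point spread echoes their 71%/91% example and the shape of the variation is my own guess rather than anything taken from their work:

```python
import random

def longest_streak_varying(base_p=0.81, spread=0.10, games=154):
    """One simulated season in which each game's hit-probability wanders
    around the player's base rate (here uniformly within +/- spread; the
    distribution Arbesman and Strogatz actually used isn't specified here)."""
    longest = current = 0
    for _ in range(games):
        p_today = min(1.0, max(0.0, base_p + random.uniform(-spread, spread)))
        if random.random() < p_today:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

print(longest_streak_varying())   # longest streak in one simulated season
```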
So, how does Strogatz like his streak-calculation models now? “You know, I’m pretty sure that they’re wrong,” he told me over the phone. “Demonstrably wrong. There are things in these simulations, including an artificial steadiness even in our more nuanced models, that are just systematically faulty.”
The central problem with the Arbesman-Strogatz approach is the same one that undermines so many others. Baseball players are not coins, or dice or Scrabble tiles in the bottom of a hat. The factors that might influence the likelihood of a hit in a particular game or at bat are simply too many to properly account for. You’d have to address not only each opposing pitcher’s skill but how well rested he is, and, more daunting, his intentions and psychological makeup. You’d have to weigh the positioning of fielders, the softness of the ground, and the temperature and humidity of the air. What about rain or darkness that might shorten a game? And was the batter up at night with a stomach bug? Is his hamstring sore? His finger? His wrist? Is he worried that his girlfriend is ticked off because he never came home on Saturday night? Did a “fan” yell something horribly offensive at him as he stood in the on-deck circle? Is he thinking about the streak or not? Which way is the wind blowing? What is the batter’s general temperament, and even if we could know that then what temperament is best suited to a hitting streak? And so on and so on and so on. Even supposing we could answer these questions, how would we know what impact each circumstance might or might not have on the game? How could that influence possibly be measured?
To say that the probability studies of hitting streaks are naive to psychological factors is a massive understatement. As Robert Remez, a wonderful thinker and professor of psychology at Barnard, said to me, “These experiments are kind of like the stoned guy who is looking for his car keys under the lamppost rather than where he dropped them because that’s where the light is.”
Virtually all attempts to unravel The DiMaggio Enigma rely on some version of what’s known as a Bernoulli trial; that is, the coin-flipping model. People deploy Bernoulli trials in studying hitting streaks because it’s a method that “works” to provide an answer that has some logic to it. That’s where the lamppost light is. The problem is this: Instead of flipping the same weighted coin, say, 10,000 times, what you really need to do is flip 10,000 different coins, each of them minutely nuanced, one time each.
Might some other method be better suited to calculating hitting streak probability? Given that the pitcher-batter confrontation is to a large extent a strategic battle—and especially so when a pitcher bears down hoping to stop a streak—it may be that we could learn something about hitting streaks by applying game theory, a method that is often used to predict economic or military behavior. Game theory addresses situations that are more (or entirely) dependent upon strategy than upon chance. The approach would lead to questions such as, “If a batter comes up knowing, as the pitcher knows, that he needs to get a hit to extend a streak, is it to his advantage to swing at the first pitch or not?” There are many other relevant questions that might be asked and valuable answers potentially derived. Yet it is difficult to see how any probabilistic conclusions could be drawn from any of them. In the end, game theory is not likely to yield anything more than some peripheral insight into optimal hitting streak behavior.
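To give a flavor of what that kind of analysis looks like, here is a toy version of the first-pitch question: a two-by-two guessing game between pitcher and batter in which the payoffs, the batter’s chance of a hit in each matchup, are invented numbers, there only to show the machinery:

```python
# Batter's chance of a hit. Rows: what the batter sits on (fastball, offspeed).
# Columns: what the pitcher throws (fastball, offspeed). Invented payoffs.
H = [[0.35, 0.15],
     [0.10, 0.30]]

# In a zero-sum guessing game with no dominant choice, the pitcher mixes his
# pitches so the batter's expected hit chance is the same no matter which
# pitch the batter sits on. Solve for that mix and the resulting hit chance.
q = (H[1][1] - H[0][1]) / (H[0][0] - H[0][1] - H[1][0] + H[1][1])
value = H[0][0] * q + H[0][1] * (1 - q)
print(f"pitcher throws the fastball {q:.0%} of the time; "
      f"the batter's hit chance settles at {value:.3f}")
```

Even in this toy form, the output is a pitch-selection strategy and a single confrontation’s hit chance, not a season-long streak probability, which is the limitation described above.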
It’s also conceivable that hitting streaks could be a conundrum suited to Bayesian analysis, or conditional probability, in which new events and information impact and recalibrate probabilistic estimates.3 That is, at bats would not necessarily be treated as independent events, but as events influenced by what has happened before them. While some of the math pros I spoke with thought that Bayesian analysis might theoretically have some use in looking at streaks, the obstacle, as ever, lies in knowing what conditions to factor in and how to weigh them. It’s easy to see that in practice this method won’t work either.
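For the curious, the simplest version of that kind of updating looks like this. The prior counts below are stand-ins chosen only to center the estimate near a .311 streak average, not numbers from any study discussed here:

```python
# Bayesian (conditional) updating of a hit rate in its simplest form: start
# from a prior belief about the true hit chance per plate appearance, then
# revise it as new plate appearances come in. With a Beta prior the update
# is just a matter of adding counts. The 311 and 689 are illustrative stand-ins.
prior_hits, prior_misses = 311, 689

def updated_rate(new_hits, new_misses):
    """Posterior mean estimate of the per-plate-appearance hit rate."""
    return (prior_hits + new_hits) / (prior_hits + prior_misses + new_hits + new_misses)

print(updated_rate(0, 0))    # 0.311, the prior alone
print(updated_rate(8, 12))   # nudged up slightly by a hot 20-trip stretch
```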
So, back to the Bernoulli trials. Naturally, the people who have studied the probability of hitting streaks using these trials have considered the uncertainty caused by the almost infinite factors that attend each at bat; in their papers the experimenters typically introduce that point in a hedging paragraph or two. But many are willing to accept this uncertainty and, in effect, dismiss it. To them, those many factors are not truly relevant to the probabilistic measurement. It is like suggesting that rolling a die on a wooden table versus a plastic one, or on a sunny day versus a cloudy one might impact the chances that the die will turn up “3”. Really, the chances are 1 in 6 regardless of the surface or the weather.
“You can try to factor in all those things a batter is going through until you are blue in the face,” said Jim Lackritz, an emeritus professor of information and decision systems at San Diego State, whose 1996 paper Two of Baseball’s Great Marks: Can They Ever Be Broken? demonstrated that the odds of someone hitting .400 again are much better than the odds of someone hitting in 56 straight games. “But ultimately it won’t have much impact on the probability of a hitting streak.”
That’s where I disagree. It’s certainly true that the best statistical models can work to establish a fair distribution of probability. And it’s also true that many of the myriad in-game factors may cancel one another out over a period of time, a full season say, meaning that hazarding a rough probability estimate can make sense. That’s why sophisticated stat processors like Bill James can sometimes do a reasonably good job of predicting player performance and why decades-old, dice-and-card board games such as Strat-O-Matic and APBA can sometimes come fairly close to replicating a player’s statistics over a given year.