When the tote board shows a horse has odds of 100, it suggests bettors think its chance of winning is around 1 percent. Yet it seems people are often too generous about a weaker horse’s chances. Statisticians have compared the money people throw at long shots with the amount those horses actually win and have found that the probability of victory is often much lower than the odds imply. Conversely, people tend to underestimate the prospects of the horse that is the favorite to win.
The favorite-long-shot bias means top horses are often more likely to win than their odds suggest. However, betting on them isn’t necessarily a good strategy. Because the track takes a cut in a pari-mutuel system, there is a hefty handicap to overcome. Whereas card counters only have to improve on the Four Horsemen’s method, which almost breaks even, sports bettors need a strategy that will be profitable even when the track charges 19 percent.
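To see why a biased favorite can still be a losing bet, it helps to run the numbers. The sketch below is only an illustration: the pool sizes and the "true" win probabilities are invented, and the function names are hypothetical. The only figure taken from the text is the 19 percent take.

```python
def parimutuel_payouts(pool, take):
    """Return the payout per 1 unit staked on each horse, after the track's cut."""
    total = sum(pool.values())
    net = total * (1.0 - take)  # money left to share among winning tickets
    return {horse: net / amount for horse, amount in pool.items()}


def expected_return(payout, win_probability):
    """Expected profit per 1 unit staked (negative means a loss on average)."""
    return payout * win_probability - 1.0


# Hypothetical win pool (money bet on each horse) and the 19 percent take.
pool = {"favorite": 5000.0, "middle": 3000.0, "long_shot": 2000.0}
payouts = parimutuel_payouts(pool, take=0.19)

# Assumed "true" chances, tilted toward the favorite to mimic the bias:
# the favorite attracts 50% of the money but wins 55% of the time.
true_probability = {"favorite": 0.55, "middle": 0.30, "long_shot": 0.15}

for horse in pool:
    edge = expected_return(payouts[horse], true_probability[horse])
    print(f"{horse}: payout {payouts[horse]:.2f}, expected return {edge:+.3f}")
```

In this made-up race the favorite is the least bad bet, yet every horse still has a negative expected return once the track has taken its share.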
The favorite-long-shot bias might be noticeable, but it’s rarely that severe. Nor is it consistent: the bias is larger at some racetracks than at others. Still, it shows that the odds don’t always match a horse’s chances of winning. Like blackjack, the Happy Valley betting market is vulnerable to smart gamblers. And in the 1980s, it became clear that such vulnerability could be extremely profitable.
HONG KONG WASN’T WOODS’S first attempt at a betting system for horse racing. He’d spent 1982 in New Zealand with a group of professional gamblers, hoping that their collective wisdom would be enough to spot horses with incorrect odds. Unfortunately, it was a year of mixed success.
Benter had a background in physics and an interest in computers, so for the races at Happy Valley, the pair planned to employ a more scientific approach. But winning at the racetrack and winning at blackjack involved very different sets of problems. Could mathematics really help predict horse races?
A visit to the University of Nevada’s library brought the answer. In a recent issue of a business journal, Benter spotted an article by Ruth Bolton and Randall Chapman, two researchers based at the University of Alberta in Canada. It was called “Searching for Positive Returns at the Track.” In the opening paragraph, they hinted at what followed over the next twenty pages. “If the public makes systematic and detectable errors in establishing the betting odds,” they wrote, “it may be possible to exploit such a situation with a superior wagering strategy.” Previously published strategies had generally concentrated on well-known discrepancies in racing odds, like the favorite-long-shot bias. Bolton and Chapman had taken a different approach. They’d developed a way to take available information about each horse—such as percentage of races won or average speed—and convert it into an estimate of the probability that horse would win. “It was the paper that launched a multi-billion dollar industry,” Benter said. So, how did it work?
TWO YEARS AFTER HIS work on the roulette wheels of Monte Carlo, Karl Pearson met a gentleman by the name of Francis Galton. A cousin of Charles Darwin, Galton shared the family passion for science, adventure, and sideburns. However, Pearson soon noticed a few differences.
When Darwin developed his theory of evolution, he’d taken time to organize the new field, introducing so much structure and direction that his fingerprints can still be seen today. Whereas Darwin was an architect, Galton was an explorer. Much like Poincaré, Galton was happy to announce a new idea and then wander off in search of another. “He never waited to see who was following him,” Pearson said. “He pointed out the new land to biologist, to anthropologist, to psychologist, to meteorologist, to economist, and left them to follow or not at their leisure.”
Galton also had an interest in statistics. He saw it as a way to understand the biological process of inheritance, a subject that had fascinated him for years. He’d even roped others into studying the topic. In 1875, seven of Galton’s friends received sweet pea seeds, with instructions to plant them and return the seeds from their progeny. Some people received heavy seeds; some light ones. Galton wanted to see how the weights of the parent seeds were related to those of the offspring.
Comparing the different sizes of the seeds, Galton found that the offspring were larger than the parents if the parents were small, and smaller than them if the parents were large. Galton called it “regression towards mediocrity.” He later noticed the same pattern when he looked at the relationship between heights of human parents and children.
Of course, a child’s appearance is the result of several factors. Some of these might be known; others might be hidden. Galton realized it would be impossible to unravel the precise role of each one. But using his new regression analysis, he would be able to see whether some factors contributed more than others. For example, Galton noticed that although parental characteristics were clearly important, sometimes features seemed to skip generations, with characteristics coming from grandparents, or even great-grandparents. Galton believed that each ancestor must contribute some amount to the heritage of a child, so he was delighted when he heard that a horse breeder in Pittsfield, Massachusetts, had published a diagram illustrating the exact process he’d been trying to describe. The breeder, a man by the name of A. J. Meston, used a square to represent the child, and then divided it into smaller squares to show the contribution each ancestor made: the bigger the square, the bigger the contribution. Parents took up half the space; grandparents a quarter; great-grandparents an eighth, and so on. Galton was so impressed with the idea that he wrote a letter to the journal Nature in January 1898 suggesting that they reprint it.
FIGURE 3.2. A. J. Meston’s illustration of inheritance.
Galton spent a good deal of time thinking about how outcomes, such as child size, were influenced by different factors, and he was meticulous about collecting data to support this research. Unfortunately, his limited mathematical background meant he couldn’t take full advantage of the information. When he met Pearson, Galton didn’t know how to calculate precisely how much a change in a particular factor would affect the outcome.
Galton had yet again pointed to a new land, and it was Pearson who filled it with mathematical rigor. The pair soon started to apply the ideas to questions about inheritance. Both viewed regression to the mediocre as a potential problem: they wondered how society could make sure that “superior” racial characteristics were not lost in subsequent generations. In Pearson’s view, a nation could be improved by “insuring that its numbers are substantially recruited from the better stocks.”
From a modern viewpoint, Pearson is a bit of a contradiction. Unlike many of his peers, he thought men and women should be treated as social and intellectual equals. Yet at the same time, he used his statistical methods to argue that certain races were superior to others; he also claimed that laws restricting child labor turned children into social and economic burdens. Today, that’s all rather unsavory. Nevertheless, Pearson’s work has been hugely influential. Not long after Galton’s death in 1911, Pearson established the world’s first statistics department at University College London. Building on the diagram Galton had sent to Nature, Pearson developed a method for “multiple regression”: out of several potentially influential factors, he worked out a way to establish how related each was to a given outcome.
Regression would also provide the backbone for the University of Alberta researchers’ racing predictions. Whereas Galton and Pearson used the technique to examine the characteristics of a child, Bolton and Chapman employed it to understand how different factors affected a horse’s chances of winning. Was weight more important than percentage of recent races won? How did average speed compare with the reputation of the jockey?
Bolton’s first exposure to the world of gambling had come at a young age. “When I was a toddler my Dad took me to the track,” she said, “and apparently my little hand picked the winning horse.” Despite her early success, it was the last time that she went to the races. Two decades later, however, she found herself picking winners once again, this time with a far more robust method.
The idea for a horse racing prediction method had taken shape in the late 1970s, while Bolton was a student at Queen’s University in Canada. Bolton had wanted to learn more about an area of economics known as choice modeling, which aims to capture the benefits and costs of a certain decision. For her final-year dissertation, Bolton teamed up with Chapman, who was researching problems in that area. Chapman, who had a long-standing interest in games, had already accumulated a collection of horse racing data, and together the pair examined how the information could be used to forecast race results. The project was not just the start of an academic partnership; the researchers married in 1981.
Two years after the wedding, Bolton and Chapman submitted the horse racing research to the journal Management Science. At the time, prediction methods were growing in popularity, which meant the work received a lot of scrutiny. “The paper spent a long time in review,” Bolton said. The research eventually went through four rounds of revisions before appearing in print in the summer of 1986.
In their paper, Bolton and Chapman assumed that a particular horse’s chances of winning depended on its quality, which they calculated by bringing together several different measurements. One of these was the starting position. A lower number meant the horse was starting nearer the inside of the track, which should improve a horse’s chances, because it means a shorter distance to run. The pair therefore expected regression analysis to show that an increase in starting number would lead to a decrease in quality.
Another factor was the weight of a horse. It was less clear how this would affect quality. Weight restrictions at some races penalize heavier horses, but faster horses often have a higher weight. Old-school racing pundits might try to come up with opinions about which is more important, but Bolton and Chapman didn’t need to take such views: they could simply let the regression analysis do the hard work and show them how weight was related to quality.
In Bolton and Chapman’s model of a horse race, the quality measurement depended on nine possible factors, including weight, average speed in recent races, and starting position. To illustrate how the different factors contribute to a horse’s quality, it’s tempting to use a setup similar to the diagram Galton sent Nature. However, real life is not as simple as such illustrations suggest. Although Galton’s diagram shows how relatives might shape the characteristics of a child, the picture is incomplete because not everything is inherited. Environmental factors can also influence things, and these might not always be visible or known. Moreover, the neat boxes—for mother, father, and so on—are likely to overlap: if a child’s father has a certain characteristic, the grandfather or grandmother might have it, too. So, you can’t say that each contributing factor is completely independent of the others. The same is true for horse racing. As well as the nine performance-related factors, Bolton and Chapman therefore included an uncertainty factor in their prediction of horse quality. This accounted for unknown influences on horse performance as well as the inevitable quirks of a particular race.
Once the pair had measured the horses’ quality, they converted the measurements into predictions about each animal’s chance of victory. They did this by calculating the total amount of quality across all the horses in the race. The probability a particular horse would win depended on how much the horse contributed to this overall total.
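The conversion from quality to probability can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a quality score that is a weighted sum of factors and a multinomial-logit-style conversion in which each score is exponentiated before the shares are taken. The weights, factor names, and race data are all invented, and the uncertainty factor is left out for simplicity.

```python
import math

# Hypothetical weights for three of the factors; the values are invented.
WEIGHTS = {"avg_speed": 0.8, "win_rate": 1.5, "start_position": -0.05}


def quality(horse):
    """Weighted sum of a horse's factors (a linear quality score)."""
    return sum(WEIGHTS[name] * horse[name] for name in WEIGHTS)


def win_probabilities(race):
    """Each horse's share of the total exponentiated quality in the race."""
    scores = {name: quality(factors) for name, factors in race.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    strengths = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(strengths.values())
    return {name: strength / total for name, strength in strengths.items()}


# An invented three-horse race.
race = {
    "A": {"avg_speed": 16.2, "win_rate": 0.30, "start_position": 2},
    "B": {"avg_speed": 15.9, "win_rate": 0.20, "start_position": 5},
    "C": {"avg_speed": 15.5, "win_rate": 0.10, "start_position": 8},
}
probabilities = win_probabilities(race)
for name, p in probabilities.items():
    print(f"horse {name}: {p:.3f}")
```

Because each probability is a share of the race total, the values always add up to one, and a horse's chances depend on the quality of its rivals as well as its own. The negative weight on starting position reflects the expectation that a higher starting number lowers quality.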
To work out which factors would be useful for making predictions, Bolton and Chapman compared their model to data from two hundred races. Handling the information was a feat in itself, with race results stored on dozens of computer punch cards. “When I got the data, it was in a big box,” Bolton said. “For years, I carried that box around.” Getting the results into the computer was also a challenge: it took about an hour to enter the data for each race.
Of the nine factors Bolton and Chapman tested, the pair found that average speed was the most important in deciding where a horse would finish. In contrast, weight didn’t seem to make any difference to predictions. Either it was irrelevant or any effect it did have was covered by another factor, in a similar way to how a grandfather’s influence on a child’s appearance might be covered by the contribution from the father.
It can be surprising which factors turn out to be most important. In an early version of Bill Benter’s model, the number of races a horse had previously run made a big contribution to the predictions. However, there was no intuitive reason why it was so crucial. Some gamblers might try to think up an explanation, but Benter avoided speculating about specific causes. This is because he knew that different factors were likely to overlap. Rather than try to interpret why something like number of races appears to be important, he instead concentrated on putting together a model that could reproduce the observed race results. Just like the gamblers who searched for biased roulette tables, he could obtain a good prediction without pinning down the precise underlying causes.
In other industries, of course, it might be necessary to isolate how much a certain factor affects an outcome. While Galton and Pearson had been studying inheritance, the Guinness brewery had been trying to improve the life span of its stout. The task fell to William Gosset, a promising young statistician who had spent the winter of 1906 working in Pearson’s lab.
Whereas betting syndicates have no control over factors like the weight of a horse, Guinness could alter the ingredients it put in its beer. In 1908, Gosset used regression to see how much hops influenced the drinkable life span of beer. Without hops, the company could expect beer to last between twelve and seventeen days; adding the right amount of hops could increase the life span by several weeks.
Betting teams aren’t particularly interested in knowing why certain factors are important, but they do want to know how good their predictions are. It might seem easiest to test the predictions against the racing data the team had just analyzed. Yet this would be an unwise approach.
Before he worked on chaos theory, Edward Lorenz spent the Second World War as a forecaster for the US Air Corps in the Pacific. In the autumn of 1944, his team made a series of perfect predictions about weather conditions on the flight path between Siberia and Guam. At least they were perfect according to the reports from aircraft flying that route. Lorenz soon realized what was causing the incredible success rate. The pilots, busy with other tasks, were just repeating the forecast as the observation.
The same problem appears when syndicates test betting predictions against the data used to calibrate the model. In fact, it would be easy to build a seemingly perfect model. For each racing result, they could include a factor that indicates which horse came in first. Then they could tweak these factors until they fitted perfectly with the horses that actually won each race. It would look like they’ve got a flawless model, when all they’ve really done is dress up the actual results as a prediction.
If teams want to know how well a strategy will work in the future, they need to see how good it is at predicting new events. When collecting information on past races, syndicates therefore put a chunk of the results to one side. They use the rest of the data to evaluate the factors in their model; once this is done, they test the predictions against the collection of yet-to-be-used results. This allows teams to check how the model might perform in real life.
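Setting results aside is simple to do in practice. The sketch below is one possible way, under assumed details: the 25 percent holdout share, the random shuffle, and the function name are illustrative choices, not something the syndicates are known to have used.

```python
import random


def split_races(races, holdout_fraction=0.25, seed=0):
    """Shuffle the races and set a fraction aside for testing."""
    shuffled = list(races)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * (1.0 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


race_ids = list(range(200))  # stand-ins for 200 race records
fit_set, test_set = split_races(race_ids)
print(len(fit_set), len(test_set))
```

The model's factors would be estimated on `fit_set` only, and its predictions scored on `test_set`. For races ordered in time, holding out the most recent results instead of a random sample may be a closer match to real betting.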
Testing strategies against new data also helps ensure that models satisfy the scientific principle of Occam’s razor, which states that if you have to choose between several explanations for an observed event, it is best to pick the simplest. In other words, if you want to build a model of a real-life process, you should shave away the features that you can’t justify.
Comparing predictions against new data helps betting teams avoid throwing too many factors into a model, but they still need to assess how good the model actually is. One way to measure the accuracy of a prediction is to use what statisticians call the “coefficient of determination.” The coefficient ranges from 0 to 1 and can be thought of as measuring the explanatory power of a model. A value of 0 means that the model doesn’t help at all, and bettors might as well pick the winning horse at random; a value of 1 means the predictions line up perfectly with the actual results. Bolton and Chapman’s model had a value of 0.09. It was better than randomly choosing horses, but there were still plenty of things that the model wasn’t capturing.
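For models that output win probabilities, one common way to compute such a score is a likelihood-based version that compares the model with random guessing. The sketch below assumes that variant; whether it matches the exact formula behind the 0.09 figure is an assumption, and the race data are invented.

```python
import math


def pseudo_r_squared(races):
    """1 - (model log-likelihood / log-likelihood of random guessing).

    Each race is (probabilities, winner): the model's win probability for
    every horse, and the name of the horse that actually won.
    """
    model_ll = sum(math.log(probs[winner]) for probs, winner in races)
    random_ll = sum(math.log(1.0 / len(probs)) for probs, _ in races)
    return 1.0 - model_ll / random_ll


# Invented predictions for two three-horse races.
races = [
    ({"A": 0.5, "B": 0.3, "C": 0.2}, "A"),
    ({"D": 0.4, "E": 0.4, "F": 0.2}, "E"),
]
print(round(pseudo_r_squared(races), 3))
```

A model that gives every horse the same chance scores 0, and a model that puts probability 1 on every winner scores 1. With this particular variant, a model that does worse than random guessing would score below 0.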
Part of the problem was the data they had used. The two hundred races they’d analyzed came from five American racetracks. This meant there was a lot of hidden information: horses would have raced against a range of opponents, in different conditions, with a variety of jockeys. It might have been possible to overcome some of these problems with a lot of racing data, but with only two hundred races? It was doubtful. Still, the strategy could potentially work, if only the race conditions were a bit less variable.
IF YOU HAD TO put together an experiment to study horse racing, it would probably look a lot like Hong Kong. With races happening on one of two tracks, your laboratory conditions are going to be fairly consistent. The subjects of your experiment won’t vary too much either: in the United States, tens of thousands of horses race all over the country; in Hong Kong, there is a closed pool of about a thousand horses. With around six hundred races a year, these horses race against each other again and again, which means you can observe similar events several times, just as Pearson always tried to. And, unlike Monte Carlo and its lazy roulette reporters, in Hong Kong there’s also plenty of publicly available data on the horses and their performances.
When Benter first analyzed the Hong Kong data, he found that at least five hundred to a thousand races were needed to make good predictions. With fewer than this, there wasn’t enough information to work out how much each factor contributed to performance, which meant the model wasn’t particularly reliable. In contrast, including more than a thousand races didn’t lead to much improvement in the predictions.