Everybody Lies


by Seth Stephens-Davidowitz


  BODIES AS DATA

  In the summer of 2013, a reddish-brown horse, of above-average size, with a black mane, sat in a small barn in upstate New York. He was one of 152 one-year-old horses at August’s Fasig-Tipton Select Yearling Sale in Saratoga Springs, and one of ten thousand one-year-old horses being auctioned off that year.

  Wealthy men and women, when they shell out a lot of money on a racehorse, want the honor of choosing the horse’s name. Thus the reddish-brown horse did not yet have a name and, like most horses at the auction, was instead referred to by his barn number, 85.

  There was little that made No. 85 stand out at this auction. His pedigree was good but not great. His sire (father), Pioneerof [sic] the Nile, was a top racehorse, but other kids of Pioneerof the Nile had not had much racing success. There were also doubts based on how No. 85 looked. He had a scratch on his ankle, for example, which some buyers worried might be evidence of an injury.

  The current owner of No. 85 was an Egyptian beer magnate, Ahmed Zayat, who had come to upstate New York looking to sell the horse and buy a few others.

  Like almost all owners, Zayat hired a team of experts to help him choose which horses to buy. But his experts were a bit different than those used by nearly every other owner. The typical horse experts you’d see at an event like this were middle-aged men, many from Kentucky or rural Florida with little education but with a family background in the horse business. Zayat’s experts, however, came from a small firm called EQB. The head of EQB was not an old-school horse man. The head of EQB, instead, was Jeff Seder, an eccentric, Philadelphia-born man with a pile of degrees from Harvard.

  Zayat had worked with EQB before, so the process was familiar. After a few days of evaluating horses, Seder’s team would come back to Zayat with five or so horses they recommended buying to replace No. 85.

  This time, though, was different. Seder’s team came back to Zayat and told him they were unable to fulfill his request. They simply could not recommend that he buy any of the 151 other horses offered up for sale that day. Instead, they offered an unexpected and near-desperate plea. Zayat absolutely, positively could not sell horse No. 85. This horse, EQB declared, was not just the best horse in the auction; he was the best horse of the year and, quite possibly, the decade. “Sell your house,” the team implored him. “Do not sell this horse.”

  The next day, with little fanfare, horse No. 85 was bought for $300,000 by a man calling himself Incardo Bloodstock. Bloodstock, it was later revealed, was a pseudonym used by Ahmed Zayat. In response to the pleas of Seder, Zayat had bought back his own horse, an almost unprecedented action. (The rules of the auction prevented Zayat from simply removing the horse from the auction, thus necessitating the pseudonymous transaction.) Sixty-two horses at the auction sold for a higher price than horse No. 85, with two fetching more than $1 million each.

  Three months later, Zayat finally chose a name for No. 85: American Pharoah. And eighteen months later, on a 75-degree Saturday evening in the suburbs of New York City, American Pharoah became the first horse in more than three decades to win the Triple Crown.

  What did Jeff Seder know about horse No. 85 that apparently nobody else knew? How did this Harvard man get so good at evaluating horses?

  I first met up with Seder, who was then sixty-four, on a scorching June afternoon in Ocala, Florida, more than a year after American Pharoah’s Triple Crown. The event was a weeklong showcase for two-year-old horses, culminating in an auction, not dissimilar to the 2013 event where Zayat bought his own horse back.

  Seder has a booming, Mel Brooks–like voice, a full head of hair, and a discernible bounce in his step. He was wearing suspenders, khakis, a black shirt with his company’s logo on it, and a hearing aid.

  Over the next three days, he told me his life story—and how he became so good at predicting horses. It was hardly a direct route. After graduating magna cum laude and Phi Beta Kappa from Harvard, Seder went on to get, also from Harvard, a law degree and a business degree. At age twenty-six, he was working as an analyst for Citigroup in New York City but felt unhappy and burnt-out. One day, sitting in the atrium at the firm’s new offices on Lexington Avenue, he found himself studying a large mural of an open field. The painting reminded him of his love of the countryside and his love of horses. He went home and looked at himself in the mirror with his three-piece suit on. He knew then that he was not meant to be a banker and he was not meant to live in New York City. The next morning, he quit his job.

  Seder moved to rural Pennsylvania and ambled through a variety of jobs in textiles and sports medicine before devoting his life full-time to his passion: predicting the success of racehorses. The numbers in horse racing are rough. Of the one thousand two-year-old horses showcased at Ocala’s auction, one of the nation’s most prestigious, perhaps five will end up winning a race with a significant purse. What will happen to the other 995 horses? Roughly one-third will prove too slow. Another one-third will get injured—most because their limbs can’t withstand the enormous pressure of galloping at full speed. (Every year, hundreds of horses die on American racetracks, mostly due to broken legs.) And the remaining one-third will have what you might call Bartleby syndrome. Bartleby, the scrivener in Herman Melville’s extraordinary short story, stops working and answers every request his employer makes with “I would prefer not to.” Many horses, early in their racing careers, apparently come to realize that they don’t need to run if they don’t feel like it. They may start a race running fast, but, at some point, they’ll simply slow down or stop running altogether. Why run around an oval as fast as you can, especially when your hooves and hocks ache? “I would prefer not to,” they decide. (I have a soft spot for Bartlebys, horse or human.)

  With the odds stacked against them, how can owners pick a profitable horse? Historically, people have believed that the best way to predict whether a horse will succeed has been to analyze his or her pedigree. Being a horse expert means being able to rattle off everything anybody could possibly want to know about a horse’s father, mother, grandfathers, grandmothers, brothers, and sisters. Agents announce, for instance, that a big horse “came to her size legitimately” if her mother’s line has lots of big horses.

  There is one problem, however. While pedigree does matter, it can still only explain a small part of a racing horse’s success. Consider the track record of full siblings of all the horses named Horse of the Year, racing’s most prestigious annual award. These horses have the best possible pedigrees—the identical family history as world-historical horses. Still, more than three-fourths do not win a major race. The traditional way of predicting horse success, the data tells us, leaves plenty of room for improvement.

  It’s actually not that surprising that pedigree is not that predictive. Think of humans. Imagine an NBA owner who bought his future team, as ten-year-olds, based on their pedigrees. He would have hired an agent to examine Earvin Johnson III, son of “Magic” Johnson. “He’s got nice size, thus far,” an agent might say. “It’s legitimate size, from the Johnson line. He should have great vision, selflessness, size, and speed. He seems to be outgoing, great personality. Confident walk. Personable. This is a great bet.” Unfortunately, fourteen years later, this owner would have a 6’2” (short for a pro ball player) fashion blogger for E! Earvin Johnson III might be of great assistance in designing the uniforms, but he would probably offer little help on the court.

  Along with the fashion blogger, an NBA owner who chose a team as many owners choose horses would likely snap up Jeffrey and Marcus Jordan, both sons of Michael Jordan, and both of whom proved mediocre college players. Good luck against the Cleveland Cavaliers. They are led by LeBron James, whose mom is 5’5”. Or imagine a country that elected its leaders based on their pedigrees. We’d be led by people like George W. Bush. (Sorry, couldn’t resist.)

  Horse agents do use other information besides pedigree. For example, they analyze the gaits of two-year-olds and examine horses visually. In Ocala, I spent hours chatting with various agents, which was long enough to determine that there was little agreement on what in fact they were looking for.

  Add to these rampant contradictions and uncertainties the fact that some horse buyers have what seems like infinite funds, and you get a market with rather large inefficiencies. Ten years ago, Horse No. 153 was a two-year-old who ran faster than every other horse, looked beautiful to most agents, and had a wonderful pedigree—a descendant of Northern Dancer and Secretariat, two of the greatest racehorses of all time. An Irish billionaire and a Dubai sheik both wanted to purchase him. They got into a bidding war that quickly turned into a contest of pride. As hundreds of stunned horse men and women looked on, the bids kept getting higher and higher, until the two-year-old horse finally sold for $16 million, by far the highest price ever paid for a horse. Horse No. 153, who was given the name The Green Monkey, ran three races, earned just $10,000, and was retired.

  Seder never had any interest in the traditional methods of evaluating horses. He was interested only in data. He planned to measure various attributes of racehorses and see which of them correlated with their performance. It’s important to note that Seder worked out his plan half a decade before the World Wide Web was invented. But his strategy was very much based on data science. And the lessons from his story are applicable to anybody using Big Data.

  For years, Seder’s pursuit produced nothing but frustration. He measured the size of horses’ nostrils, creating the world’s first and largest dataset on horse nostril size and eventual earnings. Nostril size, he found, did not predict horse success. He gave horses EKGs to examine their hearts and cut the limbs off dead horses to measure the volume of their fast-twitch muscles. He once grabbed a shovel outside a barn to determine the size of horses’ excrement, on the theory that shedding too much weight before an event can slow a horse down. None of this correlated with racing success.

  Then, twelve years ago, he got his first big break. Seder decided to measure the size of the horses’ internal organs. Since this was impossible with existing technology, he constructed his own portable ultrasound. The results were remarkable. He found that the size of the heart, and particularly the size of the left ventricle, was a massive predictor of a horse’s success, the single most important variable. Another organ that mattered was the spleen: horses with small spleens earned virtually nothing.

  Seder had a couple more hits. He digitized thousands of videos of horses galloping and found that certain gaits did correlate with racetrack success. He also discovered that some two-year-old horses wheeze after running one-eighth of a mile. Such horses sometimes sell for as much as a million dollars, but Seder’s data told him that the wheezers virtually never pan out. He thus assigns an assistant to sit near the finish line and weed out the wheezers.

  Of about a thousand horses at the Ocala auction, roughly ten will pass all of Seder’s tests. He ignores pedigree entirely, except as it will influence the price a horse will sell for. “Pedigree tells us a horse might have a very small chance of being great,” he says. “But if I can see he’s great, what do I care how he got there?”

  One night, Seder invited me to his room at the Hilton hotel in Ocala. In the room, he told me about his childhood, his family, and his career. He showed me pictures of his wife, daughter, and son. He told me he was one of three Jewish students in his Philadelphia high school, and that when he entered he was 4’10”. (He grew in college to 5’9”.) He told me about his favorite horse: Pinky Pizwaanski. Seder bought and named this horse after a gay rider. He felt that Pinky, the horse, always gave a great effort even if he wasn’t the most successful.

  Finally, he showed me the file that included all the data he had recorded on No. 85, the file that drove the biggest prediction of his career. Was he giving away his secret? Perhaps, but he said he didn’t care. More important to him than protecting his secrets was being proven right, showing to the world that these twenty years of cracking limbs, shoveling poop, and jerry-rigging ultrasounds had been worth it.

  Here’s some of the data on horse No. 85:

  NO. 85 (LATER AMERICAN PHAROAH) PERCENTILES AS A ONE-YEAR-OLD

                    PERCENTILE
  Height            56
  Weight            61
  Pedigree          70
  Left Ventricle    99.61

  There it was, stark and clear, the reason that Seder and his team had become so obsessed with No. 85. His left ventricle was in the 99.61st percentile!

  Not only that, but all his other important organs, including the rest of his heart and spleen, were exceptionally large as well. Generally speaking, when it comes to racing, Seder had found, the bigger the left ventricle, the better. But a left ventricle as big as this can be a sign of illness if the other organs are tiny. In American Pharoah, all the key organs were bigger than average, and the left ventricle was enormous. The data screamed that No. 85 was a 1-in-100,000 or even a one-in-a-million horse.

  What can data scientists learn from Seder’s project?

  First, and perhaps most important, if you are going to try to use new data to revolutionize a field, it is best to go into a field where old methods are lousy. The pedigree-obsessed horse agents whom Seder beat left plenty of room for improvement. So did the word-count-obsessed search engines that Google beat.

  One weakness of Google’s attempt to predict influenza using search data is that you can already predict influenza very well just using last week’s data and a simple seasonal adjustment. There is still debate about how much search data adds to that simple, powerful model. In my opinion, Google searches have more promise measuring health conditions for which existing data is weaker and therefore something like Google STD may prove more valuable in the long haul than Google Flu.

  The second lesson is that, when trying to make predictions, you needn’t worry too much about why your models work. Seder could not fully explain to me why the left ventricle is so important in predicting a horse’s success. Nor could he precisely account for the value of the spleen. Perhaps one day horse cardiologists and hematologists will solve these mysteries. But for now it doesn’t matter. Seder is in the prediction business, not the explanation business. And, in the prediction business, you just need to know that something works, not why.

  For example, Walmart uses data from sales in all their stores to know what products to shelve. Before Hurricane Frances, a destructive storm that hit the Southeast in 2004, Walmart suspected—correctly—that people’s shopping habits may change when a city is about to be pummeled by a storm. They pored through sales data from previous hurricanes to see what people might want to buy. A major answer? Strawberry Pop-Tarts. This product sells seven times faster than normal in the days leading up to a hurricane.

  Based on their analysis, Walmart had trucks loaded with strawberry Pop-Tarts heading down Interstate 95 toward stores in the path of the hurricane. And indeed, these Pop-Tarts sold well.

  Why Pop-Tarts? Probably because they don’t require refrigeration or cooking. Why strawberry? No clue. But when hurricanes hit, people turn to strawberry Pop-Tarts apparently. So in the days before a hurricane, Walmart now regularly stocks its shelves with boxes upon boxes of strawberry Pop-Tarts. The reason for the relationship doesn’t matter. But the relationship itself does. Maybe one day food scientists will figure out the association between hurricanes and toaster pastries filled with strawberry jam. But, while waiting for some such explanation, Walmart still needs to stock its shelves with strawberry Pop-Tarts when hurricanes are approaching and save the Rice Krispies treats for sunnier days.

  This lesson is also clear in the story of Orley Ashenfelter. What Seder is to horses, Ashenfelter, an economist at Princeton, may be to wine.

  A little over a decade ago, Ashenfelter was frustrated. He had been buying a lot of red wine from the Bordeaux region of France. Sometimes this wine was delicious, worthy of its high price. Many times, though, it was a letdown.

  Why, Ashenfelter wondered, was he paying the same price for wine that turned out so differently?

  One day, Ashenfelter received a tip from a journalist friend and wine connoisseur. There was indeed a way to figure out whether a wine would be good. The key, Ashenfelter’s friend told him, was the weather during the growing season.

  Ashenfelter’s interest was piqued. He went on a quest to figure out if this was true and he could consistently purchase better wine. He downloaded thirty years of weather data on the Bordeaux region. He also collected auction prices of wines. The auctions, which occur many years after the wine was originally sold, would tell you how the wine turned out.

  The result was amazing. A huge percentage of the quality of a wine could be explained simply by the weather during the growing season.

  In fact, a wine’s quality could be broken down to one simple formula, which we might call the First Law of Viticulture:

  Price = 12.145 + 0.00117 × (winter rainfall) + 0.0614 × (average growing season temperature) – 0.00386 × (harvest rainfall).

  So why does wine quality in the Bordeaux region work like this? What explains the First Law of Viticulture? There is some explanation for Ashenfelter’s wine formula—heat and early irrigation are necessary for grapes to properly ripen.

  But the precise details of his predictive formula go well beyond any theory and will likely never be fully understood even by experts in the field.

  Why does a centimeter of winter rain add, on average, exactly 0.1 cents to the price of a fully matured bottle of red wine? Why not 0.2 cents? Why not 0.05? Nobody can answer these questions. But if there are 1,000 centimeters of additional rain in a winter, you should be willing to pay an additional $1 for a bottle of wine.
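
  To make that arithmetic concrete, here is a minimal Python sketch of the formula as printed above. The function name, the argument names, and the sample weather values are hypothetical, chosen only for illustration. The passage does not state units, so the sketch simply follows the text's own reading (price in dollars, rainfall in centimeters), and that reading is an assumption. It shows that one extra unit of winter rain moves the predicted price by the winter-rainfall coefficient, roughly 0.12 cents, which the text rounds to 0.1.

```python
def predicted_price(winter_rain, growing_temp, harvest_rain):
    """Evaluate the wine formula exactly as printed in the text.

    Units are an assumption: the passage does not state them, so the
    inputs are treated as plain numbers in whatever units the
    coefficients were fitted with.
    """
    return (12.145
            + 0.00117 * winter_rain
            + 0.0614 * growing_temp
            - 0.00386 * harvest_rain)


# Hypothetical weather values, for illustration only (not real Bordeaux data).
base = predicted_price(winter_rain=60.0, growing_temp=17.0, harvest_rain=10.0)

# Same season with one extra unit of winter rain.
wetter = predicted_price(winter_rain=61.0, growing_temp=17.0, harvest_rain=10.0)
marginal = wetter - base

# Same season with 1,000 extra units of winter rain.
extra_for_1000 = predicted_price(1060.0, 17.0, 10.0) - base

print(f"Change per extra unit of winter rain: {marginal * 100:.3f} cents")
print(f"Change for 1,000 extra units: ${extra_for_1000:.2f}")
```

  Because the formula is linear, the marginal effect of winter rain is the same no matter what the other two inputs are. That is why the text can talk about the value of a centimeter of rain without mentioning temperature at all.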

 
