by Matthew Syed
The most intuitive way to glimpse the relationship between marginal gains and big achievements is to examine the pit stop. This is one of thousands of different components that, collectively, determine whether an F1 team is successful or not. It is a marginal aspect of performance, but a crucial one. In order to gain a deeper insight I went out to the season-ending Grand Prix in Abu Dhabi and immersed myself in the Mercedes operation.
At the team’s motor home, a small, three-story house within the Yas Marina Circuit, I talked to James Vowles, chief strategist for Mercedes F1. I asked him how the team went about developing the optimum pit-stop procedure. Vowles said:
We use the same method for everything, not just pit stops. First of all, you need a decent understanding of the engineering problem. So, with the pit stops we came up with a strategy based on our blue-sky ideas. But this strategy was always going to be less than optimal, because the problem is complex. So we created sensors so we could measure what was happening and test our assumptions.
But the crucial thing is what happened next. Once you have gone through a practice cycle with the initial strategy, you immediately realize that there are miscellaneous items that you are not measuring. Just doing a pit-stop practice-run opens your eyes to data points that are relevant to the task, but that were absent from the initial blueprint. So the second stage of the cycle is about improving your measurement statistics, even before you start to improve the pit-stop process.
Think about that for a moment. We have talked about the concept of a closed loop. This is where a strategy is put into action, then tested to see if it is working. By seeing what is going wrong, you can then improve the strategy. Mercedes takes this one step further. They use the first test not to improve the strategy, but to create richer feedback. Only when they have a deeper understanding of all the relevant data do they start to iterate.
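The shape of such a loop can be sketched in a few lines of code. Everything below is invented for illustration: the component names, the times, and the fixed 0.1-second gain per cycle are hypothetical, not Mercedes data. The sketch simply shows a process improving as each cycle measures the components and trims the slowest one.

```python
# Toy model of an optimization loop (all numbers invented): a pit stop
# is the sum of its component times; each cycle measures the components
# and shaves time off the slowest one.

def pit_stop_time(components):
    """Total stop time is just the sum of the measured component times."""
    return sum(components.values())

def improve_worst(components, gain=0.1):
    """One cycle of the loop: find the slowest step and improve it."""
    worst = max(components, key=components.get)
    improved = dict(components)
    improved[worst] -= gain
    return improved

# Hypothetical component timings for one stop, in seconds.
stop = {"gun_on": 0.8, "nut_off": 0.6, "wheel_swap": 1.2, "gun_off": 0.4}
for _ in range(5):                  # five practice cycles
    stop = improve_worst(stop)
print(round(pit_stop_time(stop), 2))
```

Note that the data comes first: the loop can only act on components that are actually being measured, which is why Mercedes spent the first pass enriching the sensor set rather than tuning the procedure.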
Vowles said:
We have placed eight sensors on every single one of the wheel-nut guns in order to access the most systematic data. Just by looking at this data, without speaking to the human involved, I can ascertain exactly what has happened on each pit stop. When the gun operator initially connected to the wheel nut, I can tell that they, say, connected 20 degrees off the optimum angle. When they start rotating the gun, I can tell how long it has taken for the nut to physically loosen all its preloaded torque and for the wheel to start moving off the axle.
I can tell how quickly the gun man has moved away; how quickly he has reconnected; how long it has taken for the tire to be removed and the new tire to be fitted to the axle; how clean that second connection was; and how long he was gunning on for. The precision of this information helps us to create an optimization loop. It shows us how to improve every time-sensitive aspect.
This is marginal gains, turbocharged. “You improve your data set before you begin to improve your final function; what you are doing is ensuring that you have understood what you didn’t initially understand,” Vowles said. “This is important because you must have the right information at the right time in order to deliver the right optimization, which can further improve and guide the cycle.”
Later that evening I went to the pit-lane to watch the team practice. It was an astonishing feat of collective endeavor. The car of Lewis Hamilton, the top driver for Mercedes, was pushed into position by three runners, and then instantly pounced upon by a team of around sixteen people, all with clearly defined tasks and exquisitely coordinated procedures. Again and again they practiced, dealing with every contingency that might arise in the race the next day. Every practice run was measured with the eight sensors, and videotaped, so it could pass through another optimization loop. One of the pit stops I witnessed was completed in an astonishing 1.95 seconds.*
Vowles said:
The secret to modern F1 is not really to do with big ticket items; it is about hundreds of thousands of small items, optimized to the nth degree. People think that things like engines are based upon high-level strategic decisions, but they are not. What is an engine except many iterations of small components? You start with a sensible design, but it is the iterative process that guides you to the best solution. Success is about creating the most effective optimization loop.
I also spoke to Andy Cowell, the leader of the team that devised the engine. His attitude was a carbon copy of that of Vowles.
We got our development engine up and running in late December [2012]. We didn’t design it to be car friendly. We didn’t try and figure out the perfect weight and aerodynamic design. Rather, we got a working model out there early, so that we could test it, and improve. It was the process of learning in the test cell that enabled us to create the most thermally efficient engine in the world.
The marginal gains approach is not just about mechanistic iteration. You need judgment and creativity to determine how to find solutions to what the data is telling you, but those judgments, in turn, are tested as part of the next optimization loop. Creativity not guided by a feedback mechanism is little more than white noise. Success is a complex interplay between creativity and measurement, the two operating together, the two sides of the optimization loop.
We will examine the creative process in more detail in the next chapter, but Vowles and Cowell have described a compelling model. It is the model used by Brailsford and the latest generation of development economists. Mercedes clocks up literally thousands of tiny failures. As Toto Wolff, the charismatic executive director of the team, put it: “We make sure we know where we are going wrong, so we can get things right.”
The basic proposition of this book is that we have an allergic attitude to failure. We try to avoid it, cover it up, and airbrush it from our lives. We have looked at cognitive dissonance, the careful use of euphemisms, anything to divorce us from the pain we feel when we are confronted with the realization that we have underperformed.
Brailsford, Duflo, and Vowles see weaknesses with a different set of eyes. Every error, every flaw, every failure, however small, is a marginal gain in disguise. This information is regarded not as a threat but as an opportunity. They are, in a sense, like aviation safety experts, who regard every near-miss event as a precious chance to avert an accident before it happens.*
On the eve of the Grand Prix at the Yas Marina Circuit, qualifying took place. This is where the drivers compete to see who can post the fastest lap, with the winner taking pole position (the most advantageous place on the starting grid) for the Grand Prix. Nico Rosberg, a German driver for Mercedes, took first place on the grid and Lewis Hamilton, his British teammate, took second place.
Afterward, I was given access to the highly secretive debriefing meeting. At a table in a room in the Mercedes garage, a few meters from the track, Hamilton and Rosberg sat facing each other. They were flanked by their respective race engineers. On the left was Paddy Lowe, the technical boss, and at other tables were experts in different aspects of performance.
Everybody wore headsets with microphones and scrutinized data on computer screens. On a big screen in the corner of the room was the team back in the UK, all hooked into the conversation. Much of the meeting was confidential. But the process was fascinating. Hamilton and Rosberg were taken through each dimension of performance: tires, engine, the helmet, whether the drinks provided during qualifying were at the right temperature.
Each observation from the two drivers was then double-checked against the hard data, and possible improvements noted. After the meeting, the next stage of the optimization loop was already underway, with analysts creating new marginal gains. I couldn’t help contemplating the contrast between the spirit of this approach and that of other areas of our world.
The following day I observed the race from the Mercedes garage. Hamilton made a blistering start from second position on the grid and went on to win the race. The points from his victory propelled him to the overall drivers’ championship. Rosberg came in second in the overall classification. Mercedes won the constructors’ championship: the most successful team in F1.
Afterward, champagne bottles were uncorked in the garage as mechanics, engineers, pit-stop operators, and the two drivers finally let their hair down. “I drive the car, but I have an incredible operation behind me,” Hamilton said. Vowles added: “We will enjoy tonight, but tomorrow we will feed what we learned today into the next stage of the optimization loop.”
Paddy Lowe, the man responsible for the technical operation, looked on from the back of the garage. “F1 is an unusual environment because you have incredibly intelligent people driven by the desire to win,” he said. “The ambition spurs rapid innovation. Things from just two years ago seem antique. Standing still is tantamount to extinction.”
IV
Google had a decision to make. Jamie Divine, then one of the company’s top designers, had come up with a new shade of blue to use on the Google toolbar. He reckoned it would boost the number of click-throughs.
The narrative surrounding the new shade sounded very good. The color was enticing; it meshed with what was known about consumer psychology. Divine, after all, was one of the top designers at the company. But how could Google be sure that he was right?
The conventional way would have been to change the color on the Google toolbar and see what happened. The problem with this approach should, by now, be obvious. Even if clicks increased, Google could not be certain if the increase was caused by the color change or by something else. Perhaps the number of clicks would have gone up even more if the color had stayed the same.
And this is why, even as executives were debating Divine’s shade, a product manager decided to conduct a test. He picked a slightly different shade of blue (one with a hint of green) and put it into a contest with the shade selected by Divine. In effect, users clicking on the Google website were randomly assigned to one of the two shades and their behavior monitored. It was an RCT. The result of the experiment was clear: more people clicked through on the blue with a hint of green.
There was no room for spin or bluster of the kind that often accompanies business decisions. There was just a flip of a coin, a random assignment, and a precise measurement.* The fact that Divine’s shade lost out in this trial didn’t mean he was a poor designer. Rather, it showed that his considerable knowledge was insufficient to predict how a tiny alteration in shade would impact consumer behavior. But then nobody could have known that for sure. The world is too complex.
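A two-arm test of this kind is simple to simulate. The sketch below is not Google’s system; the click probabilities are invented purely so the simulation has something to measure, and the arm names are hypothetical.

```python
import random

random.seed(42)

# Hypothetical click-through rates for the two shades; the real figures
# were never published.
TRUE_RATE = {"blue": 0.10, "greenish_blue": 0.14}

clicks = {shade: 0 for shade in TRUE_RATE}
visits = {shade: 0 for shade in TRUE_RATE}

for _ in range(20_000):
    shade = random.choice(list(TRUE_RATE))   # random assignment
    visits[shade] += 1
    if random.random() < TRUE_RATE[shade]:   # did this visitor click?
        clicks[shade] += 1

rates = {shade: clicks[shade] / visits[shade] for shade in TRUE_RATE}
winner = max(rates, key=rates.get)
```

With a flip of a coin deciding each assignment, any difference in the measured rates can only come from the shade itself, or from sampling noise, which shrinks as the sample grows.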
But this was just the start. Google executives realized that the success of the greeny-blue shade was not conclusive. After all, who’s to say that this particular shade is better than all other possible shades? Marissa Mayer, then a vice president at Google (and later CEO of Yahoo!), came up with a more systematic trial. She divided the relevant part of the color spectrum into forty constituent shades and then ran another test.
Users of Google Mail were randomly grouped into forty populations of 2.5 percent and, as they visited the site at different times, were confronted with different shades, and tracked. Google was thus able to determine the optimal shade, not through blue-sky thinking or slick narratives, but through testing. They determined the optimum shade through trial and error.
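The forty-shade version is the same idea with more arms. Again, the numbers are invented: one shade is deliberately given a higher click-through rate so the simulated trial has a clear optimum to find.

```python
import random

random.seed(0)

N_SHADES = 40
PER_ARM = 5_000                # roughly the "2.5 percent" slice per shade

# Hypothetical rates: every shade converts at 10 percent except one.
true_rate = [0.10] * N_SHADES
true_rate[17] = 0.25           # the (invented) optimal shade

clicks = [0] * N_SHADES
for shade in range(N_SHADES):
    for _ in range(PER_ARM):   # each user in this slice may click
        if random.random() < true_rate[shade]:
            clicks[shade] += 1

best = max(range(N_SHADES), key=lambda s: clicks[s])
```

The optimum emerges from the measurements themselves, not from anyone’s narrative about which shade ought to work best.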
This approach is now a key part of Google’s operation. As of 2010, the company was carrying out 12,000 RCTs every year. This is an astonishing amount of experimentation and it means that Google clocks up thousands of little failures. Each RCT may seem like nitpicking, but the cumulative effect starts to look very different. According to Google UK’s managing director, Dan Cobley, the color-switch generated $200 million in additional annual revenue.*
Perhaps the company most associated with randomized trials, however, is Capital One, the credit card provider. The business was created by Richard Fairbank and Nigel Morris, two consultants with backgrounds in evidence-based research. They created the company with one objective in mind: to test as widely and as intelligently as possible.
When sending out letters to solicit new clients, for example, they could have gone to a number of different experts who would doubtless have come up with different templates and colors. Should the color be red or blue? Should the font be Times New Roman or Calibri?
Instead of debating the questions, however, Fairbank and Morris tested them. They sent out 50,000 letters to randomly selected households with one color and 50,000 with another color, and then measured the relative profitability from the resulting groups. Then they tested different fonts, and different wording, and different scripts at their call centers.9
Every year since it was founded Capital One has run thousands of similar tests. They have turned the company into a “scientific laboratory where every decision about product design, marketing, channels of communication, credit lines, customer selection, collection policies, and cross-selling decisions could be subjected to systematic testing and using thousands of experiments.”10
As of 2015, Capital One was valued at around £45 billion.
Jim Manzi, an American entrepreneur and author who helps companies to run randomized trials, estimates that 20 percent of all retail data is now put through his software platform. This hints, more than anything else, at how far the marginal gains approach has traveled in the corporate world. “Businesses now execute more RCTs than all other kinds of institutions combined,” he told me. “It is one of the biggest changes in corporate practice for a generation.”11
Harrah’s Casino Group is symbolic of the quiet revolution that has been taking place. The brand, which operates casinos and resorts across America, reportedly has three golden rules for staff: “Don’t harass women, don’t steal, and you’ve got to have a control group.”
• • •
RCTs, whether in business or beyond, are often very dependent on context. A trial that improves, say, educational outcomes in Kenya has no claim to improve outcomes in London.* This is both the beauty of the social world and its challenge. We need to run lots of trials, lots of replications, to tease out how far conclusions can be extended from one trial to other contexts. To do this we need to create the capacity for running experiments at scale and at a lower unit cost.
But this doesn’t mean that we cannot draw big conclusions from RCTs. Perhaps the most ambitious use of randomized trials in public policy took place in regard to employment policy. In America in the 1980s, how to get people off welfare and into work was one of the most pressing issues of the day. Policy would conventionally have been decided by the top-down deliberations of presidents and congressmen in collaboration with advisers and pressure groups.
Instead, it was determined by experimentation. As Jim Manzi details in his excellent book Uncontrolled, states were given waivers to depart from federal policy on the proviso they used randomized trials to evaluate the changes. The results were dramatic. The trials revealed that financial incentives don’t work. Time limits don’t work.
The only thing that worked? Mandatory work requirements. This paved the way for Bill Clinton’s highly successful workfare program, secured with the backing of a Republican Congress.
V
Marginal gains may seem like an approach that only big corporations, governments, and sports franchises can hope to adopt. After all, running controlled experiments requires expertise and, often, sizable budgets. But a willingness to test assumptions is ultimately about a mindset. It is about intellectual honesty and a readiness to learn when one fails. Seen in this way, it is relevant to any business; in fact to almost any problem.
Take Takeru Kobayashi. At one time, he was an impoverished economics student, struggling to pay the electric bill of the apartment he shared with his girlfriend in Yokkaichi, on the eastern coast of Japan. Then he heard about a televised speed-eating contest in the area that had a first prize of $5,000. He entered the competition, did a bit of serious practice, and won.12
Intrigued, he discovered that speed-eating is a globally competitive sport, with serious rewards. This was a possible route out of poverty. So, as documented in the excellent book Think Like a Freak, Kobayashi targeted the world’s biggest competition—Nathan’s Hot Dog Eating Contest, which takes place every July Fourth in Coney Island, New York.
The rules are straightforward: eat as many hot dogs and buns as you can in twelve minutes. You are allowed to drink anything you like, but you are not allowed to vomit significantly (a problem known in the sport as a “reversal of fortune”).
Kobayashi approached the contest with a marginal gains mindset. First, instead of eating the hot dog as a whole (as all speed-eating champions had done until that point), he tried breaking it in half. He found that it gave him more options for chewing, and freed his hands to improve loading. It was a marginal gain. Then he experimented with eating the dog and the bread separately rather than at once. He found that the dogs went down super fast, but he still struggled with the chewy, doughy buns.
So he experimented by dipping the buns in water, then in water at different temperatures, then with water sprinkled with vegetable oil, then he videotaped his training sessions, recorded the data on spreadsheets, tracked slightly different strategies (flat out, pacing himself, sprint finishing), tested different ways of chewing, swallowing, and various “wriggles” that manipulated the space in his stomach in order to avoid vomiting. He tested each small assumption.
When he arrived at Coney Island he was a rank outsider. Nobody gave him a chance. He was slight and short, unlike many of his super-sized competitors. The world record was 25.125 hot dogs in twelve minutes, an astonishing total. Most observers thought this was close to the upper limit for humans. Kobayashi had other ideas. The student smashed the competition to pieces. He ate an eye-watering 50 hot dogs, almost doubling the record. “People think that if you have a huge appetite, then you’ll be better at it,” he said. “But, actually, it’s how you confront the food that is brought to you.”