The second false assumption was that not many people would default at the same time. This was based on the theory, soon to be disproven, that defaults were largely random and unrelated events. This led to a belief that solid mortgages would offset the losers in each tranche. The risk models were assuming that the future would be no different from the past.
In order to sell these mortgage-backed bonds, the banks needed AAA ratings. For this, they looked to the three credit-rating agencies. As the market expanded, rating the growing billion-dollar market in mortgage bonds turned into a big business for the agencies, bringing in lucrative fees. They grew addicted to those fees. And they understood all too clearly that if they provided anything less than AAA ratings, the banks would take the work to their competitors. So the agencies played ball. They paid more attention to customer satisfaction than to the accuracy of their models. These risk models also created their own pernicious feedback loop. The AAA ratings on defective products turned into dollars. The dollars in turn created confidence in the products and in the cheating-and-lying process that manufactured them. The resulting cycle of mutual back-scratching and pocket-filling was how the whole sordid business operated until it blew up.
Of all the WMD qualities, the one that turned these risk models into a monstrous force of global dimension was scale. Snake oil vendors, of course, are as old as history, and in previous real estate bubbles unwitting buyers ended up with swampland and stacks of false deeds. But this time the power of modern computing fueled fraud at a scale unequaled in history. The damage was compounded by other vast markets that had grown up around the mortgage-backed securities: credit default swaps and synthetic collateralized debt obligations, or CDOs. Credit default swaps were small insurance policies that transferred the risk on a bond. The swaps gave banks and hedge funds alike a sense of security, since they could supposedly use them to balance risk. But when the entities holding these insurance policies went belly up, as many did, the chain reaction blew holes through the global economy. Synthetic CDOs went one step further: they were contracts whose value depended on the performance of credit default swaps and mortgage-backed securities. They allowed financial engineers to leverage up their bets even more.
The overheated (and then collapsing) market featured $3 trillion of subprime mortgages by 2007, and the market around it—including the credit default swaps and synthetic CDOs, which magnified the risks—was twenty times as big, roughly $60 trillion. No national economy could compare.
Paradoxically, the supposedly powerful algorithms that created the market, the ones that analyzed the risk in tranches of debt and sorted them into securities, turned out to be useless when it came time to clean up the mess and calculate what all the paper was actually worth. The math could multiply the horseshit, but it could not decipher it. This was a job for human beings. Only people could sift through the mortgages, picking out the false promises and wishful thinking and putting real dollar values on the loans. It was a painstaking process, because people—unlike WMDs—cannot scale their work exponentially, and for much of the industry it was a low priority. During this lengthy detox, of course, the value of the debt—and the homes that the debt relied on—kept falling. And as the economy took a nosedive, even home owners who could afford their mortgages when the crisis began were suddenly at risk of defaulting, too.
As I’ve mentioned, Shaw was a step or two removed from the epicenter of the market collapse. But as other players started to go under, they were frantically undoing trades that affected the ones we had on our books. It had a cascading effect, and as we entered the second half of 2008 we were losing money left and right.
Over the following months, disaster finally hit the mainstream. That’s when everyone finally saw the people on the other side of the algorithms. They were desperate home owners losing their homes and millions of Americans losing their jobs. Credit card defaults leapt to record highs. The human suffering, which had been hidden from view behind numbers, spreadsheets, and risk scores, became palpable.
The chatter at Shaw was nervous. After the fall of Lehman Brothers in September of 2008, people discussed the political fallout. Barack Obama looked likely to win the election in November. Would he hammer the industry with new regulations? Raise taxes on carried interest? These people weren’t losing their houses or maxing out their credit cards just to stay afloat. But they found plenty to worry about, just the same. The only choice was to wait it out, let the lobbyists do their work, and see if we’d be allowed to continue as usual.
By 2009, it was clear that the lessons of the market collapse had brought no new direction to the world of finance and had instilled no new values. The lobbyists succeeded, for the most part, and the game remained the same: to rope in dumb money. Except for a few regulations that added a few hoops to jump through, life went on.
This drama pushed me quickly along in my journey of disillusionment. I was especially disappointed in the part that mathematics had played. I was forced to confront the ugly truth: people had deliberately wielded formulas to impress rather than clarify. It was the first time I had been directly confronted with this toxic concept, and it made me want to escape, to go back in time to the world of proofs and Rubik’s Cubes.
And so I left the hedge fund in 2009 with the conviction that I would work to fix the financial WMDs. New regulations were forcing banks to hire independent experts to analyze their risk. I went to work for one of the companies providing that analysis, RiskMetrics Group, one block north of Wall Street. Our product was a blizzard of numbers, each of them predicting the likelihood that a certain tranche of securities or commodities would go poof within the next week, the next year, or the next five years. When everyone is betting on everything that moves in the market, a smart read on risk is worth a fortune.
To calculate risk, our team employed the Monte Carlo method. To picture it, just imagine spinning the roulette wheel at a casino ten thousand times, taking careful notes all the while. Using Monte Carlo, you’d typically start with historical market data and run through thousands of test scenarios. How would the portfolio we’re studying fare on each trading day since 2010, or 2005? Would it survive the very darkest days of the crash? How likely is it that a mortal threat will arise in the next year or two? To come up with these odds, scientists run thousands upon thousands of simulations. There was plenty to complain about with this method, but it was a simple way to get some handle on your risk.
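To make that concrete, here is a minimal sketch in Python of a historical-simulation Monte Carlo of the sort described above. Every number in it is invented for illustration (the return series, the portfolio weights, the loss threshold), and a production system like the one we sold was far more elaborate.

```python
import numpy as np

# Hypothetical illustration: estimate the chance a portfolio loses more than
# some threshold over a horizon, by resampling (made-up) historical returns.

rng = np.random.default_rng(0)

# Pretend these are daily returns for three assets over past trading days.
historical_returns = rng.normal(0.0003, 0.02, size=(2500, 3))
weights = np.array([0.5, 0.3, 0.2])   # portfolio allocation (invented)
horizon_days = 250                    # roughly one trading year
n_simulations = 10_000
loss_threshold = -0.30                # a 30% drawdown counts as a "mortal threat"

final_returns = np.empty(n_simulations)
for i in range(n_simulations):
    # Sample one year of daily returns (with replacement) from history,
    # so each simulation replays a reshuffled version of the past.
    days = rng.integers(0, len(historical_returns), size=horizon_days)
    daily = historical_returns[days] @ weights
    final_returns[i] = np.prod(1 + daily) - 1

prob_mortal_threat = np.mean(final_returns < loss_threshold)
print(f"Estimated P(loss worse than 30% in a year): {prob_mortal_threat:.3f}")
```

The output is the kind of figure the product delivered: an estimated probability that a given portfolio goes poof over a given horizon.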
My job was to act as a liaison between our risk management business and the largest and most discerning connoisseurs of risk, the quantitative hedge funds. I’d call the hedge funds, or they’d call me, and we’d discuss any questions they had about our numbers. As often as not, though, they’d notify me only when we’d made a mistake. The fact was, the hedge funds always considered themselves the smartest of the smart, and since understanding risk was fundamental to their existence, they would never rely entirely on outsiders like us. They had their own risk teams, and they bought our product mostly to look good for investors.
I also answered the hotline and would sometimes find myself answering questions from clients at big banks. Eager to repair their tattered image, they wanted to be viewed as responsible, which is why they were calling in the first place. But, unlike the hedge funds, they showed little interest in our analysis. The risk in their portfolios was something they almost seemed to ignore. Throughout my time at the hotline, I got the sense that the people warning about risk were viewed as party poopers or, worse, a threat to the bank’s bottom line. This was true even after the cataclysmic crash of 2008, and it’s not hard to understand why. If they survived that one—because they were too big to fail—why were they going to fret over risk in their portfolio now?
The refusal to acknowledge risk runs deep in finance. The culture of Wall Street is defined by its traders, and risk is something they actively seek to underestimate. This is a result of the way we define a trader’s prowess, namely by his “Sharpe ratio,” which is calculated as the profits he generates divided by the risks in his portfolio. This ratio is crucial to a trader’s career, his annual bonus, his very sense of being. If you disembody those traders and consider them as a set of algorithms, those algorithms are relentlessly focused on optimizing the Sharpe ratio. Ideally, it will climb, or at least never fall too low. So if one of the risk reports on credit default swaps bumped up the risk calculation on one of a trader’s key holdings, his Sharpe ratio would tumble. This could cost him hundreds of thousands of dollars when it came time to calculate his year-end bonus.
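The passage above describes it loosely as profit divided by risk; in its textbook form, the Sharpe ratio is the portfolio's excess return over its volatility:

\[
\text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p}
\]

where R_p is the portfolio's return, R_f the risk-free rate, and σ_p the standard deviation of the portfolio's returns. Anything that raises the measured risk in the denominator drags the whole ratio down, which is exactly why a risk report that bumps up a holding's risk number is so unwelcome on the trading floor.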
I soon realized that I was in the rubber-stamp business. In 2011 it was time to move again, and I saw a huge growth market for mathematicians like me. In the time it took me to type two words into my résumé, I was a newly proclaimed Data Scientist, and ready to plunge into the Internet economy. I landed a job at a New York start-up called Intent Media.
I started out building models to anticipate the behavior of visitors to various travel websites. The key question was whether someone showing up at the Expedia site was just browsing or looking to spend money. Those who weren’t planning to buy were worth very little in potential revenue. So we would show them comparison ads for competing services such as Travelocity or Orbitz. If they clicked on the ad, it brought in a few pennies, which was better than nothing. However, we didn’t want to feed these ads to serious shoppers. In the worst case, we’d gain a dime of ad revenue while sending potential customers to rivals, where perhaps they’d spend thousands of dollars on hotel rooms in London or Tokyo. It would take thousands of ad views to make up for even a few hundred dollars in lost fees. So it was crucial to keep those people in house.
My challenge was to design an algorithm that would distinguish window shoppers from buyers. There were a few obvious signals. Were they logged into the service? Had they bought there before? But I also scoured for other hints. What time of day was it, and what day of the year? Certain weeks are hot for buyers. The Memorial Day “bump,” for example, occurs in late spring, when large numbers of people make summer plans almost in unison. My algorithm would place a higher value on shoppers during these periods, since they were more likely to buy.
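Here is a minimal sketch of how such a classifier might be wired together, using scikit-learn. The visits, features, and labels are all fabricated for the example; the real model drew on far more signals and vastly more data.

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical buyer-vs-browser classifier. Each row is one site visit.
# Columns: [logged_in, has_bought_before, hour_of_day, is_peak_booking_week]
X = np.array([
    [1, 1, 20, 1],
    [0, 0, 14, 0],
    [1, 0,  9, 1],
    [0, 1, 22, 0],
    [0, 0, 11, 1],
    [1, 1, 19, 0],
])
y = np.array([1, 0, 1, 0, 0, 1])  # 1 = ended up booking, 0 = just browsing

model = LogisticRegression().fit(X, y)

# Score a new visitor: logged in, never bought, 8 p.m., during a peak week.
p_buy = model.predict_proba([[1, 0, 20, 1]])[0, 1]
show_competitor_ad = p_buy < 0.5  # only show rival ads to likely browsers
print(f"P(buy) = {p_buy:.2f}, show comparison ad: {show_competitor_ad}")
```

A visitor scored as a likely browser gets the comparison ad and earns the site its pennies; a likely buyer is kept in house.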
The statistical work, as it turned out, was highly transferable from the hedge fund to e-commerce—the biggest difference was that, rather than the movement of markets, I was now predicting people’s clicks.
In fact, I saw all kinds of parallels between finance and Big Data. Both industries gobble up the same pool of talent, much of it from elite universities like MIT, Princeton, or Stanford. These new hires are ravenous for success and have been focused on external metrics—like SAT scores and college admissions—their entire lives. Whether in finance or tech, the message they’ve received is that they will be rich, that they will run the world. Their productivity indicates that they’re on the right track, and it translates into dollars. This leads to the fallacious conclusion that whatever they’re doing to bring in more money is good. It “adds value.” Otherwise, why would the market reward it?
In both cultures, wealth is no longer a means to get by. It becomes directly tied to personal worth. A young suburbanite with every advantage—the prep school education, the exhaustive coaching for college admissions tests, the overseas semester in Paris or Shanghai—still flatters himself that it is his skill, hard work, and prodigious problem-solving abilities that have lifted him into a world of privilege. Money vindicates all doubts. And the rest of his circle plays along, forming a mutual admiration society. They’re eager to convince us all that Darwinism is at work, when it looks very much to the outside like a combination of gaming a system and dumb luck.
In both of these industries, the real world, with all of its messiness, sits apart. The inclination is to replace people with data trails, turning them into more effective shoppers, voters, or workers to optimize some objective. This is easy to do, and to justify, when success comes back as an anonymous score and when the people affected remain every bit as abstract as the numbers dancing across the screen.
I was already blogging as I worked in data science, and I was also getting more involved with the Occupy movement. More and more, I worried about the separation between technical models and real people, and about the moral repercussions of that separation. In fact, I saw the same pattern emerging that I’d witnessed in finance: a false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops. Those who objected were regarded as nostalgic Luddites.
I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, raking in outrageous fortunes and convincing themselves all the while that they deserved it.
After a couple of years working and learning in the Big Data space, my journey to disillusionment was more or less complete, and the misuse of mathematics was accelerating. In spite of blogging almost daily, I could barely keep up with all the ways I was hearing of people being manipulated, controlled, and intimidated by algorithms. It started with teachers I knew struggling under the yoke of the value-added model, but it didn’t end there. Truly alarmed, I quit my job to investigate the issue in earnest.
If you sit down to dinner with friends in certain cities—San Francisco and Portland, to name two—you’ll likely find that sharing plates is an impossibility. No two people can eat the same things. They’re all on different diets. These range from vegan to various strains of Paleo, and people swear by them (if only for a month or two). Now imagine if one of those regimes, say the caveman diet, became the national standard: if 330 million people all followed its dictates.
The effects would be dramatic. For starters, a single national diet would put the agricultural economy through the wringer. Demand for the approved meats and cheeses would skyrocket, pushing prices up. Meanwhile, the diet’s no-no sectors, like soybeans and potatoes, would go begging. Diversity would shrivel. Suffering bean farmers would turn over their fields to cows and pigs, even on land unsuited for it. The additional livestock would slurp up immense quantities of water. And needless to say, a single diet would make many of us extremely unhappy.
What does a single national diet have to do with WMDs? Scale. A formula, whether it’s a diet or a tax code, might be perfectly innocuous in theory. But if it grows to become a national or global standard, it creates its own distorted and dystopian economy. This is what has happened in higher education.
The story starts in 1983. That was the year a struggling newsmagazine, U.S. News & World Report, decided to undertake an ambitious project. It would evaluate 1,800 colleges and universities throughout the United States and rank them for excellence. This would be a useful tool that, if successful, would help guide millions of young people through their first big life decision. For many, that single choice would set them on a career path and introduce them to lifelong friends, often including a spouse. What’s more, a college-ranking issue, editors hoped, might turn into a newsstand sensation. Perhaps for that one week, U.S. News could match its giant rivals, Time and Newsweek.
But what information would feed this new ranking? In the beginning, the staff at U.S. News based its scores entirely on the results of opinion surveys it sent to university presidents. Stanford came out as the top national university, and Amherst as the best liberal arts college. While popular with readers, the ratings drove many college administrators crazy. Complaints poured into the magazine that the rankings were unfair. Many college presidents, students, and alumni insisted that they deserved a higher ranking. All the magazine had to do was look at the data.
In the following years, editors at U.S. News tried to figure out what they could measure. This is how many models start out, with a series of hunches. The process is not scientific and has scant grounding in statistical analysis. In this case, it was just people wondering what matters most in education, then figuring out which of those variables they could count, and finally deciding how much weight to give each of them in the formula.
In most disciplines, the analysis feeding a model would demand far more rigor. In agronomy, for example, researchers might compare the inputs—the soil, the sunshine, and fertilizer—and the outputs, which would be specific traits in the resulting crops. They could then experiment and optimize according to their objectives, whether price, taste, or nutritional value. This is not to say that agronomists cannot create WMDs. They can and do (especially when they neglect to consider long-term and wide-ranging effects of pesticides). But because their models, for the most part, are tightly focused on clear outcomes, they are ideal for scientific experimentation.
The journalists at U.S. News, though, were grappling with “educational excellence,” a much squishier value than the cost of corn or the micrograms of protein in each kernel. They had no direct way to quantify how a four-year process affected one single student, much less tens of millions of them. They couldn’t measure learning, happiness, confidence, friendships, or other aspects of a student’s four-year experience. President Lyndon Johnson’s ideal for higher education—“a way to deeper personal fulfillment, greater personal productivity and increased personal reward”—didn’t fit into their model.
Instead they picked proxies that seemed to correlate with success. They looked at SAT scores, student-teacher ratios, and acceptance rates. They analyzed the percentage of incoming freshmen who made it to sophomore year and the percentage of those who graduated. They calculated the percentage of living alumni who contributed money to their alma mater, surmising that if they gave a college money there was a good chance they appreciated the education there. Three-quarters of the ranking would be produced by an algorithm—an opinion formalized in code—that incorporated these proxies. In the other quarter, they would factor in the subjective views of college officials throughout the country.
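As a rough sketch of what "an opinion formalized in code" can look like, here is a hypothetical scoring function in Python. The proxies mirror the ones listed above, but the weights are invented; this is not U.S. News's actual formula.

```python
# Hypothetical proxy-based ranking score. Each proxy is assumed to be
# normalized to a 0-1 scale before weighting. Weights are invented.

def ranking_score(school, peer_reputation):
    proxy_weights = {
        "sat_percentile":        0.15,
        "student_teacher_score": 0.10,
        "selectivity":           0.10,  # 1 - acceptance rate
        "freshman_retention":    0.15,
        "graduation_rate":       0.15,
        "alumni_giving_rate":    0.10,
    }
    # Three-quarters of the score comes from measurable proxies...
    proxy_part = sum(w * school[k] for k, w in proxy_weights.items())
    # ...and the remaining quarter from the subjective reputation survey.
    return proxy_part + 0.25 * peer_reputation

example = {
    "sat_percentile": 0.92, "student_teacher_score": 0.80, "selectivity": 0.85,
    "freshman_retention": 0.96, "graduation_rate": 0.93, "alumni_giving_rate": 0.40,
}
print(round(ranking_score(example, peer_reputation=0.88), 3))
```

Change the weights and the league table reshuffles. The choice of proxies and the weight assigned to each is precisely where the opinion lives.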