Weapons of Math Destruction

by Cathy O'Neil


  Now that we’ve seen how corporations can move decisively to right a wrong in their hiring algorithms, why can’t they make similar adjustments to the mathematical models wreaking havoc on our society, the WMDs?

  Unfortunately, there’s a glaring difference. Gay rights benefited in many ways from market forces. There was a highly educated and increasingly vocal gay and lesbian talent pool that companies were eager to engage. So they optimized their models to attract them. But they did this with the focus on the bottom line. Fairness, in most cases, was a by-product. At the same time, businesses across the country were starting to zero in on wealthy LGBT consumers, offering cruises, happy hours, and gay-themed TV shows. While inclusiveness no doubt caused grumbling in some pockets of intolerance, it also paid rich dividends.

  Dismantling a WMD doesn’t always offer such obvious payoff. While more fairness and justice would of course benefit society as a whole, individual companies are not positioned to reap the rewards. For most of them, in fact, WMDs appear to be highly effective. Entire business models, such as for-profit universities and payday loans, are built upon them. And when a software program successfully targets people desperate enough to pay 18 percent a month, those raking in the profits think it’s working just fine.

  The victims, of course, feel differently. But the greatest number of them—the hourly workers and unemployed, the people dragging low credit scores through life—are poor. Prisoners are powerless. And in our society, where money buys influence, these WMD victims are nearly voiceless. Most are disenfranchised politically. Indeed, all too often the poor are blamed for their poverty, their bad schools, and the crime that afflicts their neighborhoods. That’s why few politicians even bother with antipoverty strategies. In the common view, the ills of poverty are more like a disease, and the effort—or at least the rhetoric—is to quarantine it and keep it from spreading to the middle class. We need to think about how we assign blame in modern life and how models exacerbate this cycle.

  But the poor are hardly the only victims of WMDs. Far from it. We’ve already seen how malevolent models can blacklist qualified job applicants and dock the pay of workers who don’t fit a corporation’s picture of ideal health. These WMDs hit the middle class as hard as anyone. Even the rich find themselves microtargeted by political models. And they scurry about as frantically as the rest of us to satisfy the remorseless WMD that rules college admissions and pollutes higher education.

  It’s also important to note that these are the early days. Naturally, payday lenders and their ilk start off by targeting the poor and the immigrants. Those are the easiest targets, the low-hanging fruit. They have less access to information, and more of them are desperate. But WMDs generating fabulous profit margins are not likely to remain cloistered for long in the lower ranks. That’s not the way markets work. They’ll evolve and spread, looking for new opportunities. We already see this happening as mainstream banks invest in peer-to-peer loan operations like Lending Club. In short, WMDs are targeting us all. And they’ll continue to multiply, sowing injustice, until we take steps to stop them.

  Injustice, whether based in greed or prejudice, has been with us forever. And you could argue that WMDs are no worse than the human nastiness of the recent past. In many cases, after all, a loan officer or hiring manager would routinely exclude entire races, not to mention an entire gender, from being considered for a mortgage or a job offer. Even the worst mathematical models, many would argue, aren’t nearly that bad.

  But human decision making, while often flawed, has one chief virtue. It can evolve. As human beings learn and adapt, we change, and so do our processes. Automated systems, by contrast, stay stuck in time until engineers dive in to change them. If a Big Data college application model had established itself in the early 1960s, we still wouldn’t have many women going to college, because it would have been trained largely on successful men. If museums at the same time had codified the prevalent ideas of great art, we would still be looking almost entirely at work by white men, the people paid by rich patrons to create art. The University of Alabama’s football team, needless to say, would still be lily white.

  Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit.

  In a sense, our society is struggling with a new industrial revolution. And we can draw some lessons from the last one. The turn of the twentieth century was a time of great progress. People could light their houses with electricity and heat them with coal. Modern railroads brought in meat, vegetables, and canned goods from a continent away. For many, the good life was getting better.

  Yet this progress had a gruesome underside. It was powered by horribly exploited workers, many of them children. In the absence of health or safety regulations, coal mines were death traps. In 1907 alone, 3,242 miners died. Meatpackers worked twelve to fifteen hours a day in filthy conditions and often shipped toxic products. Armour and Co. dispatched cans of rotten beef by the ton to US Army troops, using a layer of boric acid to mask the stench. Meanwhile, rapacious monopolists dominated the railroads, energy companies, and utilities and jacked up customers’ rates, which amounted to a tax on the national economy.

  Clearly, the free market could not control its excesses. So after journalists like Ida Tarbell and Upton Sinclair exposed these and other problems, the government stepped in. It established safety protocols and health inspections for food, and it outlawed child labor. With the rise of unions, and the passage of laws safeguarding them, our society moved toward eight-hour workdays and weekends off. These new standards protected companies that didn’t want to exploit workers or sell tainted foods, because their competitors had to follow the same rules. And while they no doubt raised the costs of doing business, they also benefited society as a whole. Few of us would want to return to a time before they existed.

  How do we start to regulate the mathematical models that run more and more of our lives? I would suggest that the process begin with the modelers themselves. Like doctors, data scientists should pledge a Hippocratic Oath, one that focuses on the possible misuses and misinterpretations of their models. Following the market crash of 2008, two financial engineers, Emanuel Derman and Paul Wilmott, drew up such an oath. It reads:

  ~ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

  ~ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

  ~ I will never sacrifice reality for elegance without explaining why I have done so.

  ~ Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.

  ~ I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.

  That’s a good philosophical grounding. But solid values and self-regulation rein in only the scrupulous. What’s more, the Hippocratic Oath ignores the on-the-ground pressure that data scientists often confront when bosses push for specific answers. To eliminate WMDs, we must advance beyond establishing best practices in our data guild. Our laws need to change, too. And to make that happen we must reevaluate our metric of success.

  Today, the success of a model is often measured in terms of profit, efficiency, or default rates. It’s almost always something that can be counted. What should we be counting, though? Consider this example. When people look for information about food stamps on a search engine, they are often confronted with ads for go-betweens, like FindFamilyResources, of Tempe, Arizona. Such sites look official and provide links to real government forms. But they also gather names and e-mail addresses for predatory advertisers, including for-profit colleges. They rake in lead generation fees by providing a superfluous service to people, many of whom are soon targeted for services they can ill afford.

  Is the transaction successful? It depends on what you count. For Google, the click on the ad brings in a quarter, fifty cents, or even a dollar or two. That’s a success. Naturally, the lead generator also makes money. And so it looks as though the system is functioning efficiently. The wheels of commerce are turning.

  Yet from society’s perspective, a simple hunt for government services puts a big target on the back of poor people, leading a certain number of them toward false promises and high-interest loans. Even considered strictly from an economic point of view, it’s a drain on the system. The fact that people need food stamps in the first place represents a failing of the market economy. The government, using tax dollars, attempts to compensate for it, with the hope that food stamp recipients will eventually be able to fully support themselves. But the lead aggregators push them toward needless transactions, leaving a good number of them with larger deficits, and even more dependent on public assistance. The WMD, while producing revenue for search engines, lead aggregators, and marketers, is a leech on the economy as a whole.

  A regulatory system for WMDs would have to measure such hidden costs, while also incorporating a host of non-numerical values. This is already the case for other types of regulation. Though economists may attempt to calculate costs for smog or agricultural runoff, or the extinction of the spotted owl, numbers can never express their value. And the same is often true of fairness and the common good in mathematical models. They’re concepts that reside only in the human mind, and they resist quantification. And since humans are in charge of making the models, they rarely go the extra mile or two to even try. It’s just considered too difficult. But we need to impose human values on these systems, even at the cost of efficiency. For example, a model might be programmed to make sure that various ethnicities or income levels are represented within groups of voters or consumers. Or it could highlight cases in which people in certain zip codes pay twice the average for certain services. These approximations may be crude, especially at first, but they’re essential. Mathematical models should be our tools, not our masters.
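
  As a rough illustration of that last idea, the sketch below checks whether people in a given zip code are paying at least twice the overall average for a service. It is only a toy example in Python; the function name, the sample prices, and the two-times threshold are hypothetical, not anything the book prescribes.

```python
from collections import defaultdict

def flag_expensive_zip_codes(records, factor=2.0):
    """records: iterable of (zip_code, price) pairs.
    Returns the overall average price and the zip codes whose local
    average is at least `factor` times that overall average."""
    by_zip = defaultdict(list)
    for zip_code, price in records:
        by_zip[zip_code].append(price)

    all_prices = [p for prices in by_zip.values() for p in prices]
    overall_avg = sum(all_prices) / len(all_prices)

    flagged = {}
    for zip_code, prices in by_zip.items():
        local_avg = sum(prices) / len(prices)
        if local_avg >= factor * overall_avg:
            flagged[zip_code] = local_avg
    return overall_avg, flagged

# Made-up numbers, purely to show the shape of the check.
sample = [("10001", 40), ("10001", 45), ("60601", 95), ("60601", 110)]
print(flag_expensive_zip_codes(sample))
```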

  The achievement gap, mass incarceration, and voter apathy are big, nationwide problems that neither the free market nor any mathematical algorithm will fix. So the first step is to get a grip on our techno-utopia, that unbounded and unwarranted hope in what algorithms and technology can accomplish. Before asking them to do better, we have to admit they can’t do everything.

  To disarm WMDs, we also need to measure their impact and conduct algorithmic audits. The first step, before digging into the software code, is to carry out research. We’d begin by treating the WMD as a black box that takes in data and spits out conclusions. This person has a medium risk of committing another crime, this one has a 73 percent chance of voting Republican, this teacher ranks in the lowest decile. By studying these outputs, we could piece together the assumptions behind the model and score them for fairness.
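
  Such a black-box audit can start very simply: feed the model comparable people from two groups and compare how often each group receives a favorable outcome. The sketch below assumes a hypothetical score_model callable standing in for the WMD and hypothetical groups of people; it illustrates the idea, not any actual auditing tool.

```python
def disparate_impact_ratio(score_model, group_a, group_b, threshold=0.5):
    """Compare how often two groups receive a 'favorable' score from a
    black-box model we can only call, not inspect."""
    def favorable_rate(group):
        scores = [score_model(person) for person in group]
        return sum(s >= threshold for s in scores) / len(scores)

    rate_a = favorable_rate(group_a)
    rate_b = favorable_rate(group_b)
    # Values far from 1.0 suggest the model treats the groups differently.
    return rate_a / rate_b
```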

  Sometimes, it is all too clear from the get-go that certain WMDs are only primitive tools, which hammer complexity into simplicity, making it easier for managers to fire groups of people or to offer discounts to others. The value-added model used in New York public schools, for example, the one that rated Tim Clifford a disastrous 6 one year and then a high-flying 96 a year later, is a statistical farce. If you plot year-to-year scores on a chart, the dots are nearly as randomly placed as hydrogen atoms in a room. Many of the math students in those very schools could study those statistics for fifteen minutes and conclude, with confidence, that the scores measure nothing. Good teachers, after all, tend to be good one year after the next. Unlike, say, relief pitchers in baseball, they rarely have great seasons followed by disasters. (And also unlike relief pitchers, their performance resists quantitative analysis.)
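
  The year-over-year check is easy to picture in code. The sketch below computes the correlation between two consecutive years of scores; the numbers here are randomly generated stand-ins, not the real New York data, and with random scores the correlation comes out near zero, the pattern the book describes in the actual value-added results.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical: each teacher's score in two consecutive years.
year1 = [random.uniform(0, 100) for _ in range(500)]
year2 = [random.uniform(0, 100) for _ in range(500)]
print(pearson(year1, year2))  # near zero: the metric carries little signal
```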

  There’s no fixing a backward model like the value-added model. The only solution in such a case is to ditch the unfair system. Forget, at least for the next decade or two, about building tools to measure the effectiveness of a teacher. It’s too complex to model, and the only available data are crude proxies. The model is simply not good enough yet to inform important decisions about the people we trust to teach our children. That’s a job that requires subtlety and context. Even in the age of Big Data, it remains a problem for humans to solve.

  Of course, the human analysts, whether the principal or administrators, should consider lots of data, including the students’ test scores. They should incorporate positive feedback loops. These are the angelic cousins of the pernicious feedback loops we’ve come to know so well. A positive loop simply provides information to the data scientist (or to the automatic system) so that the model can be improved. In this case, it’s simply a matter of asking teachers and students alike if the evaluations make sense for them, if they understand and accept the premises behind them. If not, how could they be enhanced? Only when we have an ecosystem with positive feedback loops can we expect to improve teaching using data. Until then it’s just punitive.

  It is true, as data boosters are quick to point out, that the human brain runs internal models of its own, and they’re often tinged with prejudice or self-interest. So its outputs—in this case, teacher evaluations—must also be audited for fairness. And these audits have to be carefully designed and tested by human beings, and afterward automated. In the meantime, mathematicians can get to work on devising models to help teachers measure their own effectiveness and improve.

  Other audits are far more complicated. Take the criminal recidivism models that judges in many states consult before sentencing prisoners. In these cases, since the technology is fairly new, we have a before and an after. Have judges’ sentencing patterns changed since they started receiving risk analysis from the WMD? We’ll see, no doubt, that a number of the judges ran similarly troubling models in their heads long before the software arrived, punishing poor prisoners and minorities more severely than others. In some of those cases, conceivably, the software might temper their judgments. In others, not. But with enough data, patterns will become clear, allowing us to evaluate the strength and the tilt of the WMD.

  If we find (as studies have already shown) that the recidivism models codify prejudice and penalize the poor, then it’s time to take a look at the inputs. In this case, they include loads of birds-of-a-feather connections. They predict an individual’s behavior on the basis of the people he knows, his job, and his credit rating—details that would be inadmissible in court. The fairness fix is to throw out that data.

  But wait, many would say. Are we going to sacrifice the accuracy of the model for fairness? Do we have to dumb down our algorithms?

  In some cases, yes. If we’re going to be equal before the law, or be treated equally as voters, we cannot stand for systems that drop us into different castes and treat us differently.*1

  Movements toward auditing algorithms are already afoot. At Princeton, for example, researchers have launched the Web Transparency and Accountability Project. They create software robots that masquerade online as people of all stripes—rich, poor, male, female, or suffering from mental health issues. By studying the treatment these robots receive, the academics can detect biases in automated systems from search engines to job placement sites. Similar initiatives are taking root at universities like Carnegie Mellon and MIT.
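
  In spirit, such an audit looks something like the sketch below: the same query is issued under several simulated personas, and the responses are compared. The fetch_results function and the persona attributes are hypothetical placeholders; the Princeton project's actual tooling is, of course, far more elaborate.

```python
# Hypothetical personas a robot might present to an automated system.
personas = [
    {"income": "high", "gender": "female"},
    {"income": "low", "gender": "female"},
    {"income": "high", "gender": "male"},
    {"income": "low", "gender": "male"},
]

def audit_query(fetch_results, query):
    """Run the same query under each persona and collect what comes back.
    fetch_results is a stand-in for whatever system is being audited."""
    responses = {}
    for persona in personas:
        key = tuple(sorted(persona.items()))
        responses[key] = fetch_results(query, persona)
    # Differences across personas point to targeting or bias worth probing.
    return responses
```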

  Academic support for these initiatives is crucial. After all, to police the WMDs we need people with the skills to build them. Their research tools can replicate the immense scale of the WMDs and retrieve data sets large enough to reveal the imbalances and injustice embedded in the models. They can also build crowdsourcing campaigns, so that people across society can provide details on the messaging they’re receiving from advertisers or politicians. This could illuminate the practices and strategies of microtargeting campaigns.

  Not all of them would turn out to be nefarious. Following the 2012 presidential election, for example, ProPublica built what it called a Message Machine, which used crowdsourcing to reverse-engineer the model for the Obama campaign’s targeted political ads. Different groups, as it turned out, heard glowing remarks about the president from different celebrities, each one presumably targeted for a specific audience. This was no smoking gun. But by providing information and eliminating the mystery behind the model, the Message Machine reduced (if only by a tad) grounds for dark rumors and suspicion. That’s a good thing.

  If you consider mathematical models as the engines of the digital economy—and in many ways they are—these auditors are opening the hoods, showing us how they work. This is a vital step, so that we can equip these powerful engines with steering wheels—and brakes.

  Auditors face resistance, however, often from the web giants, which are the closest thing we have to information utilities. Google, for example, has prohibited researchers from creating scores of fake profiles in order to map the biases of the search engine.*2

  Facebook, too. The social network’s rigorous policy to tie users to their real names severely limits the research outsiders can carry out there. The real-name policy is admirable in many ways, not least because it pushes users to be accountable for the messages they post. But Facebook also must be accountable to all of us—which means opening its platform to more data auditors.

  The government, of course, has a powerful regulatory role to play, just as it did when confronted with the excesses and tragedies of the first industrial revolution. It can start by adapting and then enforcing the laws that are already on the books.

  As we discussed in the chapter on credit scores, the civil rights laws referred to as the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA) were meant to ensure fairness in credit scoring. The FCRA guarantees that a consumer can see the data going into their score and correct any errors, and the ECOA prohibits linking race or gender to a person’s score.

 
