First, the investigators had a model of what factors count. So they knew to examine the rivets in the bulkhead, but not to bother asking the flight attendant what had been served for lunch or if she had been thinking bad thoughts. Instead, they looked for evidence based on an interrelated set of models that our aircraft flight models are embedded in: laws of physics, properties of metals, systems that record repair and flight histories of planes, and so on.
Second, the investigators fulfilled the expectations of what constitutes an explanation within that domain. An airplane crash requires forensics by a government body that delivers a highly detailed account. But if your car starts to wobble, a single phrase—“rear axle’s bent”—well might suffice. Or perhaps your local mechanic will point out that if it’s wobbling because of a manufacturer’s fault in the axle, the warranty might cover it. In that case, a quick inspection might be enough to establish the explanation. No matter the particulars, the rules for explaining a broken axle are quite different from what constitutes a satisfactory explanation of a plane crash, the fall of the Roman Empire, or why you’re not your usual sparkly self today. The philosopher Ludwig Wittgenstein would count these as different “language games” and would warn us not to think that all explanations are played the same way.
Third is a rule so obvious that we don’t even think of it as a rule: we offer explanations only when there is an element of mystery. If I take milk out of the refrigerator and ask why it’s cold, you won’t be sure how to reply because the right explanation—“It was in the fridge”—should be no mystery to me at all. If it’s cold after I have just taken it out of the oven, though, an explanation might well be called for.
Fourth, there is some reason why we want an explanation. Boeing wanted to know why JAL 123 crashed so that it could fix the problem. The relatives of the victims wanted an explanation in order to know whether they should sue. But if I’m sitting quietly next to a stranger on a bus and she taps me on the shoulder and starts telling me, unasked, about Bernoulli’s principle, which explains the role the shape of wings plays in flight, she is an odd duck who does not know the basic rules of explanation or perhaps cannot tell a bus from a plane.
Finally, we look for explanations only when we think the thing to be explained can be explained. The question, “Why did he die so young?” hardly ever is actually looking for an explanation because, for many of us, there is no answer that differs from the answer to the question, “Why did he die?”: “Because he got hit by a car.” That is neither an explanation nor any comfort. Or perhaps we reply, “The Lord works in mysterious ways,” which is not much of an explanation but may be a comfort.
In short, explaining is a social act that has social motives and is performed according to rules and norms that serve social purposes. What counts as an explanation depends on why we’re asking, and what counts as a satisfactory explanation depends on the domain we’re in: doing fundamental physics research, investigating an air crash, or trying to learn from a fallen soufflé.
We have developed the rules of these explanation games over many centuries. They are exquisitely well worked out, and we inhabit them as if they were as obvious as using a spoon to drink hot soup.
* * *
What if we wanted to use a model to explain what causes war?
What factors would we include? Economic disparities? Clashing cultural values? Freudian ideas about aggression and the death wish? The historical dominance of men? Struggles for raw power? These models lead to different predictors to look for: The assassination of a leader. Economic instability. Religious differences. Plans for world domination. In reality, any of these can start a war. Each of them has at some time done so. But each has also existed without resulting in a war.
None of these is sufficient on its own. No one particular factor is necessary. Each is a “cause” only in conjunction with many, many other factors. If we say Hitler started World War II, then we also have to ask about the conditions in Germany that let Hitler come to power, the economics and military relationships in Europe that made invading Poland seem feasible to him, the cultural attitudes toward Germany in Europe and around the world, the effect of the Treaty of Versailles that ended the prior world war, the class system that enabled the calling up of soldiers in Germany and elsewhere, the historic relationship of Germany and the Sudetenland, the effect of Neville Chamberlain’s personality on the policy of appeasement, the attitude toward war shaped by the art and entertainment of the time, the history of the Jews, the history of the Poles, the history of the French, the history of everyone.…
In short, there may be no set of factors common to the start of all wars. Even if there were, in each case the factors may have different weights in their interrelationships with all the other factors.
This would be a tricky model for us to build using our old assumptions. But this is the sort of world that deep learning assumes. While the old models continue to shape our decisions about what data we think is relevant enough to feed into a deep learning system, the shape of the new models doesn’t much resemble the traditional models we’ve looked at in this chapter. A deep learning system as a working model can behave differently as well, producing more finely tuned results, and results that are more subject to cascades due to small differences, the way in the real world one stray bullet, a can of spoiled rations, or a squad mate with a contagious disease in World War I could have meant that young Lance Corporal Hitler would not have survived to lead Germany into a worldwide cataclysm.
We sometimes think about our own lives this way: the time we got out at the wrong bus stop and met the love of our life, or missed a job interview at a company that later became fabulously successful or infamously awful. But those are exceptional moments, which is why we recount them as stories. Far more commonly, we look for explanations and answers that bring a situation down to what we can manage. That’s what’s normal about the normal.
We know the world is complex, but we desperately want it to be simple enough for us to understand and to manage. Deep learning doesn’t much suffer from this tension. Complexity wins. But the tension is very much front and center as we humans try to come to grips with deploying deep learning systems, for these systems don’t play by the rules of our traditional language games for explanations. Policies such as the European Union’s requirement that AI be capable of explaining its processes when its conclusions significantly affect us demand more explicability than we generally ask of nondigital systems. We don’t expect to be able to explain an axle failure with anything much beyond “We must have hit a bump” or “It was a faulty axle,” but we may require autonomous vehicles to be able to explain every lane change in a ballet of traffic choreographed on the fly by the ad hoc collaborative of networked vehicles on the road at that moment.
It’s possible that this demand on AI for explanations has been so well and widely received because most people’s expectations of this new realm of digital technology have been shaped by traditional computers, which are little controlled worlds. A traditional computer can tell us about all the data it’s dealing with, and the computer does nothing to that data that a human didn’t program it to do. But a deep learning program that has constructed its model out of the data we’ve given it can’t always be interrogated about its “decisions.” While there are still elements humans control—which data is put in, how that data is preprocessed, how the systems are tuned, and so forth—deep learning may not meet the first requirement of explanations: an intelligible model.
Yet many who want AI’s outputs to be as explicable as plane crashes do understand how deep learning works. They want explicability for a truly basic reason: to prevent AI from making our biased culture and systems even worse than they were before AI. Instances of algorithmic unfairness are well documented and appalling, from racist bail risk assessment algorithms to AI that, when translating from languages without gendered pronouns, automatically refers in English to a nurse as “she” and a programmer as “he.”58
Keeping AI from repeating, amplifying, and enforcing existing prejudices is a huge and hugely important challenge. It is a sign of hope that algorithmic unfairness has become such a well-recognized issue and has engaged many of our best minds. And if for this book the question is, “How is our engagement with our new technology changing our ideas about how things happen?” then perhaps the first thing we should learn is that the very difficulty of removing (or sometimes even mitigating) the biases in our data makes it clear—in case anyone had any doubt—that things happen unfairly.
Our insistence on explanations makes two more things clear.
First, we have thought, in an odd way, that an explanation is a readout of the state of the world. But the argument over requiring AI’s explicability brings us face to face with the fact that explanations are not readouts but tools. We use explanations to fix our planes, to determine whether our axle is under warranty, to decide whether we should stop eating banana smoothies, or to be reassured that our race did not affect the severity of our jail sentence. There are certainly situations in which we’ll want to confine our AI to drawing conclusions that we can understand, as Bitvore and FICO already do. This is clearly appealing for the use of AI by institutions such as the courts, where trust in the system is paramount. We are going to have to work out together exactly where we think the trade-offs are, and it will be a messy, difficult process.
But no matter how we work this out, differently in each domain, we should also recognize that the demand for explicable AI is only a question at all because the inexplicable “black box” AI systems we’re developing work. If they didn’t do what we intended more accurately, faster, or both, then we’d just stop using them. We are thereby teaching ourselves, over and over, that systems that surpass our ability to diagnose and predict may also surpass our understanding.
The unexpressed conclusion that we are leading ourselves to is that they’re better at this because their model of the world is a more accurate, more useful, and often more true representation of how things happen (“often” because it is mathematically possible for models that yield the most accurate predictions to be based on some elements of falsehood).59 Even so, the demand for explanations may therefore be leading us to recognize that the inexplicability of deep learning’s models comes straight from the world itself.
Coda: Optimization over Explanation
During the oil crisis of the 1970s, the US federal government decided to optimize highways for better mileage by dropping the speed limit to fifty-five miles per hour,60 trading shorter travel times for greater fuel efficiency. We could similarly decide to regulate what driverless cars (more accurately, autonomous vehicles [AVs]) are optimized for as a way of achieving the results we want, without insisting that the machine learning systems literally driving these cars always be explicable. After all, if explanations are a tool, we should be asking the questions we implicitly ask before using any tool: Will it do the job? How well? What are the trade-offs? What other tools are available? Given our overall aims—in this case, our social goals—is this the best tool for the job? What’s true of Dutch ovens is also true of explanations.
Let’s say we decide the system of AVs should be optimized for lowering the number of US traffic fatalities from forty thousand per year. If the number of fatalities indeed drops dramatically—McKinsey and Elon Musk’s Tesla blog both imagine it could be reduced by 90 percent—then the system has reached its optimization goal, and we’ll rejoice, even if we cannot understand why any particular vehicle made the “decisions” it made;61 as we’ve noted, the behavior of AVs may well become quite inexplicable particularly as AVs on the road are networked with one another and decide on their behavior collaboratively.
Of course, regulating AV optimizations will be more complex than just checking that there are fewer fatalities, for we’re likely to say that AVs ought to be optimized also for reducing injuries, then for reducing their environmental impact, then for shortening drive time, then for comfortable rides, and so forth. The exact hierarchy of priorities is something we will have to grapple with, preferably as citizens rather than leaving it to the AV manufacturers, for these are issues that affect the public interest and ought to be decided in the public sphere of governance.
Not that this will be easy. Deciding on what we want the system of AVs optimized for is going to raise the sorts of issues that have long bedeviled us. For example, suppose it turns out that allowing trucks to go two hundred miles per hour marginally increases the number of fatalities but brings an economic boom that employs more people and drives down child poverty rates. How many lives would we be willing to sacrifice? Or, as Brett Frischmann and Evan Selinger ask in Re-engineering Humanity, might we want to optimize traffic so that people with more urgent needs get priority? If so, do we give priority to the woman on the way to an important business meeting or to the woman on the way to her child’s soccer game?62
Lest these examples seem too remote, consider the explanation of why in March 2018 one of Uber’s experimental AVs hit and killed a pedestrian in Arizona.63 The National Transportation Safety Board’s initial report said that the AV detected the person as much as six seconds earlier but didn’t stop or even slow down because its emergency braking system had been purposefully disabled. Why? Uber said it was done “to reduce the potential for erratic vehicle behavior.”64 Turning off the emergency braking system on an AV traveling on public roads seems on the face of it to be plainly irresponsible. But it may be related to a known and literally uncomfortable trade-off between safety and a smoother ride for passengers.
That trade-off seems baked into the intersection of machine learning systems and physics. AVs use lidar—light-based radar—to constantly scan the area around them. Everything the lidar reveals is evaluated by the AV’s computer as a possible cause for action. These evaluations come with a degree of confidence, for that is how machine learning systems, as statistical engines, work. So, at what degree of confidence that an object might be a pedestrian should the AV put on its brakes? Fifty percent? Sure. Why not five percent? Quite possibly. So why not insist that AVs be required to brake if there is even a 0.01 percent possibility that an object is a pedestrian?
The answer is the same as for why we’re not going to prevent AVs from ever going over fifteen miles per hour on the highway, even though that restriction would lower fatalities. As the confidence levels we require go down, the vehicle is going to be slamming on its emergency brakes more and more frequently. At some point, passengers will be treated to a ride on a bucking bronco from which they will emerge shaken, late, and determined never to ride in another AV. So if we are to deploy AVs to gain the important societal benefits they can bring, then we are going to have to face a trade-off between passenger comfort and safety.
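To make that trade-off concrete, here is a minimal sketch in Python of the threshold logic just described. Everything in it is hypothetical—the function names, the confidence distribution, and the thresholds are invented for illustration, not drawn from any real AV software—but it shows how lowering the required confidence floods a trip with emergency stops.

```python
# Illustrative only: a toy model of the confidence-threshold trade-off.
# All names and numbers are hypothetical, not from any real AV stack.
import random

def should_emergency_brake(pedestrian_confidence: float, threshold: float) -> bool:
    """Brake hard whenever the detector's confidence that an object is a
    pedestrian meets or exceeds the chosen threshold."""
    return pedestrian_confidence >= threshold

def hard_brakes_per_trip(threshold: float, detections_per_trip: int = 10_000) -> int:
    """Count hard-braking events on a trip where most lidar detections are
    harmless clutter with low pedestrian confidence."""
    random.seed(0)  # deterministic for the sake of comparison
    events = 0
    for _ in range(detections_per_trip):
        # Most returns are clutter: model confidence as a spread of mostly low scores.
        confidence = min(random.expovariate(20), 1.0)
        if should_emergency_brake(confidence, threshold):
            events += 1
    return events

for t in (0.5, 0.05, 0.0001):
    print(f"threshold {t:>7}: {hard_brakes_per_trip(t)} hard-braking events")
```

Under these made-up numbers, a 50 percent threshold almost never triggers a hard stop, while a 0.01 percent threshold brakes on nearly every detection—the bucking-bronco ride described above.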
Deciding on what we want these systems optimized for is obviously going to require some difficult decisions. But we make these sorts of decisions all the time. Police departments decide whether they’re going to ticket jaywalkers to reduce traffic accidents and injuries at the cost of pedestrian convenience. Cities decide whether to create bicycle lanes even if it means slowing motorized traffic. Zoning laws are all about trade-offs, as are decisions about budgets, school curricula, and whether to shut down Main Street for the local sports team’s victory parade. All decisions are trade-offs—that’s what makes them decisions. AVs and other machine learning systems are going to force us to be more explicit and more precise in many of those decisions. Is that really a bad thing?
These conversations and, yes, arguments are ones we need to have. Insisting that AI systems be explicable sounds great, but it distracts us from the harder and far more important question: What exactly do we want from these systems?
In many if not most cases we should insist that AI systems make public the hierarchy of optimizations they’re aimed at, even if those systems are not subject to public regulation. Is the navigation system you use optimized for fuel efficiency, for getting you where you’re going in the shortest time, for balancing the traffic loads throughout the system, or for some combination? Does your social networking app aim at keeping you deeply involved with a small circle of your closest friends, reminding you of people drifting out of orbit, or introducing you to new possible friends? What’s its hierarchy of goals? Users might even be given a say about that.
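As a rough illustration of what publishing such a hierarchy might look like, here is a short Python sketch. The goal names, weights, and descriptions are hypothetical—no actual navigation vendor’s settings are being described—but the point is that the priorities become something users and regulators can read.

```python
# A sketch of a publicly declared optimization hierarchy.
# Goal names, weights, and descriptions are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class OptimizationGoal:
    name: str
    weight: float      # relative priority; higher means the system favors it more
    description: str

NAVIGATION_GOALS = [
    OptimizationGoal("travel_time", 0.60, "Get the rider there as quickly as possible."),
    OptimizationGoal("fuel_efficiency", 0.25, "Minimize fuel or battery use."),
    OptimizationGoal("network_load_balance", 0.15, "Spread traffic across the road network."),
]

def publish(goals):
    """Print the declared hierarchy so anyone can inspect what the system is optimized for."""
    for goal in sorted(goals, key=lambda g: g.weight, reverse=True):
        print(f"{goal.weight:.2f}  {goal.name}: {goal.description}")

publish(NAVIGATION_GOALS)
```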
When systems are transparent about their goals, we can then insist that they be transparent about how well they’re achieving those goals. If they’re not living up to goals we’ve socially decided on, we can hold their creators and managers accountable, just as we hold automobile manufacturers responsible if their cars fail to meet emission standards. We can employ the usual incentives, including legal action, to get what we want and need—although sometimes this will mean learning that we were unrealistic in our expectations.
Notice that none of this necessarily requires us to demand that the technology be fully explicable.
* * *
But achieving the goals for which we’ve optimized our machine learning systems is not enough.
Suppose we agree that we want our system of AVs to dramatically reduce traffic fatalities. And suppose when we put the system in place, fatalities drop from forty thousand to five thousand per year. But now suppose that after a month or two (or, preferably, in simulations even before the system is deployed) it becomes apparent that poor people make up a wildly disproportionate number of the victims. Or suppose an AI system that culls job applicants picks a set of people worth interviewing, but only a tiny percentage of them are people of color. Achieving optimizations is clearly not enough. We also need to constrain these systems to support our fundamental values. Systems need to be extensively tested and incrementally deployed with this in mind not as an optimization but as a baseline requirement.
Even achieving this fundamental type of fairness does not necessarily require the sort of explicability so often assumed to be essential. For example, since old biases are often inadvertently smuggled into AI systems in the data that those systems are trained on, transparency of data—not explicability of operations—often will be the best recourse: How was the data collected? Is it representative? Has it been effectively cleansed of irrelevant data about race, gender, and so forth, including hidden proxies for those attributes? Is it up to date? Does it account for local particularities? Answering these questions can be crucial for assessing and adjusting a machine learning system. Answering them does not necessarily require understanding exactly how the model created from that data works.
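One of those checks—looking for hidden proxies—can be sketched in a few lines of Python. The column names, the synthetic data, and the correlation threshold below are all hypothetical, and a real audit would use far richer methods than a single correlation, but the sketch shows the kind of question that transparency of data lets us ask without ever opening the model itself.

```python
# A toy data-transparency check: flag features that closely track a protected
# attribute and may therefore smuggle it back into the model. Hypothetical data.
from statistics import correlation  # Python 3.10+

def flag_possible_proxies(rows, protected_key, threshold=0.6):
    """Return feature names whose correlation with the protected attribute
    exceeds the threshold, suggesting they may act as hidden proxies."""
    feature_keys = [k for k in rows[0] if k != protected_key]
    protected = [row[protected_key] for row in rows]
    flagged = []
    for key in feature_keys:
        values = [row[key] for row in rows]
        if abs(correlation(values, protected)) >= threshold:
            flagged.append(key)
    return flagged

# Tiny synthetic example: zip_code_band tracks the protected attribute closely;
# years_experience does not.
data = [
    {"protected_group": 1, "zip_code_band": 9, "years_experience": 4},
    {"protected_group": 1, "zip_code_band": 8, "years_experience": 12},
    {"protected_group": 0, "zip_code_band": 2, "years_experience": 7},
    {"protected_group": 0, "zip_code_band": 1, "years_experience": 3},
    {"protected_group": 1, "zip_code_band": 9, "years_experience": 6},
    {"protected_group": 0, "zip_code_band": 3, "years_experience": 10},
]
print(flag_possible_proxies(data, "protected_group"))  # likely ['zip_code_band']
```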