Everyday Chaos
Getting explanations from a machine learning system is much easier when humans have programmed in the features the system should be looking for. For example, the Irvine, California–based company Bitvore analyzes news feeds and public filings to provide real-time notifications to clients about developments relevant to them. To do this, its dozens of algorithms have been trained to look for over three hundred different sorts of events, including CEO resignations, bankruptcies, lawsuits, and criminal behavior, all of which might have financial impacts. Jeff Curie, Bitvore’s president, says that it’s like having several hundred subject experts each scouring a vast stream of data.43 When one of these robotic experts finds something relevant to its area of expertise, it flags it, tags it, and passes it on to the rest, who add what they know and connect it to other events. This provides clients—including intelligence agencies and financial houses—not just an early warning system that sounds an alarm but also contextualized information about the alarm.
Bitvore’s system is designed so that its conclusions will always be explicable to clients. The company’s chief technology officer, Greg Bolcer, told me about a time when the system flagged news about cash reserves as relevant to its municipal government clients. It seemed off, so Bolcer investigated. The system reported that the event concerned not cash reserves but a vineyard’s “special reserve” wines and was of no relevance to Bitvore’s clients. To avoid that sort of machine-based confusion, Bitvore’s system is architected so that humans can always demand an explanation.44
Bitvore is far from the only system that keeps its results explicable. Andrew Jennings, the senior vice president of scores and analytics at FICO, the credit-scoring company, told me, “There are a number of long standing rules and regulations around credit scoring in the US and elsewhere as a result of legislation that require[s] people who build credit scores to manage the tradeoff between things that are predictively useful and legally permitted.”45 Machine learning algorithms might discover—to use a made-up example—that the Amish generally are good credit risks but, say, Episcopalians are not. Even if this example were true, that knowledge could not be used in computing a credit score because US law prevents discrimination on the basis of religion or other protected classes. Credit score companies are also prohibited from using data that is a surrogate for these attributes, such as an applicant’s subscribing to Amish Week magazine or, possibly, the size of someone’s monthly electricity bills.
There are additional constraints on the model that credit score companies can use to calculate credit risk. If a lender declines a loan application, the lender has to provide the reasons why the applicant’s score was not higher. Those reasons have to be addressable by the consumer. For example, Jennings explained, an applicant might be told, “Your score was low because you’ve been late paying off your credit cards eight times in the past year,” a factor that the applicant can improve in the future.
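To make the idea of addressable reasons concrete, here is a minimal sketch in Python of how a hand-built scorecard can report which factors pulled a score down. The factors, weights, and point values are invented for illustration; they are not FICO's actual model.

# A toy scorecard. The base score, factors, and weights are invented for
# illustration; they are not FICO's actual model.
BASE_SCORE = 700
WEIGHTS = {
    "late_payments_past_year": -15,  # each late payment costs 15 points
    "credit_utilization_pct": -1,    # each percentage point of utilization costs 1
    "years_of_credit_history": 5,    # each year of history adds 5 points
}

def score_with_reasons(applicant):
    score = BASE_SCORE
    contributions = {}
    for factor, weight in WEIGHTS.items():
        points = weight * applicant[factor]
        contributions[factor] = points
        score += points
    # Reason codes: the factors that lowered the score the most, phrased in
    # terms the applicant can act on.
    reasons = [f for p, f in sorted((p, f) for f, p in contributions.items()) if p < 0]
    return score, reasons

applicant = {"late_payments_past_year": 8,
             "credit_utilization_pct": 60,
             "years_of_credit_history": 4}
print(score_with_reasons(applicant))
# (540, ['late_payments_past_year', 'credit_utilization_pct'])

Because each factor contributes independently, the lender can always point to a specific behavior, such as the eight late payments, that the applicant could change; that independence is exactly what the opaque, interacting weights of a deep learning model give up.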
But suppose FICO’s manually created models turn out to be less predictive of credit risk than a machine learning system would be? Jennings says that they have tested this and found the differences between the manual and machine learning models to be insignificant. But the promise of machine learning is that there are times when the machine’s inscrutable models may be more accurately predictive than manually constructed, human-intelligible ones.
As such systems become more common, the demand for keeping their results understandable is growing. It’s easy to imagine a patient wanting to know why some future version of Deep Patient has recommended that she stop eating high-fat foods, or that she preemptively get a hysterectomy. Or a job applicant might want to know whether her race had anything to do with her being ruled out of the pool of people to interview. Or a property owner might want to know why a network of autonomous automobiles sent one of its cars through her fence as part of what that network thought was the optimal response to a power line falling onto a highway. Sometimes these systems will be able to report on what factors weighed the heaviest in a decision, but sometimes the answer will consist of the weightings of thousands of factors, with no one factor being dominant. These systems are likely to become more inexplicable as the models become more complex and as the models incorporate outputs from other machine learning systems.
But that demand is controversial. As it stands, in most fields developers generally implement these systems aiming at predictive accuracy, free of any requirement to keep them explicable. And while there is a strong contingent of computer scientists who think that we will always be able to wring explanations out of machine learning systems, what counts as an explanation, and what counts as understanding, are themselves debatable.46 For example, the counterfactual approach proposed by Sandra Wachter, Brent Mittelstadt, and Chris Russell at Oxford can discover whether, say, race was involved in why someone was put into the “do not insure” bin by a machine learning application: in the simplest case, resubmit the same application with only the race changed, and if the outcome changes, then you’ve shown that race affected the outcome.47 It takes nothing away from the usefulness of the counterfactual approach to point out that it produces a very focused and minimal sense of “explanation,” and even less of “understanding.”
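In code, the simplest version of that counterfactual test is just a resubmission with one field altered. The sketch below is purely illustrative: the predict function stands in for whatever black-box model the insurer uses, and the application fields are invented. (The full Wachter, Mittelstadt, and Russell proposal goes further, searching for the smallest change that would flip a decision, but the basic move is the same.)

def counterfactual_check(predict, application, attribute, alternative_value):
    """Return True if changing only this attribute changes the model's decision."""
    original_decision = predict(application)
    altered = dict(application)          # copy; leave the original untouched
    altered[attribute] = alternative_value
    return predict(altered) != original_decision

# Usage, with some black-box insurer_model.predict and an invented application:
# flagged = counterfactual_check(insurer_model.predict,
#                                {"age": 42, "zip_code": "60629", "race": "Black"},
#                                attribute="race", alternative_value="white")
# If flagged is True, race alone was enough to flip the outcome.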
In any case, in many instances, we’ll accept the suggestions of these systems if their performance records are good, just as we’ll accept our physician’s advice if she can back it up with a study we can’t understand that shows that a treatment is effective in a high percentage of cases—and just as many of us already accept navigation advice from the machine learning–based apps on our phones without knowing how those apps come up with their routes. The riskier or more inconvenient the medical treatment, the higher the probability of success we’ll demand, but the justification will be roughly the same: a good percentage of people who follow this advice do well. That’s why we took aspirin—initially in the form of willow bark—for thousands of years before we understood why it works.
As machine learning surpasses the predictive accuracy of old-style models, and especially as we butt our heads against the wall of inexplicability, we are coming to accept a new model of models, one that reflects a new sense of how things happen.
Four New Ways of Happening
Suppose someday in the near future your physician tells you to cut down on your potassium intake; no more banana smoothies for you. When you ask why, she replies that Deep Asclepius—a deep learning system I’ve made up—says you fit the profile of people who are 40 percent more likely to develop Parkinson’s disease at some point in their lives if they take in too much potassium (which I’m also making up).
“What’s that profile?” you may ask.
Your physician explains: “Deep Asclepius looks at over one thousand pieces of data for each person, and Parkinson’s is a complex disease. We just don’t know why those variables combine to suggest that you are at risk.”
Perhaps you’ll accept your physician’s advice without asking about her reasons, just as you tend to accept it when your physician cites studies you’re never going to look up and couldn’t understand if you did. In fact, Deep Asclepius’s marketers will probably forestall the previous conversation by turning the inexplicability of its results into a positive point: “Medical treatment that’s as unique as you are … and just as surprising!”
Casual interactions such as these will challenge the basic assumptions of our past few thousand years of creating models.
First, we used to assume that we humans made the models: in many cases (but not all, as we’ve seen) we came up with the simplified conceptual model first, and then we made a working model. But deep learning’s models are not created by humans, at least not directly.48 Humans choose the data and feed it in, humans head the system toward a goal, and humans can intercede to tune the weights and the outcomes. But humans do not necessarily tell the machine what features to look for. For example, Google fed photos that included dumbbells into a machine learning system to see if it could pick out the dumbbells from everything else in the scene. The researchers didn’t give the system any characteristics of dumbbells to look for, such as two disks connected by a rod. Yet without being told, the system correctly abstracted an image of two disks connected by a bar. On the other hand, the image also included a muscular arm holding the dumbbell, reflecting the content of the photos in the training set.49 (We’ll talk in the final chapter about whether that was actually a mistake.)
Because the models deep learning may come up with are not based on the models we have constructed for ourselves, they can be opaque to us. This does not mean, however, that deep learning systems escape human biases. As has become well known, they can reflect and even amplify the biases in the data itself. If women are not getting hired for jobs in tech, a deep learning system trained on existing data is likely to “learn” that women are not good at tech. If black men in America are receiving stiffer jail sentences than white men in similar circumstances, the training based on that data is very likely to perpetuate that bias.50
This is not a small problem easily solved. Crucially, it is now the subject of much attention, research, and development.
The second assumption about models now being challenged comes from the fact that our conceptual models cover more than one case; that’s what makes them models. We have therefore tended to construct them out of general principles or rules: Newton’s laws determine the paths of comets, lowering prices tends to increase sales, and all heavenly bodies move in circles, at least according to the ancient Greeks. Principles find simpler regularities that explain more complex particulars. But deep learning models are not built from simplified principles, and there’s no reason to think they will always produce such principles, just as A/B testing may not come up with any generalizable rules for how to make ads effective.
Sometimes a principle or at least a rule of thumb does emerge from a deep learning system. For example, in a famous go match between Lee Sedol, a world-class master, and Google’s AlphaGo, the computer initially played aggressively. But once AlphaGo had taken over the left side of the board, it started to play far more cautiously. This turned out to be part of a pattern: when AlphaGo is 70 percent confident it’s going to win, it plays less aggressively. Perhaps this is a generalizable heuristic for human go players as well.51 Indeed, in 2017, Google launched a program that brings together human players and AlphaGo so that the humans can learn from the machine.52
A later version of AlphaGo took the next step. Rather than training AlphaGo on human games of go, the programmers fed in nothing but the rules of the game and then had the machine play itself. After just three days, the system so mastered the game that it was able to beat the prior version of AlphaGo a hundred games out of a hundred.53 When experts studied the machine-vs.-machine games that Google published, some referred to the style of play as “alien.”54
Isn’t that the literal truth?
If so, it’s because of the third difference: deep learning systems do not have to simplify the world to what humans can understand.
When we humans build models for ourselves, we like to find the general principles that govern the domain we’re modeling. Then we can plug in the specifics of some instance and read out the date and time of an eclipse or whether the patient has type 2 diabetes. Deep learning systems typically put their data through artificial neural networks to identify the factors (or “dimensions”) that matter and to discern their interrelationships. They typically do this over several passes, so that the relationships at any one stage can be understood only in terms of the stage before it, which may already be beyond our understanding on its own.
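A minimal sketch in Python of that layering, not any real diagnostic system: a patient record with a thousand features is squeezed through successive transformations whose intermediate “dimensions” are just numbered columns with no human-given meaning. (The weights here are random stand-ins; in a trained network they would be whatever values reduced the prediction error.)

import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights):
    # A linear mix of the inputs, then a simple nonlinearity (ReLU).
    return np.maximum(0, inputs @ weights)

# Random stand-ins for learned weights: 1,000 inputs -> 128 -> 32 -> 1.
w1 = rng.normal(size=(1000, 128))
w2 = rng.normal(size=(128, 32))
w3 = rng.normal(size=(32, 1))

patient = rng.normal(size=(1, 1000))   # a stand-in patient record
hidden1 = layer(patient, w1)           # 128 unnamed factors
hidden2 = layer(hidden1, w2)           # 32 factors built out of those factors
risk_score = hidden2 @ w3              # a single number at the end

# Explaining risk_score means unwinding hidden2, which means unwinding
# hidden1; each pass is interpretable only in terms of the one before it.
print(risk_score.item())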
The same holds for the data we input in order to get, say, a diagnosis from my hypothetical Deep Asclepius system. Deep Asclepius doesn’t have to confine itself to the handful of factors a patient is typically asked to list on a three-page form while sitting in the waiting room. It can run the patient’s lifetime medical record against its model, eventually even pulling in, perhaps, environmental data, travel history, and education records, noting relationships that might otherwise have been missed (and assuming privacy has been waived). Simplification is no longer required to create a useful working model.
The success of deep learning suggests to us that the world does not separate into neatly divided events that can be predicted by consulting a relative handful of eternal laws. The comet crossing paths with Jupiter, Saturn, and the sun is not a three-body or four-body problem but rather an all-body problem, for, as Newton well knew, every gravitational mass exerts some pull on every other. Calculating a comet’s path by computing the gravitational effect of three massive bodies is a convenient approximation that hides the alien complexity of the truth.
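The underlying arithmetic is simple enough to sketch (the masses and positions below would be supplied by the astronomer; nothing here is real astronomical data): the pull on any one body is the sum of the pulls from every other body, so leaving bodies out of the sum is always an approximation.

import numpy as np

G = 6.674e-11  # gravitational constant, in m^3 kg^-1 s^-2

def acceleration_on(i, masses, positions):
    # Newton's law of gravitation, summed over every other body.
    total = np.zeros(3)
    for j, (m, pos) in enumerate(zip(masses, positions)):
        if j != i:
            r = pos - positions[i]
            total += G * m * r / np.linalg.norm(r) ** 3
    return total

# Dropping all but the sun, Jupiter, and Saturn from masses and positions
# shortens the sum, and that shortening is the whole approximation.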
As we gasp at what our machines can now do, we are also gasping at the clear proof of what we have long known but often suppressed: our old, oversimplified models were nothing more than the rough guess of a couple of pounds of brains trying to understand a realm in which everything is connected to, and influenced by, everything.
Fourth, where we used to assume that our conceptual models were stable if not immutable, everything being connected to everything means that machine learning’s model can constantly change. Because most of our old models were based on stable principles or laws, they were slower to change. The classic paradigm for this was put forward by Thomas Kuhn in his 1962 book The Structure of Scientific Revolutions. Historically, Kuhn says, a science’s overarching model (which he calls a paradigm) maintains itself as data piles up that doesn’t fit very well.55 At some point—it’s a nonlinear system—a new paradigm emerges that fits the anomalous data, as when germ theory replaced the long-held idea that diseases such as malaria were caused by bad air. But changes in machine learning models can occur simply by retraining them on new data. Indeed, some systems learn continuously. For example, our car navigation systems base our routes on real-time information about traffic and can learn from that data that Route 128 tends to get backed up around four o’clock in the afternoon. This can create a feedback loop as the navigation system directs people away from Route 128 at that time, perhaps reducing the backups. These feedback loops let the model constantly adjust itself to changing conditions and optimize itself further.
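A toy sketch of that feedback loop, with invented roads, delays, and update rule (real navigation systems are of course far more elaborate): the system keeps re-estimating each road's delay from what it observes, and what it observes is partly the result of its own routing.

# Current learned estimates of delay, in minutes.
expected_delay = {"Route 128": 25.0, "back roads": 18.0}

def choose_route():
    return min(expected_delay, key=expected_delay.get)

def update(route, observed, learning_rate=0.3):
    # Continuous learning: blend each new observation into the estimate.
    expected_delay[route] += learning_rate * (observed - expected_delay[route])

for day in range(5):
    route = choose_route()
    other = "back roads" if route == "Route 128" else "Route 128"
    # Feedback: the road the system recommends picks up traffic and slows down;
    # the road it steers drivers away from clears up a little.
    update(route, expected_delay[route] + 4.0)
    update(other, expected_delay[other] - 2.0)
    print(day, route, {r: round(d, 1) for r, d in expected_delay.items()})

The estimates never settle; they keep chasing conditions that the routing itself helps create.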
As we’ll see, this reveals a weakness in our traditional basic strategy for managing what will happen, for the elements of a machine learning model may not have the sort of one-to-one relationship that we envision when we search for the right “lever” to pull. When everything affects everything else, and when some of those relationships are complex and nonlinear—that is, tiny changes can dramatically change the course of events—butterflies can be as important as levers.
Overall, these changes mean that while models have been the stable frameworks that enable explanation, now we often explain something by trying to figure out the model our machines have created.
The only real continuity between our old types of models and our new ones is that both are representations of the world. But one is a representation that we have created based on our understanding, a process that works by reducing the complexity of what it encounters. The other is generated by a machine we have created, and into which we have streamed oceans of data about everything we have thought might possibly be worth noticing. The source, content, structure, and scale of these two types of representations are vastly, disconcertingly different.
Explanation Games
“JAL 123 was twelve minutes into its flight when a bang was heard on the flight deck.”
On August 12, 1985, thirty-two minutes after that, the pilots lost their struggle to keep the plane aloft as its right wingtip clipped a mountain. The Boeing 747 came down with such force that three thousand trees in its path were destroyed. Of its 509 passengers, 505 were killed. It remains to this day the single-aircraft crash that has claimed the most victims.56
The task facing the investigators who arrived from multiple organizations and countries was made more difficult by the impending nightfall, which prevented immediate access, and by the mountainous terrain—inaccessible by helicopter—across which pieces of the plane were strewn. But once the one surviving flight attendant reported that she had seen the sky through the aft part of the plane after a tremendous explosion, the investigators knew where to look: the pressure bulkhead that sealed the rear of the plane.
To these skilled forensic experts, the nature of the tear marks indicated metal fatigue. They checked the aircraft’s repair history and found that seven years earlier it had struck its tail while landing, necessitating a repair. This led them to a hypothesis that they confirmed by inspecting the pattern of rivets used to repair the bulkhead. Where there should have been three rows of rivets, there was only one. “Instead of replacing the whole bulkhead, Boeing had merely replaced half of it.”57 This put extra stress on the single line of rivets. Each takeoff and landing stressed it a bit more. A back-of-the-envelope calculation told the investigators that the plane was within 5 percent of the number of takeoffs and landings likely to break the seam open.
From there the sequence of events was relatively straightforward to read. The bulkhead blew, knocking out the hydraulics to the tail, catastrophically limiting the pilots’ ability to control the flight.
This story has all the classic elements of what we mean by explaining something: