Black Box Thinking
Today many major airlines have gone even further, introducing real-time monitoring of tens of thousands of parameters, such as altitude deviation and excessive banking, allowing performance to be compared continuously and patterns of concern to be diagnosed. According to the Royal Aeronautical Society: “It is the most important way to dramatically improve flight safety.”2 The current ambition is to increase the quantity of real-time data so as to render the black boxes redundant. All the information will already have been transmitted to a central database.
Aviation, then, takes failure seriously. Any data that might demonstrate that procedures are defective, or that the design of the cockpit is inadequate, or that the pilots haven’t been trained properly, is carefully extracted. These lessons are used to lock the industry onto a safer path. And individuals are not afraid to admit their errors, because they recognize their value.
II
What did all this mean for United Airlines 173? Within minutes of the crash an investigation team was appointed by the National Transportation Safety Board, including Alan Diehl, a psychologist, and Dennis Grossi, an experienced investigator. By the following morning they had arrived in suburban Portland to go over the evidence with a fine-tooth comb.
It is a testament to the extraordinary skill of McBroom that he kept the plane under control for as long as he did. As the aircraft was dropping he noticed an area amid the houses and apartment blocks that looked like an open space, possibly a field, and steered toward it. As he got closer, he realized that it was, in fact, a wooded suburb. He tried to steer between the trees, collided with one, plowed through a house, and came to rest on top of another house across the street.
The first house was obliterated. Pieces of the aircraft’s left wing were later found in another part of the suburb. The lower left side of the fuselage, between the fourth and sixth rows of passenger seats and below window level, was completely torn away. Miraculously, there were no fatalities on the ground; eight passengers and two crew members died. One of them was Flight Engineer Mendenhall, who had vainly attempted to warn the pilot of the dwindling fuel reserves. McBroom, the captain, survived with a broken leg, shoulder, and ribs.
As the investigators probed the evidence of United Airlines 173, they could see a pattern. It was not just what they discovered amid the wreckage in Portland; it was the comparison with previous accidents. One year earlier another DC-8 had crashed in almost identical circumstances. The plane, bound for San Francisco from Chicago, had entered a holding pattern at night because of a problem with the landing gear, flew around trying to fix it, and then flew into a mountain, killing everyone on board.3
A few years earlier, Eastern Airlines 401 suffered a similar fate as it was coming in to land at Miami International Airport. One of the lights in the cockpit had not illuminated, causing the crew to fear that the landing gear had failed to lower into place. As the crew focused on troubleshooting the problem (it turned out to be a faulty bulb), they failed to realize that the plane was losing altitude, despite warnings from the safety systems. It crashed into the Everglades, killing 101 people.4
In each case the investigators realized that crews were losing their perception of time. Attention, it turns out, is a scarce resource: if you focus on one thing, you will lose awareness of other things.
This can be seen in an experiment where students were given a series of tasks. One task was easy: reading out loud. Another task was trickier: defining difficult words. After they had completed the tasks, the students were asked to estimate how much time had passed. Those with the easy task gave accurate estimates; those with the tough task underestimated the time by as much as 40 percent. Time had flown by.
Now think of McBroom. He didn’t just have to focus on difficult words. He had to troubleshoot a landing gear problem, listen to his crew, and anticipate landing under emergency conditions. Think back, too, to the doctors surrounding Elaine Bromiley. They were absorbed in trying to intubate, frantically trying to save the life of their patient. They lost track of time not because they didn’t have enough focus, but because they had too much focus.*
Back in Portland, Oregon, Diehl realized that another fundamental problem involved communication. Engineer Mendenhall had spotted the fuel problem. He had given a number of hints to the captain and, as the situation became serious, made direct references to the dwindling reserves. Diehl, listening back to the voice recorder, noted alterations in the intonation of the engineer. As the dangers spiraled he became ever more desperate to alert McBroom, but he couldn’t bring himself to challenge his boss directly.
This is now a well-studied aspect of psychology. Social hierarchies inhibit assertiveness. We talk to those in authority in what is called “mitigated language.” You wouldn’t say to your boss: “It’s imperative we have a meeting on Monday morning.” But you might say: “Don’t worry if you’re busy, but it might be helpful if you could spare half an hour on Monday.”5 This deference makes sense in many situations, but it can be fatal when a 90-ton airplane is running out of fuel above a major city.
The same hierarchy gradient also exists in operating theaters. Jane, the nurse, could see the solution. She had fetched the tracheotomy kit. Should she have spoken up more loudly? Didn’t she care enough? That is precisely the wrong way to think about failure in safety-critical situations. Remember that Engineer Mendenhall paid for his reticence with his life. The problem was not a lack of diligence or motivation, but a system insensitive to the limitations of human psychology.
Now let us compare the first- and third-person perspectives. For the doctors at the hospital near North Marston, the accident may indeed have seemed like a “one-off.” After all, they didn’t know that they had spent eight long minutes in a vain attempt at intubation. To them, they had been trying for a fraction of that time. Their subjective sense of time had all but vanished in the panic. The problem, in their minds, was with the patient. She had died far quicker than they could possibly have anticipated. In the absence of an investigation, how could they have known any better?
An almost identical story can be told of United Airlines 173. When Alan Diehl, the investigator, went to the hospital in Oregon to interview McBroom a few days after the crash, the pilot informed him that the fuel reserves had depleted “incredibly quickly.” He offered the possibility that there had been a leak in the tanks. From his perspective, with his awareness of time obliterated by the growing crisis, this was a rational observation. To him, the fuel running out just didn’t make sense.
But Diehl and his team took the trouble to double-check the black box data. They looked at the reserves at the time of the decision to go into a holding pattern, checked how fast DC-8s deplete fuel on average, then looked at when the fuel actually ran out. The figures correlated perfectly. The plane had not run out of fuel any quicker than expected. The leak was not in the tank, but in McBroom’s sense of time.
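To see the logic of that cross-check in the plainest terms, here is a minimal sketch that works through it with entirely hypothetical numbers: given the fuel on board when the holding pattern began and an average burn rate for the aircraft type, you can predict when the tanks should run dry and compare that prediction with the moment the engines actually flamed out. (All figures in the sketch are illustrative assumptions, not values from the accident report.)

```python
# A minimal sketch of the investigators' cross-check, using entirely
# hypothetical figures (the reserve, burn rate, and timings below are
# illustrative assumptions, not values from the accident report).

fuel_at_hold_entry_lbs = 13_800    # assumed fuel remaining when the hold began
avg_burn_rate_lbs_per_min = 230    # assumed average DC-8 fuel burn in the hold
actual_minutes_to_flameout = 60    # assumed elapsed time until the engines failed

# Predicted endurance from the starting reserve and the average burn rate
predicted_minutes_to_flameout = fuel_at_hold_entry_lbs / avg_burn_rate_lbs_per_min
discrepancy = actual_minutes_to_flameout - predicted_minutes_to_flameout

print(f"Predicted endurance: {predicted_minutes_to_flameout:.0f} min")
print(f"Actual time to fuel exhaustion: {actual_minutes_to_flameout} min")
print(f"Discrepancy: {discrepancy:.1f} min")

# A discrepancy near zero means the fuel was consumed exactly as fast as
# expected: no leak in the tanks, only a leak in the crew's sense of time.
```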
Only through an investigation, from an independent perspective, did this truth come to light. In health care nobody recognized the underlying problem because, from a first-person perspective, it didn’t exist. That is one of the ways that closed loops perpetuate: when people don’t interrogate errors, they sometimes don’t even know they have made one (even if they suspect they may have).
When Diehl and his colleagues published the report on United Airlines 173 in June 1979, it proved to be a landmark in aviation. On the thirtieth page, in the dry language familiar in such reports, it offered the following recommendation: “Issue an operations bulletin to all air carrier operations inspectors directing them to urge their assigned operators to insure that their flight crews are indoctrinated in principles of flightdeck resource management, with particular emphasis on the merits of participative management for captains and assertiveness training for other cockpit crewmembers.”
Within weeks, NASA had convened a conference to explore the benefit of a new kind of training: Crew Resource Management. The primary focus was on communication.
First officers were taught assertiveness procedures. The mnemonic used in aviation to bolster the assertiveness of junior crew members is P.A.C.E. (Probe, Alert, Challenge, Emergency).* Captains, who for years had been regarded as big chiefs, were taught to listen, acknowledge instructions, and clarify ambiguity. The time-perception problem was tackled through a more structured division of responsibilities.
Checklists, already in operation, were expanded and improved. They are well established as a means of preventing oversights in the face of complexity, but they also flatten the hierarchy. When pilots and co-pilots talk to each other, introduce themselves, and go over the checklist, they open channels of communication, making it more likely that the junior partner will speak up in an emergency. This addresses the so-called activation problem.
Various versions of the new training methods were immediately trialed in simulators. At each stage, the new ideas were challenged, rigorously tested, and examined at their limits. The most effective proposals were then rapidly integrated into airlines around the world. After a terrible set of accidents in the 1970s, the rate of crashes began to decline.
“United Airlines 173 was a traumatic incident, but it was also a great leap forward,” the aviation safety expert Shawn Pruchnicki says. “It is still regarded as a watershed, the moment when we grasped the fact that ‘human errors’ often emerge from poorly designed systems. It changed the way the industry thinks.”
Ten people died on United Airlines 173, but the learning opportunity saved many thousands more.
• • •
This, then, is what we might call “black box thinking.”* For organizations beyond aviation, it is not about creating a literal black box; rather, it is about the willingness and tenacity to investigate the lessons that often exist when we fail, but which we rarely exploit. It is about creating systems and cultures that enable organizations to learn from errors, rather than being threatened by them.
Failure is rich in learning opportunities for a simple reason: in many of its guises, it represents a violation of expectation.6 It is showing us that the world is in some sense different from the way we imagined it to be. The death of Elaine Bromiley, for example, revealed that operating procedures were insensitive to the limitations of human psychology. The failure of United Airlines 173 revealed similar problems in cockpits.
These failures are inevitable because the world is complex and we will never fully understand its subtleties. The model, as social scientists often remind us, is not the system. Failure is thus a signpost. It reveals a feature of our world we hadn’t grasped fully and offers vital clues about how to update our models, strategies, and behaviors. From this perspective, the question often asked in the aftermath of an adverse event, namely, “Can we afford the time to investigate failure?,” seems the wrong way around. The real question is, “Can we afford not to?”
This leads to another important conclusion. It is sometimes said that the crucial difference between aviation and health care is available resources: because aviation has more money at its disposal, it is able to conduct investigations and learn from mistakes. If health care had more resources, wouldn’t it do the same? However, we can now see that this is profoundly wrongheaded. Health care may indeed be under-resourced, but it would save money by learning from mistakes. The cost of medical error has been conservatively estimated at more than $17 billion in the United States alone.7 As of March 2015 the NHS Litigation Authority in the UK had set aside £26.1 billion to cover outstanding negligence liabilities. Learning from mistakes is not a drain on resources; it is the most effective way of safeguarding resources—and lives.*
Psychologists often make a distinction between mistakes where we already know the right answer and mistakes where we don’t. A medication error, for example, is a mistake of the former kind: the nurse knew she should have administered Medicine A but inadvertently administered Medicine B, perhaps because of confusing labeling combined with pressure of time.
But sometimes mistakes are consciously made as part of a process of discovery. Drug companies test lots of different combinations of chemicals to see which have efficacy and which don’t. Nobody knows in advance which will work and which won’t, but this is precisely why they test extensively, and fail often. It is integral to progress.
On the whole, we will be looking at the first type of failure in the early part of this book and the second type in the latter part. But the crucial point is that in both scenarios error is indispensable to the process of discovery. In the first case, as in health care, errors provide signposts about how to reform the system to make future errors less likely; in the second, as in drug development, errors drive the discovery of new medicines.
A somewhat overlapping distinction can be made between errors that occur in a practice environment and those that occur in a performance environment. Figure skaters, for example, fall a lot in training. By stretching themselves, attempting difficult jumps, and occasionally falling onto the cold ice, they progress to more difficult jumps, improving judgment and accuracy along the way. This is what enables them to perform so flawlessly when they arrive at a big competition.
In effect, practice is about harnessing the benefits of learning from failure while reducing its cost. It is better to fail in practice in preparation for the big stage than on the big stage itself. This is true, too, of organizations that conduct pilot schemes (and, in the case of aviation and other safety-critical industries, test ideas in simulators) in order to learn before rolling out new ideas or procedures. The more we can fail in practice, the more we can learn, enabling us to succeed when it really matters.
But even if we practice diligently, we will still endure real-world failure from time to time. And it is often in these circumstances, when failure is most threatening to our ego, that we need to learn most of all. Practice is not a substitute for learning from real-world failure; it is complementary to it. They are, in many ways, two sides of the same coin.
With this in mind, let us take one final example of a “black-box-style investigation.” It involved the losses of bomber aircraft during World War II and was conducted by one of the most brilliant mathematicians of the twentieth century: Abraham Wald.
His analysis was not just a pivotal moment in a major conflict, but also an important example within the context of this book. Learning from adverse events can sometimes look easy with the benefit of hindsight. Weren’t the lessons from United Airlines 173, for example, just obvious? Didn’t they jump out of the data?
At the time of the investigation, however, the data can often seem far more ambiguous. The most successful investigators bring not just a willingness to engage with the incident, but also the analytical skills and creative insight needed to extract the key lessons. Indeed, many aviation experts cite the improvement in the quality and sophistication of investigations as one of the most powerful spurs to safety in recent years.8
But few investigations have been as ingenious as the one conducted by Wald. His work was classified for decades, but the full story, and how it contributed to the defeat of Nazism, has recently been told. Most of all, his investigation reveals that in order to learn from failure, you have to take into account not merely the data you can see, but also the data you can’t.
III
Abraham Wald was born in Hungary in 1902, the son of a Jewish baker. He was educated at home by his older brother, Martin, who was a qualified engineer. Early on he developed a love of mathematics and, at the age of fourteen, of geometry in particular. According to those who knew him, little Abraham was always creating and solving puzzles.
Wald left home in 1927 to study at the University of Vienna. He had a quizzical face, dark hair, and bright eyes, and his sharp mind was instantly recognized by his teachers and fellow students. As one colleague put it: “I was captivated by his great ability, his gentleness and the extraordinary strength with which he attacked his problems.”9
While at the university Wald was invited by Karl Menger, one of the greatest mathematicians of his generation, to join the Colloquium, a group of scholars who would meet informally to discuss math and philosophy, and which included names that would later become legendary, such as Kurt Gödel and Alfred Tarski. Wald continued to flourish, writing a series of papers on geometry that Menger described as “deep, beautiful and of fundamental importance.”10
But Wald was not able to gain a teaching post in Vienna: his Jewish background made it politically impossible. “At that time of economic and incipient political unrest, it was out of the question for him to secure a position at the University of Vienna, although such a connection would certainly have been as profitable for the institution as for himself,” Menger would later write. “With his characteristic modesty, Wald told me that he would be perfectly satisfied with any small private position that would enable him to continue his work with the Mathematical Colloquium.”11
But even this minor role would prove problematic as Europe headed toward war. In 1937, the presence of Wald within the Mathematical Colloquium was criticized by Nazi sympathizers. A year later, when Hitler marched into the Austrian capital, Wald was sacked. He remained for a few weeks after the occupation, but as the Nazis ratcheted up their persecution of the Jews, Menger, who had already fled to the United States, managed to secure him a job in America.
Wald was reluctant to leave Vienna, a city he had fallen in love with (in a letter to a friend, he wrote that it had become “a second home”), but the decision to depart almost certainly saved his life. Eight of his nine family members would later die at the hands of the Nazis. His parents and sisters were killed in the gas chambers of Auschwitz while his beloved older brother, Martin, who had introduced him to mathematics, perished as a slave laborer in western Germany. Wald would remain unaware of these tragedies until the end of the war.