These high-reliability theorists object to being called ‘optimistic,’ which sounds to them like ‘naive.’ They say they accept the inevitability of human error and mechanical failure, and they deny that they use a ‘closed’ organizational model – one that simplistically assumes that civilian organizations might, like military units, be isolated from the confusions of the larger society. Nonetheless, they work with the idea that organizations can be made superior to the sum of their parts, that redundancies count, that decision making and formal responsibilities can be centralized or decentralized according to need, that organizations are rational beings which learn from past mistakes and can tailor themselves to achieve new objectives, and that if the right steps are taken, accidents can be avoided. A zero-accident rate, they say, is a theoretical possibility.
Perrow studied at Berkeley and once worked with some of the high-reliability theorists, but his thinking grew up beside theirs, not in reaction to it. It has close but unacknowledged ties to the idea of chaos in the natural world – the disorder discovered by Edward Lorenz that frustrates forecasts and limits practical science. More explicitly, Perrow’s accident theory grows from a skeptical view of large organizations as overly rigid, internally divided, and inherently unfocused systems – collectives that resist learning, gloss over failures, suffer from internal conflicts and confusion, and defy rational plans.
This approach was refined in the 1970s by sociologist James March, who wrote about ‘organized anarchies’ and coined the term ‘garbage can’ to characterize their internal functioning – a bewildering mix of solutions looking for problems, inconsistent and ill-defined goals, fluid and uninformed participation in decision making, changes in the outside world, and pure chance as well. Of course, organizations do succeed in producing products, including services like safe airline flying. The garbage can model explains only their difficulties, but it does so with a ring of truth to executives long frustrated by their lack of direct control.
Perrow uses the garbage can model to explain why institutional failures are unavoidable, even ‘normal,’ and why, when organizations are required to handle dangerous technologies safely, they regularly do not. By necessity these are the very organizations that claim, often sincerely, to put safety first. Their routine failures sometimes become Perrow’s ‘normal accidents’ and may blossom, as they did for the FAA and Valujet, into true catastrophes.
Perrow’s seminal book, Normal Accidents: Living with High-Risk Technologies (1984), is a hodgepodge of story-telling and exhortation, weakened by contradiction and factual error, out of which, however, this new way of thinking has risen. His central device is an organizational scale against which to measure the likelihood of serious ‘system’ accidents. He does not assign a numerical index to the scale but uses a set of general risk indicators. On the low end stand the processes – like those of most manufacturing – that are simple, slow, linear, and visible, and in which the operators experience failures as isolated and containable events. At the other end stand the opaque and tangled processes characterized by a combination of what Perrow calls ‘interactive complexity’ and ‘close coupling.’
By ‘interactive complexity’ he means not simply that there are many elements involved but that those elements are linked to one another in multiple and often unpredictable ways. The failure of one part – whether material, psychological, or organizational – may coincide with the failure of an entirely different part, and this unforeseeable combination will cause the failure of other parts, and so on. If the system is large, the combinations are practically infinite. Such unravelings seem to have an intelligence of their own; they expose hidden connections, neutralize redundancies, bypass ‘firewalls,’ and exploit chance circumstances which no engineer could have anticipated. When the operating system is also inherently quick (like a chemical process, an automated response to missile attack, or a jet airliner in flight), the cascading failures will accelerate out of control, confounding the human operators and denying them a chance to jury-rig a recovery. That lack of slack is Perrow’s ‘close coupling.’ Then the only difference between an accident and a human tragedy may be a question, as in chemical plants, of which way the wind blows.
I ran across this thinking by chance, a year before the Valujet crash, when I picked up a copy of Scott D. Sagan’s book, The Limits of Safety: Organizations, Accidents, and Nuclear Weapons (1993). Sagan is a Stanford political scientist, as fastidious and contained a man as Perrow is not. He is a generation younger, the sort of deliberate careerist who moves carefully between posts in academia and the Pentagon. Unlike Perrow, he seems drawn to safety for personal as well as public reasons. Perrow needed such an ally. Sagan is the most persuasive of his interpreters, and with The Limits of Safety he has solidified system accident thinking, focusing it more clearly than Perrow was able to. The book starts by opposing high-reliability and normal-accident theories, then tests them against a laboriously researched and previously secret history of failures within U.S. nuclear weapons operations. The test is a transparent artifice, but it serves to define the opposing theories. Sagan’s obvious bias does not diminish his work.
Strategic weapons pose an especially difficult problem for system-accident thinking for two reasons: First, there has never been an accidental nuclear detonation, let alone an accidental nuclear war; and second, if a real possibility of such an apocalyptic failure exists, it threatens the very logic of nuclear deterrence – the expectation of rational behavior on which we continue to base our arsenals. Once again the pursuit of system accidents leads to uncomfortable ends. Sagan is not a man to advocate disarmament, and he shies away from it here, observing realistically that nuclear weapons are here to stay. Nonetheless, once he has defined ‘accidents’ as less than nuclear explosions – as false warnings, near launches, and other unanticipated breakdowns in this ultimate ‘high-reliability’ system – Sagan discovers a pattern of such accidents, some of which were contained only by chance. The reader is hardly surprised when Sagan concludes that the accidents were inevitable.
The book interested me not because of the catastrophic potential of such accidents but because of the quirkiness of the circumstances that underlay so many of them. It was a quirkiness which seemed uncomfortably familiar to me. Though it represented possibilities that I as a pilot had categorically rejected, this new perspective required me to face the wild side of my own experience with the sky. I had to admit that some of my friends had died in crazy and unlucky ways, that some flights had gone uncontrollably wrong, and that perhaps not even the pilots were to blame. What is more, I had to admit that no matter how carefully I checked my own airplanes and how deliberately I now flew them, the same could happen to me.
That is where we stand now as a society with Valujet, and it explains our continuing discomfort with the accident. Flight 592 burned because of its cargo of oxygen generators, yes, but more fundamentally because of a tangle of confusions which the next time will take some entirely different form. It is frustrating to fight such a phenomenon. At each succeeding level of inquiry we seize upon the evidence of wrongdoing, only to find after reflection that our outrage has slipped away. Flight’s greatest gift is to let us look around, to explore the inner world of sky, but also in the end to bring us back down again, and leave us facing ourselves.
Take, for example, the case against the two Sabretech mechanics who removed the oxygen canisters from the Valujet MD-80s, ignored the written work orders to install the safety caps, stacked the dangerous canisters improperly in the cardboard boxes, and finished by fraudulently signing off on a job well done. They will probably suffer much of their lives for their negligence, as perhaps they should. But here is what really happened. Nearly 600 people logged work time against the three Valujet airplanes in Sabretech’s Miami hangar; of them, 72 workers logged 910 hours across several weeks against the job of replacing the ‘expired’ oxygen generators – those at the end of their approved lives. According to the supplied Valujet work card 0069, the second step of the seven-step removal process was: ‘If generator has not been expended, install shipping cap on the firing pin.’
This required a gang of hard-pressed mechanics to draw a distinction between canisters that were ‘expired,’ meaning the ones they were removing, and canisters that were not ‘expended,’ meaning the same ones, loaded and ready to fire, on which they were now expected to put nonexistent caps. Also involved were canisters which were expired and expended, and others which were not expired but were expended. And then, of course, there was the simpler thing – a set of new replacement canisters, which were both unexpended and unexpired.
If this language seems confusing, do not waste your time trying to sort it out. The Sabretech mechanics certainly did not, nor should they have been expected to. The NTSB later suggested that one problem at Sabretech’s Miami facility was the large number of Spanish-speaking immigrants on the work force, but quite obviously the language problem lay on the other side – with Valujet and the narrowly educated English-speaking engineers who wrote work orders and technical manuals as if they were writing to themselves.
Eleven days after the accident, one of the hapless mechanics who had signed off on the work still seemed unclear about basic distinctions between the canisters. An NTSB agent asked him about a batch of old oxygen generators, removed from the MD-80s, that the mechanic had placed in a box.
AGENT: Okay. Where were they?
MECHANIC: On the table.
AGENT: On the table?
MECHANIC: Yes.
AGENT: And there were only how many left to do? (He meant old oxygen generators to be replaced, remaining in the airplane.)
MECHANIC: How many left?
AGENT: Yeah. You said you did how many?
MECHANIC: Was like eight or twelve, something like that.
AGENT: Eight or twelve left?
MECHANIC: The rest were already back in the airplane.
AGENT: The new ones?
MECHANIC: Yes.
AGENT: What about the old ones?
MECHANIC: The old ones?
AGENT: Yeah. Yeah, that’s the one we’re worried about, the old ones.
MECHANIC: You’re worried about the old ones?
AGENT: Yeah.
But that was after the accident. Before the accident, the worry was not about old parts but about new ones – the safe refurbishing of the MD-80s in time to meet the Valujet deadline. The mechanics quickly removed the oxygen canisters from the brackets and wired green tags to most of them. The green tags meant ‘repairable,’ which these canisters were not. It is not clear how many of the seventy-two workers were aware that the canisters could not be used again, since the replacement of oxygen generators is a rare operation, though most claimed after the accident to have known at least why the canisters had to be removed. But here, too, there is evidence of confusion. After the accident, two tagged canisters were found still lying in the Sabretech hangar. On one of the tags under ‘Reason for removal’ someone had written, ‘Out of date.’ On the other tag someone had written, ‘Generators have been fired.’
Yes, a perfect mechanic might have found his way past the Valujet work card, and into the massive MD-80 Maintenance Manual, to chapter 35-22-01, within which line ‘h’ would have instructed him to ‘store or dispose of oxygen generator.’ By diligently pursuing these two options, he could eventually have found his way to a different part of the manual and learned that ‘all serviceable and unserviceable (unexpended) oxygen generators (canisters) are to be stored in an area that ensures each unit is not exposed to high temperatures or possible damage.’ By pondering the structure of that sentence he might have deduced that ‘unexpended’ canisters are also ‘unserviceable’ canisters, and therefore perhaps should be taken to a safe area and ‘initiated’ according to the procedures provided in section 2.D.
To ‘initiate’ an oxygen generator is, of course, to fire it off, triggering the chemical reaction that produces oxygen and leaves a mildly toxic residue within the canister, which then is classified as a hazardous waste. Section 2.D ends with the admonition that ‘an expended oxygen generator (canister) contains both barium oxide and asbestos fibers and must be disposed of in accordance with local regulatory compliances and using authorized procedures.’ No wonder the mechanics stuck the old generators in boxes.
The supervisors and inspectors failed miserably here, though after the accident they proved clever at ducking responsibility. At the least they should have supplied the required safety caps and verified that those caps were being used. If they had – despite all the other errors that were made – Flight 592 would not have burned. For larger reasons, too, their failure is an essential part of this story. It represents not the avarice of profit takers but rather something more insidious, the sort of collective relaxation of technical standards that Boston College sociologist Diane Vaughan has called ‘the normalization of deviance’ and that she believes existed at NASA in the years leading to the 1986 explosion of the space shuttle Challenger. The leaking O-ring that caused the catastrophic blow-through of rocket fuel was a well-known design weakness, and it had been the subject of worried memos and conferences up to the eve of the launch. Afterward it was widely claimed that the decision to launch anyway had been made because of political pressure from the top – the agency was drifting, its budget was threatened, and the leadership from the White House down wanted to avoid the embarrassment of an expensive delay. But after an exhaustive exploration of NASA’s closed and technical world, Vaughan concluded that the real problems were more cultural than political and that the error had actually come from below. Simply put, NASA had proceeded with the launch despite its O-ring worries largely because it had gotten away with launching the O-ring before. What can go wrong usually goes right – and people just naturally draw the wrong conclusions. In a general way, this is what happened at Sabretech. Some mechanics now claim to have expressed their concerns about the safety caps, but if they did they were not heard. The operation had grown used to taking shortcuts.
But let’s be honest. Mechanics who are too careful will never get the job done. Whether in flight or on the ground, the airline system requires the people involved to compromise, to improvise, and sometimes even to gamble. The Sabretech crews went astray – but not far astray – by allowing themselves quite naturally not to worry about discarded parts.
A fire hazard? Sure. The mechanics tied off the lanyards and shoved the canisters a little farther away from the airplanes they were working on. The canisters had warnings about heat on them, but none of the standard hazardous material placards. It probably would not have mattered anyway because the work area was crowded with placards and officially designated hazardous materials, and people had learned not to take them too seriously. Out of curiosity, a few of the mechanics fired off some canisters and listened to the oxygen come out – it went pssst. Oh yeah, the things got hot, too. No one even considered the possibility that the canisters might accidentally be shipped. The mechanics did finally carry the five cardboard boxes over to the shipping department, but only because that was where Valujet property was stored – an arrangement that itself made sense.
The shipping clerk was a regular fellow. When he got to work the next morning, he found the boxes without explanation on the floor of the Valujet area. The boxes were innocent-looking, and he left them alone until he was told to tidy up. Sending them to Atlanta seemed like the best way to do that. He had shipped off ‘company material’ before without Valujet’s specific approval, and he had heard no complaints. He knew he was dealing now with oxygen canisters but apparently did not understand the difference between oxygen storage tanks and chemical generators designed to fire off. When he prepared the boxes for shipping, he noticed the green ‘repairable’ tags mistakenly placed on the canisters by the mechanics, and he misunderstood them to signify ‘unserviceable’ or ‘out of service,’ as he variously said after the accident. He also drew the unpredictable conclusion that the canisters were therefore empty. He asked the receiving clerk to fill out a shipping ticket.
The receiving clerk did as he was instructed, listing the tires and canisters, but he put quotation marks around the word ‘empty.’ Later, when asked why, he replied, ‘No reason. I always put like, when I put my check, I put “Carlos” in quotations. No reason I put that.’
The reason was, it was his habit. On the shipping ticket he also wrote ‘5 boxes’ between quotation marks – a nonsensical use of punctuation which in context can now be taken to mean not that Carlos suspected there were fewer boxes or more, but, by implication, that he too believed the oxygen canisters were empty.
Two days later, over by Flight 592, the Valujet ramp agent who signed for the cargo did not care about such subtleties anyway. Valujet was not authorized to carry hazardous cargoes of any sort, and it seems obvious now that a shipping ticket listing inflated tires and oxygen canisters (whether ‘empty’ or not) should have aroused the ramp agent’s suspicions. No one would have complained had he opened the boxes or summarily rejected the load. There was no ‘hazardous material’ paperwork associated with it, but he had been formally trained in the recognition of unmarked hazards. His Valujet Station Operations Manual specifically warned that ‘cargo may be declared under a general description that may have hazards which are not apparent, that the shipper may not be aware of this. You must be conscious of the fact that these items have caused serious incidents, and in fact, endangered the safety of the aircraft and personnel involved.’ It also warned: