To be fail-safe implies that a machine, instrument, or procedure has been designed in such a way that if it should fail, it will fail in a way that does not result in harm to property or personnel. You’ll recall that Louis Slotin defied fail-safe principles when he unwisely performed his criticality experiments by lowering components onto a nuclear core (see chapter 5). Fail-safe rules required that components always be raised up toward the core from below. The fail-safe logic of this safety rule was that, if something should slip, gravity would pull it away from, rather than toward, a critical mass situation. But Slotin, as we already know, ignored this fail-safe rule and lowered a neutron-reflecting beryllium reflector onto a core. When the reflector did slip and fall onto the core, there was a criticality burst that resulted in Slotin receiving a lethal dose of radiation, instead of just a broken toe.
The flaw in fail-safe logic that killed the cooling condenser system had to do with the circuit that controlled the condenser’s valves. The circuit was erroneously programmed to interpret a loss of power as an indication of a leak somewhere in the cooling pipes.10 Thus, the circuit commanded all the valves of the condenser system to close down in order to stem the alleged leak. In this case, however, there was no leak, and closing the valves just meant stopping the coolant flow. Now there was no means to keep the reactor from overheating. In reactor accidents, there always seems to be some type of perilous mistake that causes things to spiral out of control.11 At Reactor Unit One, this was it.
The other reactor units at the site eventually experienced their own electrical failures, each ultimately resulting in the loss of reactor cooling.12 All the reactor operators could do was sit and watch the reactor dials by flashlight, without the ability to intervene. Their hands were tied by the lack of power. Meanwhile, two operators in Reactor Unit Four were already dead. They had not been as fortunate as Pillitteri’s crew. They had become trapped in the basement of their turbine building when the water rose, and drowned.13
At 6:45 a.m. on March 12, the morning after the earthquake, TEPCO officials announced that radioactivity might have leaked off the plant grounds. A curious announcement, given that whether radioactivity had actually leaked could easily have been confirmed by measurement. People didn’t know what to make of it. Was this TEPCO’s oblique way of saying that the reactor cores had melted down? Even if a reactor meltdown hadn’t yet claimed any lives, there apparently was at least one casualty already: timely and reliable information from TEPCO.
Meltdown means that the reactor fuel gets so hot that it turns into molten metal, similar to hot lava. This molten radioactive fuel can burn holes in its container, releasing the reactor’s radioactive fission products into the environment. If a meltdown has been caused by uncontrolled criticality, the melting process may actually help to stop the accident because the melted fuel tends to spread out into a geometric configuration that is not conducive to sustaining criticality.
But the Fukushima reactors were not experiencing uncontrolled criticality, because they were successfully shut down. The cores were at risk because their cooling systems failed. In the end, all three of the operating reactors would completely melt down.
TOWER OF BABEL
Anyone who’s ever had a flooded basement in their home might ask: Why put essential electrical systems in a basement, where the possibility of flooding always exists? The answer is that a very high seawall has eliminated the flooding threat. Really?
The seawall that TEPCO built to protect Fukushima’s harbor was 18.7 feet (5.7 meters) high. Why this height, and not lower or higher? It turns out that the seawall height was based on the 1960 Chilean earthquake, one of the more recent of the large quakes that have sent tsunamis to Japan. This was a massive magnitude 9.5 quake, even larger than the 2011 quake, but its epicenter was on the other side of the Pacific Ocean. The tsunami it produced arrived in Japan hours later and took 142 Japanese lives. This 1960 disaster was fresh on the minds of those who built the Fukushima seawall in 1967. However, the worst Japanese tsunami in recorded history was actually produced by the Meiji Sanriku earthquake of 1896, well before most of the seawall’s builders had been born. Although it registered a magnitude of only 7.2, its waves reached heights of 100 feet (30 meters) and killed 22,000 people. But it would have been cost prohibitive to build 100-foot-high seawalls, so the 1960 Chilean tsunami became the standard for seawall height along the east coast of Japan.14 A height of 18.7 feet (5.7 meters) would have been good enough for the 1960 tsunami, but it was totally inadequate for 1896, or for 2011.15
IMPERIAL WISDOM
As faith in TEPCO’s ability to contain the situation waned, fear and anxiety mounted back in Tokyo, fed by a spreading rumor that the emperor had fled his Tokyo palace and gone to Kyoto, a city about 230 miles (370 kilometers) further away from Fukushima.16 Had their emperor abandoned them? His father, Hirohito, had remained in Tokyo even during the Allied Forces’ firebombing of the city. Was the Fukushima accident a bigger threat to Tokyo than that? The rumor turned out to be false, but it had a chilling effect on people nonetheless. Something needed to be done to restore calm.
After suffering five days of bad news, an increasingly distrustful and hostile populace needed a reassuring voice, and they weren’t hearing one from TEPCO. So, on March 16, the grandfatherly emperor, Akihito, addressed his nation in a televised broadcast urging his people to stay calm:
I am deeply hurt by the grievous situation in the affected areas. The number of deceased and missing increases by the day and we cannot know how many victims there will be. My hope is that as many people as possible are found safe. I hope from the bottom of my heart that the people will, hand in hand, treat each other with compassion and overcome these difficult times [and not] abandon hope.17
SKETCHY NUMBERS
That earthquakes and tsunamis pose a threat to nuclear power plants is well known, but the magnitude of the risk is hard to assess, in large part because earthquake prediction is itself so unreliable. At any rate, earthquakes and their resulting tsunamis are known unknowns. We know they are a problem and we have at least one tenable solution (i.e., seawalls), but we’re not exactly sure how big the problem is (i.e., the height of the wave), so we don’t know how big our solution needs to be (i.e., the height of the seawall).
Norman C. Rasmussen (1928–2003), a professor of nuclear engineering at MIT, is considered the father of assessing nuclear power reactor safety using statistical models. In the early 1970s, he headed a federal committee charged with determining the risk of a nuclear reactor core accident occurring. The committee’s approach was to identify the series of failure events that would all need to occur for a core accident to happen (mapped out in so-called fault trees), estimate the probability of failure at each branching point on the tree, and then multiply those probabilities by one another to get the overall probability that multiple system failures would simultaneously occur in a way that could result in a core accident. In 1975, the committee issued its report, which was officially known as WASH-1400, but unofficially called the Rasmussen Report.
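To make the multiplication concrete, here is a minimal sketch of that fault-tree arithmetic in Python. The failure probabilities are entirely hypothetical, invented for illustration; they are not figures from WASH-1400.

```python
# Hypothetical numbers only, for illustration; these are not the Rasmussen
# Report's actual failure probabilities.
branch_probabilities = {
    "loss of offsite power": 1e-1,                    # chance per operational year
    "backup diesel generators fail to start": 1e-2,
    "emergency core cooling fails": 5e-3,
}

# Assuming, as the text describes, that the failures along the branch are
# independent, the chance that they all occur together is the product of
# their individual probabilities.
core_accident_probability = 1.0
for failure, p in branch_probabilities.items():
    core_accident_probability *= p

print(f"Chance of a core accident per operational year: {core_accident_probability:.0e}")
# -> 5e-06, i.e., roughly 1 in 200,000 for this made-up branch
```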
The report received worldwide attention, as did Professor Rasmussen, because its conclusion was that the risks posed by nuclear power plants were extremely small. The report contended that the odds of an accident involving damage to the reactor core at a commercial nuclear power plant were just 1 chance in 20,000 operational years (i.e., odds of 1:20,000).
An operational year is analogous to the working month that we previously used for assessing the lung cancer risk of mine workers. By a working month, we meant one miner working a mine for one month. In this case, an operational year means one year of power generation for a single reactor. We’re using it here as a measure of the total time that a reactor is actually generating power, which is the period when a core accident is most likely to occur.
But remarkably, in March 1979, just four years after the Rasmussen Report was published and when the American nuclear industry had fewer than 500 operational years of experience under its belt, the Three Mile Island reactor accident occurred on the Susquehanna River in Pennsylvania. In that case, a valve failed to shut properly and vented reactor coolant to the environment. The situation initially went unnoticed by reactor operators due to instrument failures, and then confusion led them to override an automated cooling system because they mistakenly thought that the coolant level in the reactor was too high when it was actually too low. This unfortunate sequence of events cascaded into a partial meltdown.18 Luckily, almost all of the radioactivity was contained on site, so there were no health consequences, but it was a very close call and justifiably got people’s attention. Was this just extremely bad luck, or was the risk of a core accident actually closer to 1:500 than 1:20,000?
The US Nuclear Regulatory Commission decided to revisit the risk issue with another committee, charged with auditing the Rasmussen committee’s report. The second committee, headed by Harold W. Lewis (1923–2011), a professor of physics at the University of California at Santa Barbara, endorsed Rasmussen’s probabilistic approach to assessing risk, particularly with regard to the use of fault trees to track how problems might spread through a plant. But it also found flaws in the Rasmussen model, specifically that Rasmussen had neglected to consider some forms of risk, most notably fires, and had considered only a limited number of reactor designs to the exclusion of others. In fact, Three Mile Island’s reactor design had been excluded. Additionally, the report was severely criticized for failing to adequately acknowledge the large uncertainty associated with its risk estimates. Nevertheless, this second committee did exonerate the Rasmussen committee on one of its omissions. Regarding the Rasmussen committee’s decision not to consider the possible risks of sabotage, the Lewis committee noted, “The omission was deliberate, and proper, because it recognized that the probability of sabotage of a nuclear power plant cannot be estimated with any degree of certainty.” At least there was something both committees agreed on.
Rasmussen had not, however, neglected to consider tsunamis. His committee recognized that tsunamis and hurricanes posed risks, but thought those risks were vanishingly small. In the words of the report:
Some plants are located on the seashore where the possibility of tidal waves [tsunamis] and high water levels due to hurricanes exists. The plant design in these cases must accommodate the largest waves and water levels that can be expected. Such events were assessed to represent negligible risks.19
The Lewis committee concluded that the Rasmussen committee had failed to consider a number of factors that could increase risk, and suggested that the true risk might be much higher than the Rasmussen Report had depicted. But this second committee stopped short of offering its own risk estimate.20
It has been 36 years since the Lewis committee issued its report. We now have more than 15,500 operational years of commercial nuclear reactor experience worldwide (as of 2014).21 According to the International Atomic Energy Agency (IAEA), which tracks and rates nuclear incidents on a severity scale from 1 (low) to 7 (high), there have been a total of ten nuclear power plant accidents of varying severity, ranging from level 4 (accident with local consequences) to level 7 (major accident), during that period (1952 to 2014). Ten accidents over a total of 15,500 operational years translates into a rate of one accident per 1,550 operational years, or odds of 1:1,550.
Currently (2014), there are 430 operational nuclear reactors in the world.22 A risk level of 1:1,550 would, therefore, suggest that we might expect a significant reactor core accident to occur among one of these 430 reactors once every three to four years (3 years × 430 operational reactors = 1,290 operational years; 4 years × 430 operational reactors = 1,720 operational years). But the 1:1,550 odds of an accident reflect our risk rate of the past. A moving target counterargument would posit that we’ve learned our safety lessons and that more modern reactors have much safer designs, making the accident rate of the past an overestimation of future risk. Although that may be true, it is also true that many existing reactors are very old and therefore are more likely to have maintenance-related failures. So it’s very hard to tell what the net effect on the future accident rate will be. If our future resembles our past experience, then 1:1,550 would remain our best risk estimate going forward.
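For readers who want to check the arithmetic, a short sketch using only the figures quoted above (ten accidents, 15,500 operational years, 430 reactors) might look like this:

```python
# Figures quoted in the text; only the arithmetic is being demonstrated.
accidents = 10
operational_years = 15_500
reactors_in_service = 430

rate = accidents / operational_years                 # ~0.000645 per operational year
years_per_accident = operational_years / accidents   # 1,550 operational years per accident

# With 430 reactors running at once, 1,550 operational years accumulate quickly.
calendar_years_between_accidents = years_per_accident / reactors_in_service

print(f"Historic rate: 1 accident per {years_per_accident:,.0f} operational years")
print(f"Expected gap with {reactors_in_service} reactors operating: "
      f"about {calendar_years_between_accidents:.1f} calendar years")   # ~3.6
```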
Whether it’s 1:20,000 (Rasmussen), 1:1,550 (historic), or something else, the risk is definitely not zero, and the list of assumptions and uncertainties that go into these risk calculations is long. As a rule, when uncertainties are high, using our simple NNH calculation to judge our personal risk becomes problematic. For example, we could take the 1:1,550 odds and convert them to NNH = 1,550 operational years, and interpret that to mean we would need to live next to a local nuclear power plant with a single reactor for 1,550 years to expect to experience one core-related accident. Since none of us will live anywhere near that long, we might judge our personal risk to be slight. This would be true … if our neighborhood nuclear power plant is a typical nuclear power plant … if the plant is following all safety regulations … if there are no undetected maintenance issues … if there are no engineering design flaws … if there are no operator errors … if there are no severe earthquakes … and the list goes on. The point is not whether the risk estimate is right or wrong, or whether the NNH logic is flawed when it comes to nuclear power plants. Rather, the point is that this apparently precise risk estimate comes with an enormous grain of salt; that is, it comes with a high level of uncertainty. Some people have high salt tolerance, while others prefer to keep their food bland. Everyone needs to make her own decision about the amount of salt she can handle in her diet.
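As an illustration of why the single-reactor qualifier matters, the same NNH arithmetic can be extended under the simplifying assumption that each reactor at a site contributes independently at the same historic rate; the six-reactor site below is a hypothetical example, not a claim about any particular plant.

```python
# The 1:1,550 rate is the historic figure cited in the text; the six-reactor
# site is a hypothetical example, assuming each reactor contributes
# independently at the same rate.
risk_per_operational_year = 1 / 1_550

# NNH: operational years of "exposure" per one expected core accident.
nnh_single_reactor = 1 / risk_per_operational_year
print(f"NNH living near one reactor: {nnh_single_reactor:.0f} years")     # 1550

reactors_at_site = 6
years_to_expected_accident = nnh_single_reactor / reactors_at_site
print(f"Expected wait living near a {reactors_at_site}-reactor site: "
      f"about {years_to_expected_accident:.0f} years")                    # ~258
```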
Another issue is that the situation with nuclear power plants may be more complex than simply calculating statistical probabilities. Some have argued that complicated systems, such as nuclear power plants, have an inherent characteristic called interactive complexity, and that failures can occur due to interactions between different components of these highly integrated systems. Furthermore, these interactions are nearly impossible to foresee.23 Interactive complexity presents new types of risks that never occurred to the designers of the systems (i.e., unknown unknowns). This allows catastrophes to happen when two or more systems fail at the same time in unexpected ways. Furthermore, if the complex system is also tightly coupled, such that one event quickly proceeds to the next, there is little time for operators to intervene and fix the problem. More worrisome still, because there is no time for operators to learn to understand the nature of the novel interactive problem, they may misread it and intervene in the wrong way, making the situation worse than if things had been allowed to run their course.
It may seem illogical, but piling backup safety systems on top of backup safety systems can actually exacerbate problems because it increases interactive complexity and introduces more unknown unknowns. Viewed in this way, we are fighting ourselves when we add too many safety redundancies into a system, no matter how well intentioned, because we are simultaneously adding interaction nodes that are new targets for failure.
The implication is that, if we try to remedy risky situations with complex solutions, we can actually invite more risk, potentially increasing the level of risk to the point that an accident becomes inevitable. In such a scenario, accidents become the norm rather than the exception, and these types of incidents have, therefore, been called normal accidents.24 Whether the Fukushima Daiichi power plant accident should be classified as a normal accident or just really bad luck is a matter for debate. But certainly, having six different nuclear reactors in one location at common risk from external forces, and having them share electrical systems, steam vents, cooling systems, and so forth, results in a highly complex operating environment that’s primed for a domino effect of failures that could lead to a catastrophe.25 The Fukushima Daiichi plant was one of the oldest, largest, and most complex reactor plants in Japan. It is often overlooked that there were 17 other newer, smaller, and simpler nuclear power plants that were also affected by the 2011 earthquake, and four of them were even hit by the same tsunami, yet only the Fukushima Daiichi plant failed, and it failed in a big way (i.e., three meltdowns).26 That says something for the value of simplicity.
There is no way to eliminate unknown unknowns, but studies have shown that by simplifying systems you can greatly reduce sources of systems risk, and lessen the likelihood of interaction-driven failures that are nearly impossible to anticipate. Furthermore, if we loosen the “tightness” of internal couplings within a complex system, we can further lower the level of this type of risk. The modifications need not just be technical. Organizational simplification can also have a positive impact, as has already been demonstrated in other highly complex systems, such as air traffic control.27 Thus, managers of complex technological systems can learn much from studying the safety improvements within other complex systems. We may not be able to precisely measure the risk of a nuclear core accident, but there are definitely ways to minimize it.
The Lewis committee was convened because the Three Mile Island nuclear accident suggested that the risk of a nuclear core accident was actually significantly higher than Rasmussen’s statistical models had predicted. In that sense, the Three Mile Island accident was fortuitous because it caused us to revisit Rasmussen’s risk models with a more critical eye. In the process, we learned a great deal about the weaknesses of predicting catastrophic accidents with statistical models, particularly with regard to uncertainty. Fortunately, not one life was lost at Three Mile Island. Had the Three Mile Island accident not occurred, we most likely would not have revisited Rasmussen’s work and would have remained completely naive with regard to reactor safety until eventually some other, possibly devastating, nuclear reactor accident happened.