Human Error


by James Reason


  6.4. The difficulty of detecting high-frequency error forms

  Errors may take such familiar, high-frequency forms that they slip past the mechanism that monitors discrepancies between intention and performance. This ‘disguise-by-familiarity’ factor falls somewhere between Norman and Lewis’s ‘partial explanation’ and ‘overlap’ categories. It is, however, very much in accord with the theoretical arguments of this book. Error detection processes may fail because they too are subject to similarity and frequency biases.

  Consider the following personal example. I had been writing about unsafe act auditing which I referred to throughout the text by the acronym UAA. It cropped up several times in the passage, but in one instance I had typed USA instead of UAA. Although I proofread the section several times, I failed to spot the USA error. It was pointed out to me by someone else. It seems likely that the same frequency bias that caused me to type USA in the first place was also responsible for my failure to detect it subsequently.

  Alice Healy (1976, 1980) has carried out a number of studies investigating the effects of familiarity upon detection, both of letter targets and proofreading errors. Her investigation began with the observation by Corcoran (1966) that when people are required to search for and mark the letter e in a printed prose passage, they are most likely to miss it when it occurs within the word the. One explanation offered by Corcoran was that the, being a highly redundant word, is not scanned in reading. Healy, however, had a different idea: the unitization hypothesis, which she expressed as follows (Healy, 1976, p. 235): “The is a word with an extremely high frequency in the language which should make it especially likely to be read as a unit or chunk rather than in terms of its component letters.”

  To test this view, she carried out four experiments in which students read 100-word passages and circled instances of the letter t. It was found that they missed a disproportionate number of t’s in the word the, and this effect could not be explained by either the redundancy of the or by factors involving the location and pronunciation of the t in the. Rather, it was the high frequency of the that appeared to be critical. She concluded that high-frequency words such as the are read in terms of units larger than individual letters. Further support for this idea came from a demonstration that, in a passage of scrambled nouns, letter-detection errors occurred more frequently on common nouns than on rare nouns.
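
  For readers unfamiliar with the task, the minimal Python sketch below shows the kind of tally against which letter-detection misses are scored: it counts how many occurrences of the target letter in a short passage fall inside the word the and how many fall elsewhere. The passage and the scoring code are purely illustrative and are not taken from Healy’s materials or procedure.

```python
# Illustrative scoring sketch for a letter-detection task; the passage and the
# code are invented for illustration, not drawn from Healy's (1976) experiments.

import re

passage = "The operator watched the display while the test continued."
target = "t"

in_the, elsewhere = 0, 0
for word in re.findall(r"[A-Za-z]+", passage):
    hits = word.lower().count(target)
    if word.lower() == "the":
        in_the += hits      # target letters hidden inside the unitized word 'the'
    else:
        elsewhere += hits   # target letters in all other words

print(f"targets inside 'the': {in_the}, targets in other words: {elsewhere}")
```

  Healy’s finding, expressed in these terms, was that a disproportionate share of subjects’ misses fell into the first of these two bins.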

  In a second series of experiments, Healy (1980) looked at the task of proofreading in which subjects were required to detect misspelled words in a passage of text. In the first experiment, the misspellings were introduced by transposing two adjacent letters. In two subsequent studies, involving prose passages and scrambled nouns, the misspellings were created by replacing the letter t with the letter z. The results revealed quite a different pattern of errors. “Whereas subjects made an inordinate number of errors on the in letter detection, the number of errors on the was no greater than chance in proofreading... Likewise, whereas subjects made more errors on common than rare words in letter detection, a small difference in the opposite direction was found in proofreading” (Healy, 1980, pp. 54-55).

  These findings failed to support Corcoran’s redundancy hypothesis. People do not skip over the word the when it is misspelled. Instead, the data fit more readily with the unitization hypothesis. In reading normal prose, people automatically process common words, especially the, in units larger than single letters. However, when the form of common words is distorted by misspellings, subjects switch to a more detailed letter-by-letter processing and thus detect the printing errors.

  It should be noted, however, that these results do not cover the USA-UAA case discussed above. In Healy’s studies, the misspelled words were not lexically possible ones. But USA is not only part of the English lexicon, it is also a very commonly-encountered acronym. In this instance, the error-to-be-detected is highly unitized and, as such, is likely to evade the eye during proofreading for the reasons offered by Healy to explain her letter detection differences. Obviously, a more detailed investigation of this ‘disguise-by-familiarity’ effect is needed before making any confident generalizations. It is worth remembering, though, that Healy herself was more interested in the size of the units used in reading than in the mechanisms of error detection per se.

  7. Summary and conclusions

  Error detection processes form an integral part of the multilevel mechanisms that direct and coordinate human action. Although their precise operation is little understood, there are grounds for believing that their effectiveness relates inversely to their position in the control hierarchy. Unless they are damaged or required to function in ecologically invalid circumstances, low-level postural corrections operate with a very high degree of reliability. At the other extreme, high-level cognitive processes concerned with setting of goals and selecting the means to achieve them are far less sensitive to potential or actual deviations from some optimal path towards a desired state.

  The relative efficiency of these detection mechanisms depends crucially upon the immediacy and the validity of feedback information. At low levels, this is supplied directly and automatically by ‘hard-wired’ neural mechanisms. At the highest levels, however, this information is at worst unavailable and at best open to many interpretations.

  There are basically three ways in which an error may be detected. It may be discovered by one of a variety of self-monitoring processes. These, as noted above, are most effective at the physiological and skill-based levels of performance. It may be signalled by some environmental cue, most obviously a forcing function that prevents further progress. Or it may be discovered by some other person. Detection by others appears to be the only way in which certain diagnostic errors are brought to light in complex and highly stressful situations.

  Although skill-based errors are detected more readily than either rule-based or knowledge-based mistakes, the laboratory data so far obtained do not suggest that there are wide differences in their relative ease of discovery. Such evidence as there is indicates that approximately three out of every four errors are detected by their perpetrators. The chances of making an effective correction, however, appear to be highest at the skill-based level of performance and lowest at the knowledge-based level.

  7 Latent errors and systems disasters


  In considering the human contribution to systems disasters, it is important to distinguish two kinds of error: active errors, whose effects are felt almost immediately, and latent errors whose adverse consequences may lie dormant within the system for a long time, only becoming evident when they combine with other factors to breach the system’s defences (see Rasmussen & Pedersen, 1984). In general, active errors are associated with the performance of the ‘front-line’ operators of a complex system: pilots, air traffic controllers, ships’ officers, control room crews and the like. Latent errors, on the other hand, are most likely to be spawned by those whose activities are removed in both time and space from the direct control interface: designers, high-level decision makers, construction workers, managers and maintenance personnel.

  Detailed analyses of recent accidents, most particularly those at Flixborough, Three Mile Island, Heysel Stadium, Bhopal, Chernobyl and Zeebrugge, as well as the Challenger disaster, have made it increasingly apparent that latent errors pose the greatest threat to the safety of a complex system. In the past, reliability analyses and accident investigations have focused primarily upon active operator errors and equipment failures. While operators can, and frequently do, make errors in their attempts to recover from an out-of-tolerance system state, many of the root causes of such emergencies are usually present within the system long before these active errors are committed.

  Rather than being the main instigators of an accident, operators tend to be the inheritors of system defects created by poor design, incorrect installation, faulty maintenance and bad management decisions. Their part is usually that of adding the final garnish to a lethal brew whose ingredients have already been long in the cooking.

  There is a growing awareness within the human reliability community that attempts to discover and neutralise these latent failures will have a greater beneficial effect upon system safety than will localised efforts to minimise active errors. To date, much of the work of human factors specialists has been directed at improving the immediate human-system interface (i.e., the control room or cockpit). While this is undeniably an important enterprise, it only addresses a relatively small part of the total safety problem, being aimed primarily at reducing the ‘active failure’ tip of the causal iceberg. One thing that has been profitably learned over the past few years is that, in regard to safety issues, the term ‘human factors’ embraces a far wider range of individuals and activities than those associated with the front-line operation of a system. Indeed, a central theme of this chapter is that the more removed individuals are from these front-line activities (and, incidentally, from direct hazards), the greater is their potential danger to the system.

  Other attempts to minimise errors have been purely reactive in nature, being concerned with eliminating the recurrence of particular active failures identified post hoc by accident investigators. Again, while it is sensible to learn as many remedial lessons as possible from past accidents, it must also be appreciated that such events are usually caused by the unique conjunction of several necessary but singly insufficient factors. Since the same mixture of causes is unlikely to recur, efforts to prevent the repetition of specific active errors will have only a limited impact on the safety of the system as a whole. At worst, they merely find better ways of securing a particular stable door once its occupant has bolted.

  This chapter considers the contribution of latent errors to the catastrophic breakdown of a number of different complex systems. Since the notion of latent error is intimately bound up with the character of contemporary technology, I begin by summarising some of the significant changes that have occurred in the control of high-risk systems over the past few decades. I also consider some of the psychological problems associated with the supervisory control of complex systems.

  1. Technological developments

  Over the past 30 to 40 years, a technological revolution has occurred in the design and control of high-risk systems. This, in turn, has brought about radical (and still little understood) changes in the tasks that their human elements are called upon to perform. Some of the more important factors affecting human performance are outlined below.

  1.1. Systems have become more automated

  One of the most remarkable developments of recent years has been the extent to which operators have become increasingly remote from the processes that they nominally control. Machines of growing complexity have come to intervene between the human and the physical task.

  In the beginning, operators employed direct sensing and manipulation. They saw and touched what they controlled or produced. Then came the intervention of remote sensing and manipulation devices. Either the process was too dangerous or too sensitive to handle directly, or there was a need to extend human muscle power, or the operator’s unaided senses were insufficient to detect important physical changes.

  But the most profound changes came with the advent of cheap computing power. Now the operator can be separated from the process by at least two components of the control system. At the lowest level, there are task-interactive systems controlling the various detailed aspects of the operation. And intervening between the specialised task-interactive systems and the operators is the human-system interface, where the control system presents various selected pieces of information to the operators. This interface generally permits only a very prescribed degree of interaction between the human and the now remote process.

  This is the situation termed supervisory control, defined by Sheridan and Hennessy (1984) as “initiating, monitoring, and adjusting processes in systems that are otherwise automatically controlled.” The basic features of human supervisory control are shown in Figure 7.1.

  According to Moray (1986), true supervisory control is achieved through four distinct levels. The lowest two levels comprise the task-interactive system (TIS). This exercises closed-loop control over the hardware components of the task (e.g., propellers, engines, pumps, switches, valves and heaters) through automatic subsystems (e.g., thermostats, autopilots, governors, preprogrammed robots and packaged subroutines). The TIS can trim the system to predetermined set points, but it is incapable of adjusting these set points or of initiating any kind of adaptive response. The TIS is controlled by the human-interactive system (HIS). This comprises the top two levels of the control hierarchy. The HIS is an ‘intelligent’ computer that intercedes between the human operator and the lower-level controllers. This is the distinctive feature of human supervisory control. The HIS communicates the state of the system to the operator through its displays. It also receives commands from the operator regarding new goals and set points. Its intelligence lies in the fact that it can use its stored knowledge to issue tactical commands to the TIS that will optimise various performance criteria.
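
  As a concrete (and deliberately simplified) illustration of this division of labour, the Python sketch below separates a task-interactive system, which does nothing but trim a process toward its current set point, from a human-interactive system through which all operator contact passes. The class and method names are invented for this sketch and are not drawn from Moray (1986) or from any real control software.

```python
# A minimal sketch of the four-level supervisory-control hierarchy described above.
# All names are invented for illustration; the proportional correction stands in
# for any automatic subsystem (thermostat, autopilot, governor, etc.).

class TaskInteractiveSystem:
    """Lower two levels: closed-loop control of the hardware at a given set point."""

    def __init__(self, set_point: float) -> None:
        self.set_point = set_point
        self.process_value = 0.0    # e.g., a temperature or a valve position

    def control_step(self) -> None:
        # Trim the process toward the current set point; the TIS cannot
        # change the set point or initiate any adaptive response on its own.
        error = self.set_point - self.process_value
        self.process_value += 0.5 * error


class HumanInteractiveSystem:
    """Upper two levels: the 'intelligent' computer between operator and TIS."""

    def __init__(self, tis: TaskInteractiveSystem) -> None:
        self.tis = tis

    def accept_goal(self, new_set_point: float) -> None:
        # Translate an operator goal into a tactical command for the TIS.
        self.tis.set_point = new_set_point

    def run(self, steps: int) -> None:
        # The HIS, not the operator, drives the lower-level control loop.
        for _ in range(steps):
            self.tis.control_step()

    def display_state(self) -> str:
        # Present selected information back to the operator.
        return (f"process value = {self.tis.process_value:.1f}, "
                f"set point = {self.tis.set_point:.1f}")


# The operator supervises: sets goals and reads displays, never touching the process.
his = HumanInteractiveSystem(TaskInteractiveSystem(set_point=20.0))
his.accept_goal(70.0)
his.run(steps=5)
print(his.display_state())
```

  The structural point is that the operator’s only handles on the process are the goal-setting and display functions of the HIS; the closed control loop itself is run entirely by the machine.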

  Such a control system has brought about a radical transformation of the human-system relationship. As Moray (1986, pp. 404-405) has pointed out:

  There is a real sense in which the computer rather than the human becomes the central actor. For most of the time the computer will be making the decisions about control, and about what to tell or ask the operator. The latter may either pre-empt control or accept it when asked to do so by the computer. But normally, despite the fact that the human defines the goals for the computer, the latter is in control. The computer is the heart of the system.

  We have thus traced a progression from a situation in which the human is the prime mover and the computer the slave to one in which the roles are very largely reversed. For most of the time, the operator’s task is reduced to that of monitoring the system to ensure that it continues to function within normal limits. The advantages of such a system are obvious; the operator’s workload is substantially reduced, and the HIS performs tasks that the human can specify but cannot actually do (see Moray, 1986, for a complete list of the advantages of supervisory control). However, the main reason for the human operator’s continued presence is to use his still unique powers of knowledge-based reasoning to cope with system emergencies. And this, as will be discussed in Section 2, is a task peculiarly ill-suited to the particular strengths and weaknesses of human cognition.

  Figure 7.1. The basic elements of supervisory control (after Moray, 1986).

  1.2. Systems have become more complex and more dangerous

  One of the accompaniments of the increasing computerisation has been that high-risk systems such as nuclear power plants and chemical process installations have become larger and more complex. This means that greater amounts of potentially hazardous materials are concentrated in single sites under the centralised control of fewer operators. Catastrophic breakdowns of these systems pose serious threats not only for those within the plant, but also for the neighbouring public. And, in the case of nuclear power plants and weapons systems, this risk extends far beyond the immediate locality.

  Complexity can be described in relation to a number of features. Perrow (1984) has identified two relatively independent system characteristics that are particularly important: complexity of interaction and tightness of coupling.

  Systems may be more or less linear in their structure. Relatively complex, nonlinear systems possess the following general features (adapted from Perrow, 1984):

  (a) Components that are not linked together in a production sequence are in close proximity.

  (b) Many common-mode connections (i.e., components whose failure can have multiple effects ‘downstream’) are present.

  (c) There is only a limited possibility of isolating failed components.

  (d) Due to the high degree of specialisation, there is little chance of substituting or reassigning personnel. The same lack of interchangeability is also true for supplies and materials.


  (e) There are unfamiliar or unintended feedback loops.

  (f) There are many control parameters that could potentially interact.

  (g) Certain information about the state of the system must be obtained indirectly, or inferred.

  (h) There is only a limited understanding of some processes, particularly those involving transformations.

  In addition, the elements of a system may be coupled either tightly or loosely. The characteristics of a tightly-coupled system are listed below (adapted from Perrow, 1984):

  (a) Processing delays are unacceptable.

  (b) Production sequences are relatively invariant.

  (c) There are few ways of achieving a particular goal.

  (d) Little slack is permissible in supplies, equipment and personnel.

  (e) Buffers and redundancies are deliberately designed into the system.

  It should be stressed that interactiveness and tightness of coupling are tendencies, not hard-and-fast properties. No one system is likely to possess all the characteristics of complexity outlined above. Nuclear power plants, nuclear weapons systems, chemical process plants and large passenger aircraft are examples of systems that possess both a high degree of interaction and tightness of coupling. Dams, power grids, rail and marine transport have tight coupling but linear interactions. Mining operations, research and development companies, universities, and multigoal public agencies (such as the Department of Health and Social Security in Britain) have loose coupling and complex interactions. Trade schools, assembly-line production and most manufacturing plants have loose coupling and linear interactions.
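
  As a rough illustration of how Perrow’s two dimensions locate these examples on a single grid, the short Python snippet below records each system as an (interaction, coupling) pair and picks out the quadrant that combines complex interactions with tight coupling. The dictionary and the labels simply restate the examples given in the text; the code structure itself is only an illustrative device, not anything proposed by Perrow (1984).

```python
# Toy illustration of Perrow's (1984) interaction/coupling grid.
# Entries restate the examples in the text; the labels follow Perrow's
# terminology, but this data structure is only an expository device.

systems = {
    "nuclear power plant":      ("complex", "tight"),
    "chemical process plant":   ("complex", "tight"),
    "large passenger aircraft": ("complex", "tight"),
    "dam":                      ("linear",  "tight"),
    "rail transport":           ("linear",  "tight"),
    "university":               ("complex", "loose"),
    "R&D company":              ("complex", "loose"),
    "assembly-line production": ("linear",  "loose"),
}

# The quadrant combining complex interactions with tight coupling is the one
# in which latent failures are hardest to contain.
high_risk = [name for name, (interaction, coupling) in systems.items()
             if interaction == "complex" and coupling == "tight"]
print(high_risk)
```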

 
