by James Reason
Leaving aside the issue of whether truly intelligent decision support systems are feasible, or even desirable, we must not allow the lure of ‘high-tech’ prostheses to blind us to two important facts. First, most of the residual human failures that now threaten the safety of high-risk technologies are not amenable to technological ‘fixes’; this conclusion applies most especially to latent managerial and organisational deficiencies. Second, there are a number of pressing concerns that can be remedied by simple, well-understood and available measures. An obvious candidate in nuclear power plant operations is the omission of necessary maintenance actions due to memory failure.
5.3. Memory aids for maintenance personnel
As indicated in Chapter 7, the clear conclusion from a number of recent nuclear power plant error surveys (Rasmussen, 1980; INPO, 1984, 1985b) is that maintenance-related omissions constitute a substantial proportion of the human failure root causes in significant event reports. These involved such things as forgetting to set valves in the appropriate position, not removing tools and other objects, and leaving out necessary steps in either preventive or corrective maintenance schedules. This corresponds with the findings obtained from error-proneness questionnaires (Reason & Mycielska, 1982; Reason, 1984a), where omitting to carry out planned actions (i.e., failures of prospective memory) comprises one of the most common forms of everyday lapse.
In addition to the general factors that promote absent-minded slips and lapses (the execution of routine tasks while preoccupied or distracted), there are a number of task factors that are likely to increase the probability of making an omission error:
(a) The larger the number of discrete steps in an action sequence, the greater the probability that one or more of them will be omitted (a rough arithmetic illustration of this point is given after the list).
(b) The greater the informational loading of a particular procedural step, the more likely it is that items within that step will be omitted.
(c) Procedural steps that are not obviously cued by preceding actions or that do not follow in a direct linear sequence from them are likely to be omitted.
(d) When instructions are given verbally and there are more than five simple steps, items in the middle of the list of instructions are more likely to be omitted than either those at the beginning or the end.
(e) When instructions are given in written form, isolated steps at the end of a sequence (e.g., replacing caps or bushes, removing tools, etc.) have a reasonably high probability of being omitted.
(f) Necessary steps in an action sequence are more likely to be omitted during reassembly than during the original disassembly (see Chapter 6).
(g) In a well-practised, highly automatic task, unexpected interruptions are frequently associated with omission errors, either because some unrelated action is unconsciously ‘counted in’ as part of the task sequence, or because the interruption causes the individual to ‘lose his place’ on resumption of the task (i.e., he believes that he was further along in the task prior to the interruption than he actually was). Such routinised tasks are also especially prone to premature exits – moving on to the next activity before the previous one is completed, thus omitting some necessary final steps. This is particularly likely to happen when the individual is working under time pressure or when the next job is near at hand.
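The force of point (a) can be conveyed with a little arithmetic (the independence assumption used here is, of course, a simplification): if each of n steps carries a small, independent probability p of being omitted, the probability that at least one step is omitted is 1 − (1 − p)^n. With p = 0.01, a 10-step job carries roughly a 10 per cent chance of at least one omission, and a 50-step job roughly a 40 per cent chance.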
These observations have a number of practical applications. In the short term, they make it possible to identify in advance those steps in written maintenance procedures that are most likely to be omitted. Consider, for example, the following job description covering valve inspection in preventive maintenance on a compressor (Kelly, 1984): “Check and clean suction and pressure valves. Replace defective valves. Replace packings. Clean valve chambers.” The step most likely to be omitted here is the replacement of the packings. Having identified the most probable omissions, it is possible to provide maintenance personnel with a set of procedures, stored in a cheap lap-held computer, that not only give the user step-by-step guidance in what has to be done, but also prompt him to check that easily omitted steps have been completed.
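To make the prompting function concrete, the sketch below (written in Python and purely illustrative; it is not drawn from any actual maintenance software) stores the Kelly (1984) job as a list of steps, tags the step judged most likely to be omitted, and refuses to sign the job off until every step has been explicitly confirmed. The tag and the confirmation dialogue are assumptions introduced for the example.

    # Illustrative sketch only: a job is held as a list of steps; steps known to be
    # easily omitted carry a note, and the job cannot be signed off until every
    # step has been explicitly confirmed by the technician.

    from dataclasses import dataclass

    @dataclass
    class Step:
        text: str
        easily_omitted: str = ""   # reason the step is prone to omission, if any

    # The compressor job from Kelly (1984); the 'easily omitted' tag reflects the
    # analysis in the text and is supplied here by hand.
    JOB = [
        Step("Check and clean suction and pressure valves"),
        Step("Replace defective valves"),
        Step("Replace packings", easily_omitted="not cued by the preceding step"),
        Step("Clean valve chambers"),
    ]

    def run_checklist(job):
        outstanding = []
        for number, step in enumerate(job, start=1):
            prompt = f"Step {number}: {step.text}"
            if step.easily_omitted:
                prompt += f"  [check carefully: {step.easily_omitted}]"
            answer = input(prompt + "  completed? (y/n) ").strip().lower()
            if answer != "y":
                outstanding.append(step.text)
        if outstanding:
            print("Job NOT complete. Outstanding steps:")
            for text in outstanding:
                print("  -", text)
        else:
            print("All steps confirmed; the job may be signed off.")

    if __name__ == "__main__":
        run_checklist(JOB)

On a lap-held machine of the period the implementation would, of course, be far cruder; the point is the logic of step-by-step guidance combined with a forced confirmation of easily omitted steps.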
A prototype of such a device, termed a portable interactive maintenance auxiliary (PIMA), has been developed in our laboratory. It is designed to be implemented on an Epson PX-8 lap-held computer or some suitable equivalent and is intended to form part of a maintenance technician’s basic equipment in installations such as nuclear power plants where both the informational load (i.e., the amount of technical information necessary to perform a particular maintenance task) and the cost of maintenance-related errors are high. Although PIMA was designed as an external memory aid, its implementation on a powerful but truly portable computer will also allow it to serve surveillance and checking functions in plants possessing computerised maintenance documentation systems (see Kelly, 1984). Moreover, it closes the electronic gap that presently exists within even the most comprehensive of computerised maintenance systems. It provides a fully computerised loop in both the outward (work sheets, technical job information and memory aids) and inward (technical feedback, timesheets, schedule and task checking) directions.
Before concluding this brief discussion of cognitive aids (or prostheses), it would be helpful to remind ourselves of their compensatory functions. The preparation and execution of an action sequence may be divided into three overlapping stages: (a) plan formulation, (b) plan storage and (c) plan execution. Progression through these stages from (a) to (c) is associated with a gradual shift from higher to lower level cognitive processors, from a predominantly attentional (conscious) to a predominantly schematic (automatic) mode of control. Decision aids are designed to minimise failures at the plan formulation stage, while memory aids support performance at the storage and execution stages. The relationship between these cognitive prostheses and their compensatory functions is summarised in Table 8.1.

Table 8.1. A summary of the compensatory functions of memory and decision aids.

DECISION AIDS
To compensate for bounded rationality: the fact that attention can only be directed at a very small part of the total problem space at any one time.
To direct attention to logically important aspects of the problem space.
To correct the tendency to apply familiar but inappropriate solutions.
To minimise the influence of availability bias: the tendency to prefer diagnoses and/or strategies that spring readily to mind.
To rectify incomplete or incorrect knowledge.

SHARED FUNCTIONS OF DECISION AND MEMORY AIDS
To augment the limited capacity of working memory. This serves two primary functions: (a) as a working database wherein analytical operations can be performed; and (b) as a means of keeping track of progress by relating current data to stored plans in long-term memory.

MEMORY AIDS
To augment prospective memory: that is, to provide an interactive checklist facility to enable the appropriate actions to be performed in the desired sequence at the right time. In short, to prompt the what? and the when? of planned actions. Also to encourage checking that all the necessary actions have been completed before moving on to the next stage.
5.4. Training issues
5.4.1. Procedures or heuristics?
One of the most sustained and coherent programmes of research into training methods in advanced continuous-process industries has been carried out by Keith Duncan and his coworkers, then at the University of Wales Institute of Science and Technology (Duncan & Shepherd, 1975; Duncan & Gray, 1975a, 1975b; Shepherd, Marshall, Turner & Duncan, 1977; Duncan, 1987). Their primary concern has been with the training of fault diagnosis. The early studies sought to establish how experienced operators went about diagnosing faults from control panel arrays. The evidence indicated that for the most part they were applying short sequences of heuristics, or diagnostic ‘rules of thumb’. Verbal expressions of a few of these often intuitive heuristics are listed below (taken from Duncan, 1987, p. 265); a brief sketch of how such rules of thumb might be encoded is given after the list.
(a) Scan the panel to locate the general area of failure.
(b) Check all control loops in the affected area. Are there any anomalous valve positions?
(c) A high level in a vessel and a low flow in associated take-off line indicates either a pump failure or a valve that failed in the closed position. If valves OK (see b), then pump failure probable.
(d) High temperature and pressure in column head associated with low level in reflux drum indicate overhead condenser failure—provided all pumps and valves are working correctly (see b and c).
(e) If the failure is in the reactor/heat-exchange complex, determine whether it is in the reactor or the heat-exchange system. A failure in the heat-exchange will produce symptoms in Column A but not in B. A failure in the reactor will produce symptoms in both columns.
(f) If the failure is in the feed system, check whether it is in stream X or stream Y. Because of the nature of the control system, a failure in the Y stream will produce associated symptoms in both the X and Y streams. A failure in the X stream will show symptoms in the X stream only.
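By way of illustration only, heuristics such as (b), (c) and (d) above might be written down as simple rules applied to a snapshot of panel readings, as in the Python sketch below. The variable names, the coarse ‘high/low/normal’ coding and the example readings are all invented for the purpose; nothing here is taken from Duncan’s experimental apparatus.

    # Illustrative only: heuristics (b)-(d) expressed as rules over hypothetical,
    # coarse-coded panel readings ('high', 'low' or 'normal').

    def anomalous_valves(panel):
        """Heuristic (b): list valves whose position differs from the expected one."""
        return [name for name, state in panel["valves"].items()
                if state != panel["expected_valves"][name]]

    def diagnose(panel):
        bad_valves = anomalous_valves(panel)

        # Heuristic (c): high vessel level plus low take-off flow means either a
        # pump failure or a valve failed closed; if the valves are OK, suspect the pump.
        if panel["vessel_level"] == "high" and panel["takeoff_flow"] == "low":
            return (f"valve failed closed: {bad_valves[0]}" if bad_valves
                    else "pump failure")

        # Heuristic (d): high column-head temperature and pressure with a low
        # reflux-drum level indicates overhead condenser failure, provided the
        # pumps and valves are working correctly.
        if (panel["column_head_temp"] == "high"
                and panel["column_head_pressure"] == "high"
                and panel["reflux_drum_level"] == "low"
                and panel["pumps_ok"]
                and not bad_valves):
            return "overhead condenser failure"

        return "no diagnosis from these rules"

    example_panel = {
        "valves": {"V1": "open", "V2": "closed"},
        "expected_valves": {"V1": "open", "V2": "open"},
        "pumps_ok": True,
        "vessel_level": "high",
        "takeoff_flow": "low",
        "column_head_temp": "normal",
        "column_head_pressure": "normal",
        "reflux_drum_level": "normal",
    }
    print(diagnose(example_panel))    # prints: valve failed closed: V2

Duncan’s point, of course, is that the value of such heuristics lies in their generality across plant states, not in any particular encoding of them.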
This group has conducted a number of studies in which novices were trained in the laboratory to diagnose faults from a display panel relating to a fictitious petrochemical plant (to which the above heuristics relate). In one experiment, there were three training conditions: (a) a ‘no story’ condition, where subjects were not told anything about how the plant worked; (b) a ‘theory’ condition, where an explanation was given in simple language of the plant’s basic workings (i.e., inputs and outputs, the intervening flow paths, the drives, control loops, and so on); and (c) a ‘rules’ condition, in which diagnostic rules were incorporated into the training. In addition, subjects were tested both on faults that they had encountered in training (old faults) and on faults they had not met before (new faults).
The mean number of correct diagnoses did not differ between the training conditions for the old faults, but there was a marked advantage for the rules condition in the case of new faults. A subsequent study demonstrated that withholding plant information while providing diagnostic rules increased the rate of correct diagnosis of new faults to a level comparable to that for old faults.
These and other findings from this group have important training implications for installations that rely heavily upon written procedures, often involving elaborate algorithms, to guide operators’ diagnoses of fault conditions. These detailed branching structures have, as Duncan (1987, p. 210) points out, the “intrinsic limitation that, by definition, an algorithm will only distinguish the set of conditions which could have been foreseen. If an unforeseen event occurs, the operator is not helped by algorithmic procedures.”
Indeed, as the North Anna incident (Pew et al., 1981) showed, operators can on occasions be seriously hampered by the requirement to follow mandatory procedures—in this case, the post-TMI stipulation that the safety injection must be left on for at least 20 minutes after a reactor scram. Fortunately, they decided to disobey the regulation and turned off one of the two emergency cooling pumps for 4 minutes. This incident highlights the dangers of an overly prescriptive approach to abnormal plant states.
5.4.2. Simulator training
Simulators are undoubtedly useful for providing error data and as basic research tools (see Woods, 1984; Norros & Sammatti, 1986). But can they provide operators with generalized recovery skills? Before we can adequately answer this question, we need first to confront two more immediate problems. How do you simulate events that have never happened, but might? Still worse, how can you simulate events that have not been foreseen (see Duncan, 1987)? These problems were touched upon earlier in the context of the ‘Catch-22’ of human supervisory control (see Chapter 7). One very clear constraint is that any attempts at simulation must recreate the dynamic and interactive nature of an accident sequence. Static simulations cannot capture the problems operators experience in ‘tracking’ the current plant state (Woods, 1986).
So far we have no satisfactory answer to these questions. But Duncan (1987, p. 266) was probably correct when he stated: “[Simulator] training may succeed in providing operators with generalizable diagnostic skill but there are limits to what may be achieved, and post-training probabilities of diagnostic error remain uncomfortably high.”
5.4.3. Error management
This is a procedure being developed by Michael Frese and his FAUST (Fehler Analyse zur Untersuchung von Software und Training: error analysis for the investigation of software and training) group at the University of Munich (Frese, 1987; Frese & Altmann, 1988) from empirical research on errors in human-computer interaction. They note that errors committed in training can have both positive and negative effects. The aim of error management is to promote the positive and to mitigate the negative effects of training errors in a systematic fashion.
The benevolent aspects of training errors derive mostly from the opportunities they provide for further learning about the system. In one training study concerned with word processing, subjects who were denied the opportunity to commit errors performed worse than other groups who were allowed to make errors.
Errors serve different kinds of useful function depending upon the level of performance. At the level of abstract thinking, errors can help the trainee to discriminate those metacognitions that work from those that do not. For example, if a novice thinks of a word-processing system on the model of a typewriter, the boundaries of this model become apparent when he finds he cannot write over a blank space in the insert mode. Errors also lead to the conscious reappraisal of action patterns that are normally controlled by low-level processors. As such, they delay the premature automatisation of a new skill, so long as adequate feedback is provided. Moreover, errors can spur creative problem solutions and new exploratory strategies. Thus, if the trainee has not appreciated the difference between, say, the ‘insert mode’ and ‘overwrite mode’ in word processing, then errors resulting from this lack of knowledge can lead him or her to explore these modes spontaneously. Likewise, the unintended use of a command can provoke useful curiosity as to what its proper range of functions is.
The negative aspects of training errors have to do, in large part, with the trainee’s motivation and self-appraisal. The feedback provided by training errors has two components: informational and affective. If a largely affective interpretation is placed upon the feedback, the trainee may come to regard himself as too incompetent ever to succeed. Errors, particularly mistakes and grave slips (like deleting an important file), can lead to self-blame and additional stress. And even when the motivation to proceed is not seriously impaired, the anxiety provoked by this sense of stupidity can reduce the training to an ordeal. Stress and anxiety increase the cognitive load upon the trainee, which in turn promotes the occurrence of further errors. And, perhaps most importantly, much of the aversiveness of errors in human-computer interactions derives from the fact that slips and mistakes often leave the trainee in a situation where he or she can neither go forward nor backtrack.
These observations, based largely upon attempts to learn word-processing systems, lead to a set of error management principles that are applicable to a wide range of training situations, particularly in more complex systems. They are summarised below:
(a) Training should teach and support an active, exploratory approach. Trainees should be encouraged to develop their own mental models of the system and to use ‘risky’ strategies to investigate and experiment with still untaught aspects. Proper error management is not possible when training is structured according to programmed learning principles or when the trainee has to follow instructions to the letter.
(b) Error training should form an integral part of the overall training process. This means that the trainee should have the opportunity to both make errors and recover from them. This can include such devices as asking trainees to follow on from mistakes made by others. Error training is less aversive if trainees work in pairs. Clearly, strategies for dealing with errors have to be taught as well as discovered.
(c) Most adults approach training with the belief that errors are undesirable. Moreover, they do not like to be made to feel stupid. To counteract this, it is helpful to present heuristics (‘It is good to make mistakes: they help learning’, etc.). The goal of such heuristics is to change the attitude of trainees from ‘I mustn’t make errors’ to ‘let me see what I can learn from this error’.
(d) Error training should be introduced at the appropriate point. In the beginning, trainees have to struggle consciously with every step; this means that they are working to the limits of their capacity and that error training would be inappropriate at this stage. Some studies (Carroll & Carrithers, 1984; Carroll & Kay, 1985) have demonstrated that denying learners error feedback at this early stage can have beneficial effects. Error training is probably best introduced in the middle phase of the programme. This reduces the initial overload on the complete novice and gives the more advanced trainee the opportunity of exploiting his or her earlier experiences.
If human errors were truly stochastic events, associated with variability in recall, reasoning and the control of movement, then training should act to reduce errors through the structured diminution of this intrinsic variability. However, while it is true that some errors do fall into this category, most errors take systematic forms that are rooted in generally adaptive cognitive processes. Such errors are an intrinsic part of mental functioning and cannot be eliminated by training, no matter how effective or extensive the programme may be. It is now widely held among human reliability specialists that the most productive strategy for dealing with active errors is to focus upon controlling their consequences rather than upon striving for their elimination. Such an approach is discussed below.