
Human Error


by James Reason


  Another interesting finding was that phonological errors of all kinds were corrected at least as often as lexical errors. This tells us something about the criteria for detection and correction. Nooteboom (1980, p. 89) states:

  If possible harm to communication were the main criterion for correction, one would expect lexical errors to be corrected far more often than phonological errors. If, on the other hand, possible harm to linguistic orthodoxy were the main criterion, one would expect that phonological errors, which often lead to nonwords, would be corrected more often. If there is a difference it is rather in favour of the second possibility, but apparently the mental strategy (Laver’s Monitor) dealing with the detection and correction of overt speech errors strives both for successful communication and linguistic orthodoxy. The fact that not all errors are corrected indicates that the output editing is not perfect. Two factors are likely to be in competition immediately following the detection of a speech error: (a) an urge to correct the error immediately, and (b) an urge to complete the word being uttered. When the detection occurs before the next word boundary, the first force sometimes prevails; but when detection occurs later than this, the second force always overcomes the first. No stop occurs in the middle of a word, and the chance of detection after about five words is nil (Nooteboom, 1980).
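  Nooteboom’s account of the two competing urges can be read as a simple decision rule. The sketch below is a toy formalisation of that rule, not anything Nooteboom himself proposed; the function name and the hard five-word cutoff are illustrative assumptions:

    def may_halt_for_repair(words_since_error: int, mid_word: bool) -> bool:
        # Toy rule: detection is effectively nil after about five words,
        # and a speaker never breaks off in the middle of a word.
        if words_since_error > 5:
            return False  # the error has escaped the output editor
        if mid_word:
            return False  # the urge to complete the word always wins
        return True       # at a boundary the urge to correct can prevail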

  In an ingenious set of laboratory studies designed to investigate artificially-induced spoonerisms, Baars, Motley and MacKay (1975) and Baars (1980) observed that people could be made to produce lexically possible spoonerisms (e.g., saying ‘barn door’ for ‘darn bore’) significantly more often than transpositions resulting in nonwords (‘bart doard’ for ‘dart board’). They also noted that it was hard to get people to produce salacious spoonerisms (i.e., they show a marked reluctance to spoonerise phrases like ‘fuzzy duck’ or ‘plucking pheasants’).

  These findings suggest that we edit spoken output at a number of levels, and that errors obeying lexical rules are more likely to evade the scrutiny of the ‘editor’, particularly if the latter is occupied elsewhere. Except, that is, when the lexically appropriate output is socially unacceptable. But, as most of us know, embarrassing speech errors still occasionally emerge, indicating that sufficient ‘custodial attention’ is not always available (see Chapter 3 and Reason & Lucas, 1984b).

  2.4. The detection of action slips

  As argued in Chapter 3, the occurrence of many actions-not-as-planned results directly from the failure of high-level attentional monitoring. Although the tasks in which these moments of inattention occur are mostly run off automatically under feedforward control, some attentional investment is needed at intervals to ensure that the actions conform to current intentions, particularly when they demand a deviation from routine practice. This means bringing the conscious workspace into the control loop momentarily to check that things are running as intended. Even notoriously abstracted individuals like Archimedes or G. K. Chesterton would have needed to perform these checks fairly often in order to live anything like normal lives. In short, the process by which slips are detected has already been presented in the earlier discussion of their causes: slips occur through the absence of necessary attentional checks, and they are detected when such checks are subsequently made.

  But this is not the whole story. Making a post-slip attentional check does not of itself ensure the detection of the error. Detection must also depend upon the availability of cues signalling the departure of action from current intention. When the slip involves pouring tea into a sugar bowl, the evidence is immediately apparent. But when it is something like doubly salting a stew, the error may remain undetected for a long time, particularly if the stew is destined for the freezer. The same is also true for omissions. We only realise that we have forgotten something we had intended to bring from home to the office when we open our briefcases the following morning. As will be seen in the next chapter, undetected errors of this kind make a major contribution to system catastrophes.

  Unlike speech errors, the possibility of detecting slips of action extends well beyond the actual occurrence of the error. At one extreme, they may be ‘caught in the act’, as in Norman’s (1980) example: “I was about to pour the tea into the opened can of tomatoes that was just next to the teacup. Not yet an error—but certainly a false movement”; while, at the other extreme, discovery may be delayed for days, weeks or months, as in the case of omitted actions. Slips are apparently detected at many levels, ranging from immediate feedback processes embedded within the action schemata (see Norman, 1981; Reason & Mycielska, 1982; Norman & Shallice, 1986) to conscious and effortful feats of memory performed after a long delay.

  2.5. The detection of errors during problem solving

  So far, we have been considering the detection of errors at the skill-based level of performance, where any discrepancy between the current and desired state is, in principle at least, fairly easy to determine. Thus, the spatial senses (particularly the otoliths and peripheral vision) are specifically designed to register significant deviations of the body from the upright position; for speech and routine action, the goal is represented by some internally formulated intention. In the latter case, errors may go undetected if the cues signalling a departure of current action from intention are insufficiently salient and/or if the intention itself is underspecified (Norman, 1981). Usually, however, both of these conditions are satisfied, so that most speech or action slips are discovered by their perpetrators. But this is clearly not the case at the rule-based (RB) and knowledge-based (KB) levels. For the moment, we will take ‘problem solving’ as a blanket term to cover the various activities involved at the RB and KB performance levels (i.e., reasoning, judgement, diagnosis and decision making).

  In the skill-based activities considered hitherto, the criteria for successful performance are, to a large extent, directly available within the head of the individual, ranging from automatic body tilt indicators and their associated corrective reflexes to an awareness of what is being done and what is currently intended. For problem solvers, however, the correct solution may only be present in the external world.

  In knowledge-based performance, an adequate path to a desired goal is something that lies ‘out there’, waiting to be discovered by the problem solver. Aside from inspired guesses, the only way forward is by trial and error, where success depends upon (a) defining the goal correctly, and (b) being able to recognise and correct deviations from some adequate path towards that end. These are the strategic and tactical aspects of problem solving, and each has different implications for error detection. It is likely to be much harder to discover a strategic mistake (selecting the wrong goal) than a tactical one (taking the wrong path), since the feedback information regarding the former will be less readily specified and interpreted than that relating to the latter, for several reasons. First, the success of strategic decisions can only be judged over a much longer time scale than tactical ones, and then only in reference to some superordinate or more distant goal. Second, the criteria for success or failure can often only be judged with the benefit of hindsight. Third, the identification of a goal constitutes a theory about the future state of the world and is thus subject to the ‘blinkering’ of confirmation bias and anxiety reduction. So, not only is there less objective information upon which to base an adequate strategic decision, there are also powerful subjective influences at work to restrict the search for cues bearing upon the inadequacy of this choice. It follows, therefore, that the task of error detection will be easier in those problems for which the correct solution is clearly recognisable in advance, so that present performance is judged only at the tactical level.

  2.5.1. The Swedish studies

  One of the most detailed investigations of error detection during problem solving was carried out by Allwood (1984) of the University of Gothenburg. He asked subjects to think aloud while attempting to solve statistical problems. For the purposes of analysis, this task was divided into two phases: (a) a progressive phase, when the subject works towards the goal state of the problem, and (b) an evaluative phase, when the subject checks upon some completed part of the problem. The latter may be either affirmative (the subject is satisfied with current progress) or negative. Error detection always occurs during negative evaluations and involves two stages: the triggering of the error detection mechanisms and the later steps taken to discover and correct the error.

  Analysis of the subjects’ verbal protocols suggested that negative evaluation episodes may be classified into three types:

  (a) Standard check (SC). These are initiated independently of the specific properties of the previous work: the subject simply decides to carry out a general check on progress.

  (b) Direct error-hypothesis formation (DEH). These episodes are triggered by an abrupt detection of a presumed error. They need not occur immediately after the error was made, nor do they necessarily discover actual errors.

  (c) Error suspicion (ES). Here the subject notices something unusual and suspects that an error has been made. Whereas in the DEH mode the subject’s remarks always relate to a specific, though possibly only presumed, error, ES episodes relate to some property of the produced solution without initially identifying the precise cause for concern.
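  Taken together, these three episode types form a small taxonomy that a protocol coder could apply almost mechanically. The sketch below is a hypothetical encoding; the class, function and argument names are ours, not Allwood’s:

    from enum import Enum, auto

    class Episode(Enum):
        STANDARD_CHECK = auto()           # SC: routine, not cued by the work itself
        DIRECT_ERROR_HYPOTHESIS = auto()  # DEH: a specific (presumed) error is named
        ERROR_SUSPICION = auto()          # ES: something seems wrong, cause unknown

    def code_episode(cued_by_solution: bool, specific_error_named: bool) -> Episode:
        # Illustrative coding rule for a negative-evaluation episode
        # in a think-aloud protocol.
        if not cued_by_solution:
            return Episode.STANDARD_CHECK
        if specific_error_named:
            return Episode.DIRECT_ERROR_HYPOTHESIS
        return Episode.ERROR_SUSPICION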

  These findings support two kinds of theories concerning error detection. The SC episodes identified in the protocols are evidence for the centrally-invoked mechanisms suggested by Hayes and Flower (1980), Sussman (1975) and Allwood and Montgomery (1982). These are not initiated by any feature of the problem solution; rather, they emerge as a characteristic part of the subject’s problem-solving technique. Other theories about error detection emphasise the spontaneous, data-driven nature of the process. Errors may be discovered either because of a match between stored representations of past errors and currently observed ones (see Hayes & Flower, 1980; Sussman, 1975), or because the detection process is triggered by a mismatch between the subject’s general expectations and the results of his or her problem-solving efforts (see Baars, 1980; Carpenter & Daneman, 1981). DEH episodes conform to the first kind of triggering, while ES episodes are compatible with the second kind.

  Taken overall, Allwood’s results may be summarised as follows:

  (a) Only one-third of the undetected errors were relevant to some evaluation phase. Subjects clearly had difficulty in reacting to the effects of their errors.

  (b) Among the various kinds of evaluation, DEH and ES episodes occurred most frequently. Ninety-five per cent of the former and 66 per cent of the latter were triggered by erroneous solution parts.

  (c) Execution errors (slips) were detected far more readily than solution method errors (mistakes). A much higher proportion of the slips were detected by DEH episodes, which did not seem to be particularly well suited to picking up mistakes. The results also highlighted the importance of ES episodes for the detection of solution method errors.

  (d) The chances of a successful error detection occurring during ES episodes diminished rapidly as time elapsed between the error and the episode. This effect was more apparent for execution slips than for solution method errors.

  2.5.2. The Italian studies

  In two separate studies, Rizzo and his collaborators (Rizzo, Bagnara & Visciola, 1986; Bagnara, Stablum, Rizzo, Fontana & Ruo, 1987) examined the relationship between the three basic error types (outlined in Chapter 3) and the three self-monitoring detection processes discussed above (Allwood, 1984).

  In the first study, 16 ‘naive’ subjects were trained to use a database system (AppleWorks) in two sessions. A third session instructed them in the ‘talk aloud’ technique to be used in the experiment proper. Subsequently, there were four successive experimental sessions, each more complex than the last. These sessions involved (a) finding a given item and reporting its values, (b) finding three items and changing their values, (c) creating a new file from information already present within the database and (d) creating three new files and printing them out. No time limits were imposed upon these activities. Overall, the subjects made 924 errors and detected 780 of them.

  Most skill-based slips were detected during DEH episodes, whereas the bulk of the knowledge-based mistakes were picked up during standard check (SC) episodes. Rule-based mistakes, however, were discovered primarily by either DEH or ES episodes.

  A very similar pattern of error types and detection modes was observed in a second study carried out in a steel works. Eight experienced operators were required to carry out a simulated production planning exercise relating to a hot strip mill. The subjects’ work was videorecorded and subsequently analysed by the experimenters and by task experts. The subjects made 95 errors and detected 74 of them.

  In both studies, a reasonably consistent association was found between error types and detection modes. Slips were detected mainly by direct error hypothesis episodes; rule-based mistakes by a mixture of DEH and error suspicion episodes; and knowledge-based mistakes were discovered largely as the result of standard check behaviour. These findings provide further support for the differentiation of the three basic error types set out in Chapter 3, and for Allwood’s categorisation of self-monitoring processes. They also carry important implications for designing future systems that provide the maximum opportunity for error detection and recovery.

  3. Environmental error cueing

  3.1. Forcing functions

  The most unambiguous way by which the environment can inform us that we have made an error is to block our onward progress. If we have failed to turn the appropriate keys in all the locks or not drawn all the bolts, the door will not open. If we attempt to drive away a car without first switching on the ignition, it will not move. If we have not made all the appropriate connections in the wiring of an electrical appliance, it will not work. These are what Norman has termed forcing functions: “something that prevents the behaviour from continuing until the problem has been corrected” (Lewis & Norman, 1986, p. 420).

  The existence of appropriate forcing functions guarantees error detection. Sometimes they are a natural property of the task (as in the case of the locks and bolts). In other cases, they are deliberately built in by the system designer. Most word processing packages, for example, will not allow you to return to the operating system until you have saved the current text file. Some computer manufacturers go one stage further and physically prevent the removal of the diskette until the file has been saved. Note that in both cases, however, these forcing functions are not proof against the user simply switching off the computer prematurely.
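  In software terms, a forcing function is simply a guard that refuses to let an action proceed while its precondition is unmet. A minimal sketch follows; the class and method names are hypothetical and do not reflect any real word processor’s interface:

    class Document:
        def __init__(self) -> None:
            self.dirty = False  # are there unsaved changes?

        def edit(self, text: str) -> None:
            self.dirty = True   # any edit marks the file as unsaved

        def save(self) -> None:
            self.dirty = False

        def quit(self) -> None:
            # Forcing function: block the exit path until the file is saved.
            # As noted above, a guard inside the program cannot stop the
            # user from simply cutting the power.
            if self.dirty:
                raise RuntimeError('Unsaved changes: save before quitting.')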

  A harsh fact of life, and one that contributes to a large proportion of maintenance errors (see Chapter 7), is that there are usually many more forcing functions available in dismantling a piece of equipment than there are in reassembling it. In stripping down a tap or a part of an engine to its component parts, each step in the process is cued by the physical characteristics of the item. It is simply not possible, for example, to remove the washer from most taps without first taking off the handle. Unfortunately, these cues are hardly ever present when it comes to putting the pieces back together again; hence the ‘garage floor phenomenon’, captured by cartoons in which the oil-covered male figure turns to the woman saying ‘That’ll fix it’, and she replies, eyeing the ground beneath the car, ‘How clever of you to save all those pieces’.

  Like good advice, a forcing function is most valuable if it does not come too late. Forcing functions encountered long after the commission of an error provide little diagnostic help. Indeed, they may actually promote the occurrence of further errors. The process of back-tracking from a forcing function creates additional opportunities for deviation and can lead to total confusion on the part of the fault finder.

  People’s reactions to forcing functions are not always entirely rational. How many of us, for example, have stood continuously rattling the handle of a door we have failed to unlock? Some individuals see forcing functions not as an indication of past correctable errors, but as a physical barrier to be overcome by other more direct means: something to be jumped over, driven through or, quite literally, forced open. Responses to forcing functions, particularly those designed to guide emergency action (e.g., barriers across basement stairs blocking a non-escape route), are clearly worth closer investigation.

  In the course of their investigation of error detection in steel mill operators, Bagnara and coauthors (1987) examined the extent to which forcing functions contributed to the discovery of various error types. They identified three levels of mismatch or ‘failed expectation’: (a) forcing functions, when errors lead to a blocking of further progress; (b) external feedback, when information relating to the error is available in the environment; and (c) internal feedback, when error-related information is available within the subject’s working memory, but not in the environment.
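  These three levels can be read as an ordered classification of where the error signal resides. The coding rule below is again an illustrative sketch, not the authors’ own scheme:

    from enum import Enum

    class MismatchLevel(Enum):
        FORCING_FUNCTION = 1   # further progress is physically blocked
        EXTERNAL_FEEDBACK = 2  # error information is visible in the environment
        INTERNAL_FEEDBACK = 3  # error information exists only in working memory

    def classify_mismatch(progress_blocked: bool, cue_in_environment: bool) -> MismatchLevel:
        if progress_blocked:
            return MismatchLevel.FORCING_FUNCTION
        if cue_in_environment:
            return MismatchLevel.EXTERNAL_FEEDBACK
        return MismatchLevel.INTERNAL_FEEDBACK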

  It was found that the efficiency of forcing function mismatches was greater for the recovery of slips than it was for mistakes, particularly knowledge-based ones. In the latter case, the operators had considerable difficulty in moving from mismatch detection to appropriate diagnosis to effective error recovery. This, in turn, derived from a complex interaction between the inadequacy of the operators’ system knowledge (even though their working experience ranged from 13 to 17 years) and the inflexibility of the system itself. We will return to the difficulty of recovering knowledge-based mistakes later on in this chapter.

 
