The attempt to achieve such a consensus was led by the World Health Organization. On its creation in 1948, WHO had assumed responsibility for an International List of Causes of Death, which had been first compiled in 1853, and which had undergone four major revisions sponsored by the French government. Expanding the manual to include non-fatal diseases, WHO published a sixth edition (now renamed the International Classification of Diseases, or ICD-6 for short) in 1951, but was disappointed by its impact. The mental disorders section of the manual was adopted for official use only in Finland, Great Britain, New Zealand, Peru and Thailand and, even in these countries, its use by practising psychiatrists was fairly haphazard. A committee chaired by Erwin Stengel, a British psychiatrist, was therefore mandated to consider how the section on psychiatric disorders might be improved in future editions. Stengel became convinced that aetiological prejudices were largely responsible for the difficulty in achieving agreement. Some diagnostic concepts then in wide use seemed to imply that disorders had particular causes. For example, many psychiatrists believed that the term ‘schizophrenia’ implied an endogenous disorder (that is, a disorder caused by some biological dysfunction within the individual), whereas other widely used terms, such as ‘reactive depression’, seemed to imply that disorders were largely caused by environmental factors. For these reasons, Stengel embraced an idea originally suggested by the American philosopher Carl Hempel, and proposed that future classifications of psychiatric diseases should include only operational definitions, which is to say definitions that precisely specified the rules to be followed when making each diagnosis. DSM-I, with its relatively clear definitions and its thumbnail descriptions, was the closest he could find to this ideal approach.
Stengel’s advice that diagnoses should make no reference to aetiology was followed for the eighth edition of the International Classification of Diseases, which was published in 1965 and officially adopted by WHO in 1969. ICD-8 was the product of an unusual degree of co-operation between psychiatrists in different countries. Scandinavian and German psychiatric societies supported the new taxonomy, and the American Psychiatric Association agreed to base a revision of their Diagnostic and Statistical Manual on the ICD-8 system. Accordingly, in 1965 the APA appointed a small committee of eight experts and two consultants, and DSM-II was published three years later as a spiral-bound notebook, 150 pages in length, available to clinicians for $3.50.
Troublesome data
By the middle decades of the twentieth century, it was becoming obvious to many psychiatrists that the achievement of a consensus about the main features of each psychiatric disorder would not be enough to ensure that diagnoses were reliable, let alone scientifically meaningful. Empirical research was required. Unfortunately, the earliest studies of diagnostic reliability proved to be discomfiting to those who wished to believe that Kraepelin had at last discovered the correct taxonomy of psychiatric disorders.
Jules Masserman and H. Carmichael, two American psychiatrists, published the first study of the reliability of psychiatric diagnoses, in 1938.9 These researchers followed up 100 patients one year after they had been admitted to a university psychiatric clinic in Chicago, in order to see whether their diagnoses had changed during that period. They noted that the majority of patients had symptoms that seemed to fit more than one diagnostic category. By the end of the follow-up interval, 40 per cent of the patients had been assigned diagnoses different from those they had been given on admission. Although it might be argued that these disappointing findings were merely a reflection of the poor diagnostic skills of particular clinicians in a particular city at a particular time, later studies established that this was not the case. For example, similar results were obtained in a much larger investigation conducted by three US navy psychiatrists, William Hunt, Cecil Wittson and Edna Hunt, published in 1953.10 Nearly 800 enlisted men discharged from naval boot camps for psychiatric reasons were followed up in the hospitals to which they were discharged and their diagnoses there were compared with those they had been given by the navy psychiatrists. There was agreement about specific diagnoses in only 32.6 per cent of cases.
Of course, diagnoses may change over time for genuine reasons (because patients’ symptoms change), so unstable diagnoses are not incontrovertible evidence of diagnostic destitution. Of much greater importance is the extent to which different clinicians assign the same diagnoses to the same patients when conducting assessments at the same point in time. This question was first addressed in a pioneering investigation conducted by an American postgraduate student of industrial psychology, Philip Ash, which was published in 1949.11 Three psychiatrists interviewed fifty-two male patients attending an outpatient clinic and the results, at least at first sight, were even less impressive than those obtained by Masserman and Carmichael. All three psychiatrists agreed on diagnoses in only 20 per cent of cases, whereas two out of three agreed about 48.6 per cent of cases. These figures improved to 45.7 per cent and 51.4 per cent respectively when only major diagnostic categories (for example, psychosis) were considered, but even then fell well short of the desired level of agreement. Unfortunately, as later commentators pointed out, Ash’s study suffered from a defect that made interpretation of his findings difficult. Three quarters of the major diagnoses given were ‘normal with a predominant personality characteristic’. On the one hand, it might therefore be argued that the psychiatrists were being asked to make judgements of personality rather than illness. On the other hand, with such a high base rate for a single category, relatively high rates of agreement were bound to occur by chance. (To see that this is the case, imagine what would happen if only one diagnosis was employed. In these circumstances there would inevitably be 100 per cent agreement between different clinicians. Now imagine that there is a choice between only two diagnoses. Even under these circumstances two psychiatrists deciding diagnoses by a toss of a coin would agree on 50 per cent of occasions. The chance level of agreement therefore decreases as the base rate of each diagnosis decreases, and as the number of available diagnoses increases.) It is for this reason that simple percentage agreement figures usually give an inflated impression of diagnostic agreement.
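The arithmetic behind the coin-toss argument can be made concrete with a minimal sketch. It assumes, purely for illustration, that two clinicians each pick a diagnosis independently and at random in proportion to the base rates; the function name chance_agreement and the example base rates are hypothetical and are not taken from any of the studies described.

```python
def chance_agreement(base_rates):
    """Probability that two raters agree purely by chance.

    Assumes (for illustration only) that each rater independently picks
    a diagnosis at random, with probabilities given by base_rates.
    The two raters agree on diagnosis i with probability p_i * p_i.
    """
    if abs(sum(base_rates) - 1.0) > 1e-9:
        raise ValueError("base rates must sum to 1")
    return sum(p * p for p in base_rates)


# One available diagnosis: agreement is inevitable.
print(chance_agreement([1.0]))          # 1.0

# Two diagnoses chosen by the toss of a coin.
print(chance_agreement([0.5, 0.5]))     # 0.5

# Hypothetical: one category used three quarters of the time.
print(chance_agreement([0.75, 0.25]))   # 0.625

# Ten equally likely diagnoses: chance agreement falls to about 0.1.
print(chance_agreement([0.1] * 10))
```

Under these assumptions, a single dominant category keeps chance agreement high even when a second diagnosis is available, which is why raw percentage agreement flatters the raters.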
Several other early studies of diagnostic agreement were less affected by these limitations, and yet produced results that were little better.12 For example, a new standard in the field was set in a report published by Myron Sandifer, Charles Pettus and Dana Quade in 1964. Sandifer and his colleagues studied the diagnoses assigned to first-time psychiatric patients admitted to three hospitals in North Carolina.13 Ninety-one patients were interviewed by a psychiatrist in front of a team of experienced clinicians, also mostly psychiatrists. Levels of agreement between the interviewing psychiatrists and their colleagues varied substantially across the different diagnoses. Some diagnoses fared reasonably well. For example, 74 per cent of the observers agreed when the interviewer assigned a diagnosis of schizophrenia. Others performed very badly. For example, only 36 per cent of the observers agreed when a diagnosis of manic depression was assigned. (Of course, these figures do not correct for the base rate problem.)
Worlds apart
At about the same time that these studies were drawing attention to difficulties in ensuring agreement between clinicians, other studies were pointing to substantial differences in diagnostic practices in different countries. The first detailed study of this phenomenon was published by Morton Kramer in 1961.14 Kramer compared the diagnoses given to patients when first admitted to hospitals in England, Wales and the United States, finding that the diagnosis of schizophrenia was used much more frequently in the United States than in Great Britain. The opposite appeared to be true for manic depression.
These observations stimulated a substantial study of transatlantic differences in diagnostic practices, which became known as the US–UK Diagnostic Project, and which was conducted by a team of British and American psychiatrists. In a significant advance on Kramer’s approach, the team examined patients in the two countries using a structured interview schedule (to ensure that all patients were asked the same questions) and assigned them ICD-8 diagnoses.15 This procedure largely eliminated the transatlantic differences in the rates at which various diagnoses were given. For example, when local diagnoses were examined, 61.5 per cent of New York patients but only 33.9 per cent of London patients were given a diagnosis of schizophrenia. However, when ICD-8 diagnoses were assigned, 29.2 per cent of New York patients and 35.1 per cent of London patients were given the diagnosis. These findings were important because they implied that different diagnostic practices (rather than actual differences between the patient populations) were responsible for the transatlantic differences that had previously been observed.
This conclusion was supported by a further study conducted by the team, in which large groups of British and American psychiatrists were shown videotaped interviews with patients. When the diagnoses given by the participating psychiatrists were studied, it was again found that the Americans in comparison with the British were much more willing to make a diagnosis of schizophrenia and much less willing to make diagnoses of mania, depression or neurosis. These differences were summarized by the project team in the form of a Venn diagram, reproduced in Figure 3.1, which shows how the American concept of schizophrenia overlapped the other diagnostic categories used more frequently in Britain.
Figure 3.1 Venn diagram showing relationships between US and UK diagnostic concepts as revealed in the US–UK Diagnostic Project (from R. E. Kendell, J. E. Cooper, A. J. Gourlay, J. R. M. Copeland, L. Sharpe and B. J. Gurland (1971) ‘Diagnostic criteria of American and British psychiatrists’, Archives of General Psychiatry, 25: 123–30).
The World Health Organization subsequently conducted a much broader investigation of international discrepancies in diagnostic practices, which became known as the International Pilot Study of Schizophrenia (IPSS).16 A total of 1202 recently ill patients were recruited from Colombia, Czechoslovakia (now divided into the Czech Republic and Slovakia), China, Denmark, India, Nigeria, the then Soviet Union, the UK and the USA. Three different approaches to assigning diagnoses were used. First, patients were diagnosed according to the customs and practices of the local psychiatrists. Second, they were given an ICD-8 diagnosis on the basis of their responses to a standardized psychiatric interview. Third, a computer program was used to identify a core group of patients who appeared to have a common set of schizophrenia symptoms.
When presenting its findings, the WHO drew attention to the unsurprising fact that a ‘concordant group’ of patients identified as suffering from schizophrenia by all three methods were very alike, no matter which country they came from. This observation was, of course, a consequence of the way in which the study was designed (as these patients were those who everyone agreed had schizophrenia it was hardly surprising that they had similar symptoms). Of more interest were the discrepancies observed between the different approaches to diagnosis. Large numbers of patients who were diagnosed as suffering from schizophrenia according to the local criteria failed to fall into the concordant group. Moreover, it was observed that local concepts of schizophrenia appeared to be especially broad in the USA (confirming the earlier findings of the US–UK Diagnostic Project) and in the USSR.
The apparent over-diagnosis of schizophrenia in Moscow (the centre for the USSR component of the study) merits further examination. At the time of the IPSS, Soviet psychiatrists’ attitudes towards schizophrenia were shaped by the views of Andrei Snezhnevsky, an ambitious Communist Party member who had advocated a politically favoured Pavlovian approach to mental illness. Snezhnevsky had been appointed a full member of the prestigious Academy of Medical Sciences in 1962. At the same time, he had become Director of the Academy’s Institute of Psychiatry, a position of considerable power in the Soviet system, as it made him chief adviser in psychiatry to the Ministry of Health. He had therefore been able to determine the training given to junior psychiatrists, their progress in the profession, and the policies that they would pursue when working in hospitals.
Snezhnevsky had an extremely broad conception of schizophrenia, and suppressed views that differed from his own. He believed that the disorder appeared in a spectrum of guises ranging from idiosyncratic thinking through to full-blown psychosis, and that it could develop gradually rather than suddenly, so that its onset was often indistinguishable from eccentric behaviour. Moreover, for Snezhnevsky, dissatisfaction with the Soviet political and economic system, or with communism in general, was itself evidence of mental instability. The consequence of this line of reasoning was a unique classification of schizophrenia according to both course (whether the illness was continuous, periodic or intermittent) and severity, as shown in Table 3.1. Symptoms of the mildest forms of schizophrenia were said to include ‘neurotic self-consciousness’, ‘conflicts with parental and other authorities’, ‘reformism’, ‘social contentiousness’ and ‘philosophical concerns’. Small wonder that, by the late 1960s, individuals who were regarded as political dissidents in the West were being forcibly treated for schizophrenia in the USSR, notably in the notorious Serbsky Institute of Forensic Psychiatry in Moscow.17 In a commentary on this apparent abuse of psychiatry, American psychiatrist Walter Reich noted that Soviet psychiatrists, far from acting as conscious agents of the state, were often acting in good faith:
Those Soviet psychiatrists really saw the patients as schizophrenic; or, to put it another way, the system created a category, first on paper and then, with training, in the minds of Soviet psychiatrists, which was eventually assumed to represent a real class of patients and which was inevitably filled by real persons… Had those psychiatrists been sensitive to the capacity of diagnostic systems to shape the way psychiatrists understand, categorise, and perceive psychopathology, they might have been able, one hopes, to avert this result.18
Rethinking agreement
To many psychiatrists working in the mid-1970s, the findings from these international comparisons, together with the discouraging results from the earlier reliability studies, signalled nothing less than a crisis in the theory and practice of psychiatry. Some clinicians of this era advocated abandoning the practice of assigning diagnoses, on the grounds that they were meaningless and dehumanizing to patients (we will examine some of these arguments later). Others, however, believed that the crisis was caused by an unrigorous and insufficiently medical approach to mental illness and that the problems of classification might still be solved. A leading exponent of this conservative position was Robert Spitzer, a psychiatrist who would influence American attitudes towards diagnosis in the second half of the twentieth century as much as Adolf Meyer had done in the first.
Table 3.1 The Moscow school criteria for schizophrenia employed in the Soviet Union in the early 1970s (adapted from W. Reich (1984) ‘Psychiatric diagnosis as an ethical problem’, in S. Bloch and P. Chodoff (eds.), Psychiatric Ethics. Oxford: Oxford University Press).
Course Forms
Types
    Continuous
    Periodic
    Shift-like
Life course of the illness
    Steady deterioration
    Intermittent episodes
    Intermittent episodes, superimposed on a steady deterioration
Subtypes
    Sluggish (mild)
    Paranoid (moderate)
    Malignant (severe)
    Moderate
    Severe
Some characteristics
    Neurotic self-consciousness; introspectiveness; obsessive doubts; conflicts with parental and other authorities
    Paranoid delusions; hallucinations; parasitic life-style
    Early onset; unremitting; overwhelming
    Acute attacks; fluctuations in mood; confusions
    Neurotic, with affective colouring; social contentiousness; philosophical concerns; self-absorption
    Acute paranoid
    Catatonia; delusions; prominent mood changes
Spitzer was an unlikely candidate for this role. Born in White Plains, New York, in 1932, he studied psychology at Cornell University before completing his MD at New York University in 1957.19 While training at the New York State Psychiatric Institute, he attended the Columbia University Psychoanalytic Clinic for Training and Research, and qualified as a psychoanalyst. It was after completing this training that he was appointed to a research fellowship, which brought him into contact with epidemiologically sophisticated researchers at the New York State Psychiatric Institute, most notably Joseph Fleiss, a statistician, and Joseph Zubin and Jean Endicott, both psychologists. His experience in this position led him to become one of the consultants to the APA’s Task Force on Nomenclature and Statistics, which introduced DSM-II. Indeed, Spitzer co-authored the article in the American Journal of Psychiatry which introduced and defended the new manual.20
In a paper published with Joseph Fleiss in 1974, Spitzer attempted an empirical summary of the art of psychiatric diagnosis at that point in time.21 A novel aspect of their treatment of the reliability problem was their use of a relatively simple mathematical procedure to compensate for the base rate problem (that is, to allow for the extent to which randomly assigned diagnoses would be in agreement by chance, especially for frequently used diagnoses). The measure of reliability that they advocated, known as Cohen’s kappa, gave a value between 0 for chance agreement and 1 for perfect agreement, with values above 0.7 being regarded as acceptable.* Using this statistic, Spitzer and Fleiss were able to reanalyse data from the six best reliability studies then available, including the US–UK Diagnostic Project (for which they reported separate analyses of the New York and London data). Their findings are summarized in Table 3.2.
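For readers who want to see how kappa corrects for the base rate problem, the following is a minimal sketch of the standard two-rater calculation, kappa = (observed agreement - chance agreement) / (1 - chance agreement). The function name cohens_kappa and the twenty made-up ratings are hypothetical illustrations; they are not data from Spitzer and Fleiss or from any study discussed here.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each diagnosed the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)

    # Observed agreement: proportion of patients given the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: "N" is a very common category, "S" a rare one.
rater_a = ["N"] * 14 + ["S"] * 2 + ["N"] * 2 + ["S"] * 2
rater_b = ["N"] * 14 + ["S"] * 2 + ["S"] * 2 + ["N"] * 2

raw = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(raw)                             # raw agreement: 0.8
print(cohens_kappa(rater_a, rater_b))  # kappa: about 0.375
```

In this invented example the raters agree on 80 per cent of patients, yet kappa is only about 0.375, because a category used 80 per cent of the time by both raters produces 68 per cent agreement by chance alone. The formula can also yield negative values when observed agreement falls below the chance level.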