by Simon Levay
The children at the orphanage – a mix of real orphans and children whose parents had been forced to give them up by economic necessity – were used to being treated as guinea pigs. When Jim Dyer, a reporter, interviewed one of the now-elderly subjects for a 2001 article in the San Jose Mercury News, she told him that ‘Every week, somebody else from the university would come down and start testing us for God knows what.’ There is no record of Wendell Johnson’s motive for choosing an orphanage for the study, but we may guess that it was twofold: first, the convenience of having a large, fairly homogeneous collection of children at a single location and cared for by the same staff, and second, the ease of obtaining permission for the study. It would have been much harder, one may guess, to get permission from a child’s parents, given that one possible outcome of the treatment was the development of a speech disorder.
The first day of Mary Tudor’s visit to the orphanage was devoted to the selection of children for the study. There were 10 so-called stutterers in the orphanage – children whom the teachers and matrons considered to be stutterers and had labelled as such. All 10 of these children were included in the study. To balance them, Tudor and her five colleagues – fellow graduate students who were familiar with speech disorders – picked 12 children at random from the remaining population of children who had never been called stutterers by the staff. The 22 children selected for the study included both boys and girls, and their ages ranged from five to 16 years.
Each of these two groups was then further divided into two, thus providing the four subject groups needed for testing the four objectives described above. Tudor named the groups IA, IB, IIA, and IIB, but for ease of recall I’ll rename them as follows:
SN: Five children previously labelled as stutterers who were to be relabelled as normal speakers.
SS: Five children previously labelled as stutterers who would continue to be labelled as such.
NS: Six children previously labelled as normal speakers who were to be relabelled as stutterers.
NN: Six children previously labelled as normal speakers who would continue to be labelled as such.
In her thesis, Tudor maintained her subjects’ confidentiality, only referring to the individual children by code numbers. This confidentiality was breached by the Mercury News reporter Jim Dyer, however. The names of the six children in the NS group (the most ethically questionable group, consisting of normal-speaking children who were to be relabelled as stutterers) have also entered the public domain on account of the lawsuit against the state of Iowa in which they or their heirs are plaintiffs, and I will therefore use their names here. They were Norma Jean Pugh (aged five at the time), Elizabeth Ostert (nine), Clarence Fifer (11), Mary Korlaske (12), Phillip Spieker (12) and Hazel Potter (15).
The ages of these children immediately raise a significant issue with regard to the scientific value of the study. Stuttering typically begins in the preschool years; if Wendell Johnson began stuttering at five, as he related, then he was among a minority of late-onset stutterers. The children in Mary Tudor’s NS group, with the possible exception of Norma Jean Pugh, were well beyond the age at which stuttering typically develops. Thus, even if Johnson’s diagnosogenic theory were correct, Tudor’s study might have failed to validate it simply because the children had grown past the sensitive period of speech development during which they could be induced to stutter. Tudor did not discuss this issue in her thesis. It may be that she was forced to use older children because there was an insufficient number of younger ones in the orphanage. Alternatively, she may have felt compelled to use children in the same age range as those in the stuttering groups, who averaged 12 years of age.
The plan of the study was as follows. At the beginning and again at the end of the study, the speech of all 22 children was to be assessed by the panel of five judges. Without knowledge of which experimental group each child belonged to, the judges would independently provide a numerical assessment of the child’s fluency and would also make a judgment as to whether the child stuttered or not. During the intervening four months, Tudor would apply labels to the children according to the groups they had been assigned to.
This is how Tudor’s thesis describes what was to be said to the children in the NS group at the beginning of the study. (Her actual words were modified to suit each child’s age and intelligence; some of the children had IQs that were well below average.)
The staff has come to the conclusion that you have a great deal of trouble with your speech. The type of interruptions which you have are very undesirable. These interruptions indicate stuttering. You have many of the symptoms of a child who is beginning to stutter. You must try to stop yourself immediately. Use your will power. Make up your mind that you are going to speak without a single interruption. It’s absolutely necessary that you do this. Do anything to keep from stuttering. Try very hard to speak fluently and evenly. If you have an interruption, stop and begin over. Take a deep breath whenever you feel you are going to stutter. Don’t ever speak unless you can do it right. You see how [the name of a child in the institution who stuttered rather severely] stutters, don’t you? Well, he undoubtedly started this very same way you are starting. Watch your speech every minute and try to do something to improve it. Whatever you do, speak fluently and avoid any interruptions whatsoever in your speech.
The children in the SN group were told the opposite – that they didn’t stutter, that any speech mistakes they made were inconsequential and that they should not worry about them. The children in the SS and NN groups were given messages consistent with their prior identities as stutterers or normal speakers respectively.
Tudor reinforced these messages on subsequent visits to the orphanage. She had eight or nine sessions with each of the children in the NS group, and three or four sessions with the children in the SN group. The thesis doesn’t mention any sessions with the children in the SS or NN groups: either she neglected to list these sessions, or perhaps she thought that their status as controls made the sessions unnecessary.
During the sessions with the children who were being relabelled as stutterers, Tudor would pick on slight speech errors that the children made in the course of their conversation and draw attention to them, saying that they were signs of stuttering and that the child should do everything in his or her power to avoid making the errors. In addition, she attempted to recruit the orphanage’s staff to help reinforce these messages. She told them that the NS and SS children were stuttering and that they should draw the children’s attention to all their speech errors. Similarly, she told the staff that the SN and NN children were not stutterers, and she asked them to ignore these children’s speech errors or to tell them that their speech was fine.
It seems that the staff didn’t cooperate in the fashion that Tudor hoped. Although a couple of the children in the NS group mentioned to her that their teachers had commented on their speech, Tudor wrote in her thesis that the staff generally didn’t follow her instructions, or only did so to a small degree. Thus, the overall amount of indoctrination that the children received was probably much less than Tudor originally desired.
Even so, the indoctrination clearly had an effect. Here is part of Tudor’s report of an interview with one of the children in the NS group, 11-year-old Clarence Fifer, on May 2 – three and a half months into the study:
‘How is your stuttering today?’
‘I don’t know.’
‘When do you seem to have the most trouble?’
‘When I’m playin’.’
‘Tell me something about it.’
‘Well, most of the time I stutter.’
‘Do the other boys notice it?’
‘Sometimes.’
‘Do they ever say anything?’
‘No.’
‘How do you know they notice it?’
‘They kinda laughed.’
‘What did you do then?’
‘Walked away.’
‘Does it bother you much?’
‘Yes, feel pretty bad.’
‘What do you do about it?’
‘Next time try to keep myself from doin’ it.’
‘How do you do that?’
‘Sometimes I take a breath.’
‘How does it feel when you speak?’
‘Kinda strain my throat.’
His speech had a breathy quality and he took a breath after every few words whether he needed it or not.
During this interview, he had 25 speech interruptions. The stuttering phenomena added to the previous list were deep inhalation, excessive exhalation and eyes closed.
Since Tudor deceived the staff in the same way that she deceived the children, the staff could not have been in a position to give any kind of informed consent to the study. Whether there was any person at the orphanage, such as its administrator, who was informed about the true purpose of the study is not stated in Tudor’s thesis. Jim Dyer, who interviewed Tudor in 2000, when she was 84 years old, wrote that Johnson obtained permission for the study from orphanage officials, but he didn’t make clear whether Johnson actually told these officials what would be done to the children. It’s possible that Johnson felt he had carte blanche to initiate any kind of study that he considered appropriate.
So what was the result of the study? What happened to the children’s speech? In his articles in the San Jose Mercury News in 2001, Dyer reported that most or all of the children in the NS group responded to being labelled as stutterers by stuttering. In doing so, Dyer said, they confirmed Wendell Johnson’s diagnosogenic theory. Dyer also reported that, in addition to stuttering, many of these children became withdrawn and isolated; they were reluctant to speak at all, and what little they did say came out in single words or brief phrases rather than complete sentences.
When Dyer tracked down some of the children – now elderly adults – for his articles, they supposedly confirmed the findings of the study. Norma Jean Pugh, who was five at the time of the study and spoke normally, apparently told Dyer that she had been induced to stutter by Tudor’s experiment, and that her stutter persisted for years, gravely damaging her social relationships and her education. Now, at age 64, she was a near-total recluse. Mary Korlaske, who also spoke normally at the start of the experiment, was also induced to stutter, she told Dyer. She later got over the stuttering, but it recurred in 1999 after the death of her husband. She moved into a retirement home, where she rarely left her room. Dyer said that she stuttered when he interviewed her, although his description of her speech did not correspond closely to what a speech pathologist would call stuttering.
There is some evidence that Johnson too believed that labelling the children as stutterers caused at least some of them to stutter. In email correspondence, Johnson’s student Oliver Bloodstein told me that ‘In his lectures in the fall of 1942, Johnson made it clear that he thought the results of the Tudor study supported the diagnosogenic theory.’ Bloodstein also wrote (in a published article): ‘To the best of my recollection, he told us that one child actually did begin to stutter as a result of the procedure.’
Although Dyer visited the University of Iowa library, where Tudor’s thesis is archived, he did not say explicitly that he read the thesis, and most of his account is based on interviews with Tudor and the surviving subjects, along with readings of Tudor’s notes. (I was not able to locate Dyer for an interview.)
A totally different account of the Johnson-Tudor study was published in 2002 by Nicoline Ambrose and Ehud Yairi, the experts on stuttering at the University of Illinois. Ambrose and Yairi actually went back and read the 60-year-old typescript that was Tudor’s thesis, and what they wrote about it in the American Journal of Speech-Language Pathology contradicted the central assertion of Dyer’s articles: Tudor’s experiment, they said, did not cause any of the children to stutter.
This conclusion was based principally on the assessments of the children’s speech that were made at the beginning and end of the study by the panel of five blinded judges. Each judge independently rated the fluency of the children’s speech on a five-point scale, with 1 corresponding to the worst fluency and 5 to the best. At the beginning of the study, the average score for the children in the crucial NS group was 2.83 – roughly in the middle of the scale of fluency, rather than near 5 as one might expect. At the end of the study the average score for these children was 2.92. Statistically, the tiny shift of the average (by 0.09 units) was completely insignificant, and what’s more, it was a shift toward improved speech – the opposite of what Johnson’s theory would have predicted. The child in this group who showed the biggest shift was Mary Korlaske, who supposedly told Dyer that she was induced to stutter by the experiment. Her speech shifted by 0.8 units – in the direction of greater fluency!
The fluency scores given by the judges included sub-scores for individual kinds of disfluency, some of which (such as repetition of syllables) were symptoms of stuttering, while others were not. Even when Ambrose and Yairi looked specifically at the scores for the stuttering-related disfluencies, there was no significant change over the course of the study.
Besides the numerical score, the judges added written comments at the end of the study. For each of the five children, including Korlaske, the majority of the judges simply wrote ‘No stuttering.’ Some of the judgments included statements like ‘appeared hesitant’ or ‘answered briefly’, but not one judge stated that any child stuttered or mentioned repetition of syllables, the key symptom of stuttering.
None of the other three groups showed any significant shift in their average speech fluency either. Even looked at individually, none of the children showed any substantial shift in the direction predicted by the theory. Thus, Ambrose and Yairi’s analysis showed that Dyer’s central claim – that the treatment caused the normally speaking children to stutter – was wrong. Nor, apparently, had the ‘stuttering’ children been caused to stop stuttering by being labelled as normal speakers.
One might think on this basis that the Tudor study was actually a refutation of Johnson’s theory rather than a confirmation, since changing the children’s labels had no effect on their propensity to stutter. But no; it was worse than that, according to Ambrose and Yairi. They reported that the study was so poorly designed and executed that it could not have been expected to reveal anything about the theory, regardless of whether the theory was right or wrong.
Most crucially, Ambrose and Yairi reported that many of the children had been assigned to the wrong subject groups. If they had been correctly assigned, the children in the NS and NN groups should have been given scores near the 5 (fluent) end of the scale, and the children in the SN and SS groups should have been given scores near the 1 (disfluent) end of the scale. In fact, however, there were no significant differences between the average scores of any of the groups before the treatment began. There were several children who were clearly described as stuttering or repeating syllables who were put in one of the ‘N’ groups, and several children who were described as not stuttering who were put in one of the ‘S’ groups. Apparently, the ‘stuttering’ children were selected simply because the orphanage staff said that they stuttered, and the ‘normal-speaking’ children were selected simply because the staff said that they didn’t stutter, and even though these assignments were not always borne out by the judges’ assessments, the children were left in the groups they were assigned to.
Thanks to an interlibrary loan, I was finally able to lay hands on Tudor’s thesis myself, and I confirmed the truth of what Ambrose and Yairi said on these points. However, this problem is not as devastating to the credibility of the study as Ambrose and Yairi implied. For one thing, many of the items used for fluency scoring were not criteria used in the diagnosis of stuttering, and for these unrelated items there was no reason to expect that stuttering children should score differently from non-stuttering children. Also, Tudor was concerned with the effects of changing labels: thus in selecting children for the study what mattered most was how a child was labelled prior to the study, not whether he or she actually stuttered or not. In this sense, Tudor had good reason to depend on the judgments of the orphanage staff, who had been in contact with the children for years.
There were other reasons, however, why no solid conclusions could be drawn from the study. I already mentioned the fact that the children were too old to test the diagnosogenic theory if children’s susceptibility to criticism was limited to a developmental period around the age when children typically begin to stutter. Also, the children formed an unrepresentative sample in many respects, such as being institutionalised and also in most cases having below-average IQs. Furthermore, there were too few children in each group for it to be likely that significant effects of treatment would emerge.
Finally, the indoctrination of the children was done ineffectually. As already mentioned, the staff didn’t cooperate, leaving Tudor’s visits as almost the only ‘relabelling’ that the children experienced. It is hard to believe that just a few sessions with Tudor would somehow outweigh a lifetime of being exposed to the opposite labels. Tudor herself commented on this in the ‘Discussion’ section of her thesis: ‘As it was,’ she wrote, ‘the children received their stimulation almost entirely from the writer. If these children had been constantly reminded of their speech they would have undoubtedly reacted more positively [ie, by showing more signs of stuttering].’ And she predicted that ‘more extensive results’ could be expected if the experiment had been done in a ‘home situation’ with constant critiques from the children’s parents.