How to Design and Report Experiments
Inferential statistics (or ‘infernal statistics’ as we’ve seen it mis-typed!) are the results of statistical tests on the data, aimed at discovering whether there are statistically significant differences between the groups or conditions in your study (see Chapters 5, 6 and 7). For example, in the study mentioned earlier in which we measured people’s memory scores once when they were upright and once when they were upside-down, we might have performed a repeated-measures t-test to see if there was an effect of orientation on memory scores.
The results of such tests confirm and support the impressions that you obtain by ‘eyeballing’ the data. As a general rule, the results of the inferential tests determine whether or not you have an ‘effect’, and whether or not you can treat it as ‘real’ as opposed to merely a chance, freaky result. Once again, it’s a case of having to substantiate your claims, in this case with references to your test results (rather than to previously published work as is done in the Introduction and Discussion).
Sometimes students (and researchers too!) look at the descriptive data that they have collected, and are struck by an apparent difference between conditions. However, when they perform an inferential test on the data, they obtain a non-significant result. If this is inconvenient or undesirable, they ignore the test result completely, and proceed to talk about the ‘difference’ between means as if the test had been significant! Ignoring the outcome of the test negates the whole point of doing it in the first place. If a statistical test shows that a ‘difference’ is not significant, you have to accept this, awkward though it might be.
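To make the repeated-measures t-test mentioned above a little more concrete, here is a minimal sketch in Python of how the t statistic is obtained from paired scores. The memory scores are invented purely for illustration, and the function name `paired_t` is our own. A statistics package would normally also supply the probability value, which this sketch does not attempt to compute.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a repeated-measures t-test on paired scores."""
    if len(first) != len(second):
        raise ValueError("each participant needs a score in both conditions")
    # Work on the difference between each participant's two scores
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample SD divided by sqrt(n))
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Hypothetical memory scores for five participants (not real data)
upright = [12, 15, 11, 14, 13]
upside_down = [10, 12, 11, 11, 10]

t, df = paired_t(upright, upside_down)
print(f"t({df}) = {t:.2f}")  # prints: t(4) = 3.77
```

Whether a t of this size counts as significant still has to be checked against the probability value for the relevant degrees of freedom; the point of the test is that you abide by that answer.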
Presentation of Inferential Statistics
As with descriptive statistics, how you present inferential statistics depends largely on how many of them you have. Statistics can be displayed in the text itself; as tables; or in figures (graphs or diagrams). The APA suggest the following rule of thumb: if you have three numbers or fewer, put them in a sentence; if you have from four to 20 numbers, use a table; and if you have more than 20 numbers, consider using a graph instead of a table. As with descriptive statistics, if you opt for putting statistical test results in a table, make sure that the reader knows what the tests are doing (i.e., which conditions are being compared with which). You will still need to explain in words, in the text of the Results section, what the tests are showing.
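The rule of thumb above is simple enough to write down as a few lines of Python. This is only a sketch: the function name `suggest_format` and the returned labels are our own invention, and the rule is a guideline rather than something to apply mechanically.

```python
def suggest_format(n_numbers):
    """Suggest how to present a set of statistics, given how many there are."""
    if n_numbers < 1:
        raise ValueError("there must be at least one number to report")
    if n_numbers <= 3:
        return "sentence"   # three numbers or fewer: put them in the text
    if n_numbers <= 20:
        return "table"      # four to 20 numbers: use a table
    return "graph"          # more than 20: consider a graph instead


for count in (2, 12, 40):
    print(count, "->", suggest_format(count))
```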
In general, the neatest way of presenting the results of a statistical test is to state a result (and its associated descriptive statistics) and then follow it with the relevant supporting inferential statistics enclosed in brackets. Here are some examples:
‘The mean reaction time of the group that took amphetamines (977 ms) was significantly faster than that of the group that did not (1004 ms) (t(30) = 2.66, p < .01, r = .44, one-tailed test).’
‘The numbers of participants saying that they liked chocolate were as follows: 38 young women; 15 elderly women; 10 young men; and 5 elderly men. It thus appears that there is a significant relationship between age and gender in terms of chocolate preference (χ² (1, N = 68) = 10.85, p < .001).’
In both these cases, note that we show the generally-accepted abbreviation for the test (e.g., t for t-test, χ² for Chi-Squared); the degrees of freedom or number of participants (depending on the test); the value of the test statistic (e.g., the obtained value of t, χ² or whatever); and the probability value associated with it. (Further examples of how to report inferential statistics can be found in Chapters 4, 5, 6 and 7, in the sections entitled ‘Interpreting and Writing the Results’ which follow each of the tests covered there.)
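As an illustration of how the pieces of the bracketed report fit together, the sketch below rebuilds the statistics from the amphetamine example in Python. It assumes that the effect size r is obtained from t and its degrees of freedom using the common conversion r = √(t² / (t² + df)); the text here does not say which formula was used, although this one does reproduce the r = .44 quoted. The function names are our own.

```python
import math


def effect_size_r(t, df):
    """Convert a t value to an effect size r (assumed formula, see lead-in)."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def report_t(t, df, p_text, tails="one-tailed"):
    """Build the bracketed report that follows a statement of the result."""
    # r cannot exceed 1, so it is written without a leading zero
    r_text = f"{effect_size_r(t, df):.2f}".lstrip("0")
    return f"t({df}) = {t:.2f}, {p_text}, r = {r_text}, {tails} test"


print(report_t(2.66, 30, "p < .01"))
# prints: t(30) = 2.66, p < .01, r = .44, one-tailed test
```

The probability statement is passed in as text because it comes from the test itself (or from tables), not from anything calculated in this sketch.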
There are several ways of reporting probabilities associated with test statistics. Sometimes you’ll see them reported as an exact probability, for example ‘p = .024’ or ‘p = .036’. In other places, you’ll see them reported as the nearest landmark probability under which the obtained probability falls: thus ‘p = .024’ and ‘p = .036’ would both be written as ‘p < .05’. The American Psychological Association prefer yet another method; they now like the author to state, at the start of the results section, the level of probability that will be used for all tests in that section, rather than reporting probability values for each test separately. (The APA also like authors to report ‘effect sizes’: see Chapter 5).
Whichever method you opt for, make sure you get the ‘less than’ sign the right way round: ‘p > .05’ means ‘p greater than .05’, and hence means completely the opposite to ‘p < .05’! Failure to use the correct sign tells your tutor a lot about your statistical abilities, or rather the lack of them. Various commonly-used abbreviations and symbols are shown in Box 13.1.
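The first two ways of reporting a probability (exact, or as the nearest landmark under which it falls) can be sketched in Python as follows. The function names and the particular set of landmarks (.05, .01 and .001) are our own assumptions for the purposes of illustration; note that the sign is chosen by the code, so it cannot end up the wrong way round.

```python
LANDMARKS = (0.001, 0.01, 0.05)  # assumed landmark levels, smallest first


def exact_p(p):
    """Report an exact probability to three decimal places, e.g. 'p = .024'."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    if p < 0.001:
        return "p < .001"  # avoids the meaningless 'p = .000'
    return "p = " + f"{p:.3f}".lstrip("0")


def landmark_p(p):
    """Report the smallest landmark probability that p falls under."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    for level in LANDMARKS:
        if p < level:
            return "p < " + f"{level:g}".lstrip("0")
    # Anything not below .05 is treated here as non-significant
    return "p > .05"


for p in (0.024, 0.036, 0.2):
    print(exact_p(p), "|", landmark_p(p))
```

With the two values used in the text, `exact_p` gives ‘p = .024’ and ‘p = .036’, while `landmark_p` gives ‘p < .05’ for both.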
You don’t need to put the calculations of statistical tests into your ‘Results’ section, just the end-results: the reader just needs to see the value of t, not the laborious computations that went into obtaining it! Neither do you need to give a reference for statistics that are in common use. Pearson’s r, t-tests, ANOVA, multiple regression and the like can all be referred to by their names, without you having to provide a source. If a statistic appears in a psychology statistics textbook, it’s safe to assume that it needs no further explanation in your report. You only need to provide supporting references if the statistic is an unusual one, or if it’s the main topic of your report.
As a rule, you only need to justify why you used a particular test if the reasons are not immediately obvious. For example: ‘Since the range of performance was markedly different for fetishists and non-fetishists, a Mann-Whitney test was used to analyse the data, rather than a t-test’. Here, the brief explanation justifies to the reader what might otherwise appear to be an odd choice of test. You can assume that your hypothetical audience, the non-specialist psychologist, doesn’t need to be reminded of the assumptions that underlie the use of a parametric test, so avoid statements like the following:
‘The data were normally distributed, showed homogeneity of variance and were measured on an interval scale, and so we decided to use a t-test.’ All perfectly correct, perhaps, but completely unnecessary in a ‘Results’ section!
Finally, a couple of minor points on presentation of numbers. Use a zero before the decimal point when the number is less than 1 (e.g., write ‘The mean error rate was 0.73.’) but not when the number cannot be greater than 1. This applies in the case of correlation coefficients (which can only range from –1 to +1) and probabilities (which cannot be larger than 1). Thus you would write ‘Pearson’s r was –.55’. While we’re on numbers, it’s worth mentioning that the APA suggests that, in the interests of clarity, it’s generally best to round to two decimal places, as in ‘p < .01’. There may, of course, be situations in which more decimal places are appropriate, in which case use them. However, most of the time, using more decimal places merely adds spurious precision while reducing clarity. (This applies with equal force to descriptive statistics.)
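The leading-zero convention can be captured in a small Python helper. This is a sketch only: the function name `format_stat` and its parameters are our own, it rounds to two decimal places by default (as in ‘.01’), and it uses an ordinary hyphen for the minus sign rather than the typeset dash shown in the text.

```python
def format_stat(value, can_exceed_one=True, decimals=2):
    """Format a statistic, dropping the leading zero if it cannot exceed 1."""
    text = f"{value:.{decimals}f}"
    if can_exceed_one:
        return text  # e.g. an error rate or a mean: keep '0.73'
    # Correlations and probabilities: '0.55' -> '.55', '-0.55' -> '-.55'
    sign = "-" if text.startswith("-") else ""
    digits = text.lstrip("-")
    if digits.startswith("0."):
        digits = digits[1:]
    return sign + digits


print("The mean error rate was", format_stat(0.73))                       # 0.73
print("Pearson's r was", format_stat(-0.55, can_exceed_one=False))        # -.55
```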
13.4 Make the Reader’s Task Easy
* * *
Always explain to the reader, in words, what the results show. Never just dump down descriptive or inferential statistics and expect the reader to work out for themselves what it all means. A good rule of thumb is that the text of the results section should remain comprehensible even if the tables and graphs were missing, and vice versa. Tables and graphs should be self-explanatory: the title, labels and legend should be clear, and should provide enough information for the reader to be able to figure out what is being displayed. A common mistake nowadays is for students to ‘cut and paste’ tables and graphs into their report from statistical packages such as SPSS or Excel, without replacing the arcane labels that are often supplied by these programs. Labels like ‘var000001’ or ‘condition X’ will probably mean nothing to the reader, and must be replaced by more meaningful labels.
Andy likes to end his ‘Results’ section with a summary of the key results: this is especially worthwhile if you have described a lot of complex analyses and findings, so that the reader might well have lost sight of the wood for the trees.
13.5 Be Selective in Reporting Your Results!
* * *
In Results sections, less is often more: just because you can produce the mean, median and mode on a group’s scores, it doesn’t mean that you have to present them all to the reader. Pick whichever average is the most appropriate, and show only that one (plus its associated measure of spread, namely the range or semi-interquartile range if you show the median; or the standard deviation or standard error if you use the mean).
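The pairing of each average with its own measure of spread can be sketched in Python using only the standard library. The scores are invented, the function name `summarise` is our own, and the quartiles come from `statistics.quantiles` with its default method; other packages calculate quartiles slightly differently, so the semi-interquartile range may not match theirs exactly.

```python
import math
import statistics


def summarise(scores, average="mean"):
    """Return one average together with its associated measures of spread."""
    if average == "mean":
        sd = statistics.stdev(scores)  # sample standard deviation
        return {
            "mean": statistics.mean(scores),
            "sd": sd,
            "se": sd / math.sqrt(len(scores)),  # standard error of the mean
        }
    if average == "median":
        q1, _, q3 = statistics.quantiles(scores, n=4)
        return {
            "median": statistics.median(scores),
            "range": max(scores) - min(scores),
            "semi_iqr": (q3 - q1) / 2,  # semi-interquartile range
        }
    raise ValueError("average must be 'mean' or 'median'")


scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical scores, not real data
print(summarise(scores, "mean"))
print(summarise(scores, "median"))
```

In a report you would show the output of only one of these calls, not both.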
Similar considerations apply to inferential statistics. Just because you have worked out how to get a statistics package to produce a zillion different tests on the same data, it doesn’t mean that you should show them all. Pick the most appropriate statistical test, and show the results of that. As with the Introduction, the basic idea is to show the reader enough statistical information to give them a clear idea of what you have discovered – anything extra detracts from your ‘message’ and makes it hard to see the wood for the trees. Some students seem to think that the more statistics they can squeeze into their ‘Results’ section, the better the mark they will receive. However, often it’s the opposite – all they have done is demonstrate their unintelligent and unselective use of statistics software.
13.6 Summary
The ‘Results’ section tells the reader what you found, but leaves the interpretation of your results for the next section (the Discussion).
You may need to ‘tidy’ your data before analysing them, but this has to be done in an honest way – you shouldn’t discard participants’ data just because they didn’t fit in with what you expected to find!
The ‘Results’ section will normally consist of descriptive statistics and inferential statistics.
Descriptive statistics may be presented within the text of the ‘Results’ section, in a table, or in a graph. If you use tables or graphs, they should be labelled clearly enough for them to be understandable without reference to the text of the report.
Inferential statistics are usually (but not invariably) presented within the text of the ‘Results’ section. They should be presented in a standard way, and accompanied by a brief explanation of what they show.
Statistical calculations (whether by hand or computer) and raw data should not appear in the ‘Results’ section: if they must be included, put them in an Appendix.
Box 13.1: Some commonly used statistical symbols and abbreviations:
14 Answering the Question ‘So What?’ The Discussion Section
* * *
In the Results section, you told the reader what you found. Now it’s time to tell them what it all means, in relation to your hypotheses as outlined in the Introduction, and in relation to relevant psychological theory and previous research. As with the Introduction, the Discussion tends to fall into several sub-sections. However, again as with the Introduction, don’t use section headings: the discussion’s ‘flow’ from each of these issues to the next should be obvious enough, without any need for headings.
14.1 Summarize Your Findings
* * *
Start the Discussion by briefly summarizing the principal results of your study. Assume your reader has the attention span of a goldfish, and has either completely forgotten what you said in the ‘Results’ section or got so thoroughly confused by it that they need a brief overview of your main findings before they continue reading. Make sure you summarize the results without any statistics or excessive detail – after all, if the reader wanted all that, they would merely reread the ‘Results’ section that they have just finished!
14.2 Relate Your Findings to Previous Research
* * *
Explain how your results fit in with previous research in this area (generally, all that stuff you mentioned in the Introduction: the same considerations about relevance apply here too). How are your results explained by psychological theories on the phenomenon in question? Do they pose problems for current theoretical accounts, or are they consistent with them? Do they support one theory rather than another? Are they consistent with previous research findings in this area, do they contradict them, or do they qualify them? For most reports, this will be the longest part of the Discussion.
In most cases, the level of detail in which previous research is described should be similar to that used in the Introduction. There may be cases where you need to provide a little more specific information about a particular study, however. Suppose, for example, that your study has produced findings which are at odds with those of previous research: in this case, you might want to argue that the discrepancies have arisen from procedural differences between your study and the previous ones, in which case you might need to elaborate on the details of those procedures.
Personally, I find writing this section is the most interesting part of the whole report. Sometimes your results fit in with previous findings fairly straightforwardly; sometimes they seem at odds with past findings, and you have to work out a plausible explanation for why this might be so. This is an opportunity for you to demonstrate some imagination (but not too much!) and show off your flair at being able to understand the previous research and your own findings well enough to be able to fit it all together into a more or less coherent story.
To make this section more concrete, let me take you briefly through the discussion of a paper that I was working on recently (Hole, George, Eaves and Rasek, 2002). This investigated the effects on face recognition of various distortions: for example, are faces still recognizable after they have been stretched to twice their original height? Why should anyone bother to investigate this? Well, it’s come to be appreciated that so-called ‘configural’ information (the precise spatial relationship of the eyes, nose and mouth to each other and to the rest of the face) is very important for face recognition. People can easily detect very tiny movements of facial features in pictures, especially displacements of the eyes. However, in real life, you see ‘distorted’ faces all the time, due to perspective changes, alterations in a person’s expression, etc., and yet still manage to recognize them. What we wanted to know was how well face recognition could tolerate various kinds of distortions to the configural information in a face. Stretching a face, for example, greatly disrupts the relationship of horizontal distances (such as the distance between the eyes) to the vertical distances in a face (such as the distance between the tip of the nose and the mouth).
I won’t bore you with the details, but essentially we found that people could recognize faces virtually as easily when they had been stretched or squashed as they could if they had been left untouched, but only if the distortion was applied to the whole face (as opposed to just half of it). That was our main finding: in the Discussion of our paper, we set about trying to relate this to previous relevant work on face recognition and trying to explain the implications of our results for theories of how face recognition is carried out by the visual system.
First of all, we tried to relate our findings to those of previous studies. I have just mentioned the studies showing that we are highly sensitive to feature displacements; our findings needed to be discussed in relation to these. To our knowledge, no-one had used the particular kinds of distortions that we used (which is why we used them!). However, other manipulations have been used to investigate the limitations of the processes underlying face recognition – for example, turning a face upside down disrupts recognition considerably, so we compared the effects of our distortions to manipulations such as this. Previous work has also shown that people can recognize faces from isolated features, so we had to consider whether our participants were resorting to this strategy. For various reasons we didn’t think this was a likely explanation, so we devoted some space to arguing against this interpretation of our results.
We then went on to consider several theoretical accounts of how face recognition occurs, using our findings to try to evaluate the plausibility of each one. How well could each theory account for our results? To give just one example: one kind of theory suggests that we compare the face that we are seeing now, with our memories of all of the faces that we know. If there’s a match, we recognize the face that’s in front of us. Hopefully you can see that a straightforward matching strategy would fail with our distorted faces, because a distorted face will fail to match any of the faces in memory. So, our results rule out this kind of explanation of how face recognition occurs.
More sophisticated versions of this theory suggest that perhaps we can apply the inverse transformation to a distorted perceptual input – so that we can ‘unstretch’ a stretched face, and then compare this unstretched version to our memory of faces that we know. We considered theories of this kind, and suggested that while we couldn’t rule them out altogether, they were implausible as an explanation of our results for two reasons. One argument was based on our results: one might expect such a transformation process to take some time, and we found no evidence of this in our reaction time data. The other argument was based on logic: how would the visual system know when to stop the transformation process and start trying to match the transformed face to stored memories of faces?
A final type of theory that we considered takes these kinds of issues into account, and suggests that we transform the distorted perceptual input until it matches a kind of ‘prototypical’ face, a ‘template’. We described a couple of theories of this type, and concluded that our results did not rule out this kind of explanation. We suggested that theorists who advocate these types of theories might want to see how their computer simulations of face recognition coped with our distorted faces.