How to Design and Report Experiments
SPSS Output 6.11
This result is all very nice, but as yet we haven’t done anything about our violation of the sphericity assumption. The table in SPSS Output 6.11 shows the F-ratio and associated degrees of freedom when sphericity is assumed, and the significant F-statistic indicates some difference(s) between the mean number of women eyed-up after the different doses of alcohol. This table also contains several additional rows giving the corrected values of F for the three different types of adjustment (Greenhouse-Geisser, Huynh-Feldt and lower-bound). Notice that in all cases the F-ratio remains the same; it is the degrees of freedom that change (and hence the critical value against which the obtained F-statistic is compared). The degrees of freedom are multiplied by the estimates of sphericity calculated in SPSS Output 6.12 (see Field, 1998a), and the new degrees of freedom are then used to ascertain the significance of F. First we must decide which correction to apply, and to do this we need to look at the estimates of sphericity in SPSS Output 6.12: if the Greenhouse-Geisser and Huynh-Feldt estimates are less than .75 we should use Greenhouse-Geisser, and if they are above .75 we should use Huynh-Feldt. We discovered earlier that, based on these criteria, we should use Huynh-Feldt here. Using this corrected value we still find a significant result because the observed p (.008) is still less than the criterion of .05. In fact, the results are significant using the Greenhouse-Geisser correction too. In situations like this, where the two corrections lead to the same conclusion, it makes little difference which you choose to report; however, if you accept the F-statistic as significant, the conservative Greenhouse-Geisser estimate is usually the one that is reported. Very occasionally the Greenhouse-Geisser and Huynh-Feldt estimates will lead to different conclusions (the Huynh-Feldt produces a significant result but the Greenhouse-Geisser doesn’t).
In these situations, you should still select the test based on the estimate of sphericity (as I’ve suggested); however, because the Greenhouse-Geisser correction is too strict and the Huynh-Feldt correction too liberal you can take an average of the two. In reality this means averaging the probability values for the test statistics when the two corrections are applied.
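The correction itself is simple arithmetic. Here is a minimal sketch in Python (the uncorrected degrees of freedom, 3 and 57, assume a four-condition design with 20 participants, which the text doesn’t state explicitly, and the Greenhouse-Geisser value is invented for illustration):

```python
# Sketch of the sphericity correction. The design (4 doses, 20 participants)
# and the Greenhouse-Geisser estimate below are assumptions for illustration;
# only the Huynh-Feldt epsilon (.85) appears in the text.
k, n = 4, 20                      # conditions, participants (assumed)
df_model = k - 1                  # uncorrected model df = 3
df_error = (k - 1) * (n - 1)      # uncorrected error df = 57

gg_epsilon = 0.78                 # hypothetical Greenhouse-Geisser estimate
hf_epsilon = 0.85                 # Huynh-Feldt estimate from the text

# Decision rule from the text: estimates below .75 -> use Greenhouse-Geisser,
# above .75 -> use Huynh-Feldt.
epsilon = hf_epsilon if gg_epsilon > 0.75 else gg_epsilon

print(round(epsilon * df_model, 2))   # 2.55, as reported
print(round(epsilon * df_error, 2))   # 48.45; the text's 48.40 reflects
                                      # SPSS using the unrounded epsilon
```

The F-ratio itself is untouched; only the degrees of freedom (and hence the p-value looked up against them) shrink.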
The main effect of alcohol doesn’t tell us anything about which doses of alcohol produced different results from other doses. So, we might do some post hoc tests as well (see page 173 and Field, 2000, Chapter 9). SPSS Output 6.12 shows the table from SPSS that contains these tests. We can read this in the same way as for the independent ANOVA; that is, we read down the column labelled Sig. and look for values less than .05. By looking at the significance values we can see that the only difference between condition means is between 2 and 3 pints of alcohol. Looking at the means of these groups (SPSS Output 6.19) it’s clear that the number of women eyed-up after 3 pints (M = 15.2) is bigger than after 2 pints (M = 11.7). No other post hoc tests are significant, and so we could conclude that there is no difference in the number of women eyed-up after 1 pint compared to 2 pints, or after 3 pints compared to 4 pints. Given that the means are so similar between 1 and 2 pints, and between 3 and 4 pints, it’s a little weird that we don’t get an effect between 1 and 3 pints, or 2 and 4 pints (or indeed 1 and 4 pints). Looking at Figure 6.4 we might reasonably expect differences between these groups, so why haven’t we got them? Well, one possibility is that this is just an example of post hoc tests lacking the power to detect genuine effects.
SPSS Output 6.12
Calculating the Effect Size for One-Way Repeated Measures ANOVA
As with the independent ANOVA we can use two measures of variance (MSM and MSR) to calculate an effect size estimate. Both values can be read from SPSS Output 6.11: MSM is the mean square for the experimental effect (labelled Alcohol), and MSR is the mean square of the error term.
Using the benchmarks for effect sizes this represents a medium to strong effect (it is between .3 and .5 – the thresholds for medium and large effects). Therefore, the effect of alcohol on the roving eye is substantial.
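The excerpt doesn’t reproduce the arithmetic behind the effect size, so purely as an illustration of the kind of calculation involved, here is one common way of turning two variance estimates into an r-type effect size (the numbers below are invented, not the values from SPSS Output 6.11, and the book’s exact formula may differ):

```python
import math

# Illustration only: these sums of squares are invented, not the values
# in SPSS Output 6.11.
ss_model = 500.0    # hypothetical systematic variation (SS for the effect)
ss_error = 2000.0   # hypothetical unsystematic variation (error SS)

# One common r-type estimate: the square root of the proportion of the
# total variation that the effect explains.
r = math.sqrt(ss_model / (ss_model + ss_error))
print(round(r, 3))  # 0.447 for these invented numbers
```

Whatever the exact recipe, the result is interpreted against the usual benchmarks (.1 small, .3 medium, .5 large).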
Interpreting and Writing the Result for One-Way Repeated Measures ANOVA
When we report repeated measures ANOVA, we give the same details as with an independent ANOVA. The only additional thing we should concern ourselves with is reporting the corrected degrees of freedom if sphericity was violated. Personally, I’m keen on reporting the results of sphericity tests as well. As with the independent ANOVA, the degrees of freedom used to assess the F-ratio are the degrees of freedom for the effect of the model (dfM = 2.55) and the degrees of freedom for the residuals of the model (dfR = 48.40). Remember that in both cases we’re using Huynh-Feldt corrected degrees of freedom. Therefore, we could report the main finding as:
The results show that the number of women eyed-up was significantly affected by the amount of alcohol drunk, F(2.55, 48.40) = 4.73, p < .05.
If you choose to report the sphericity test as well, you should report the chi-square approximation, its degrees of freedom and the significance value. It’s also nice to convey the degree of sphericity by reporting the epsilon value. We’ll also report the effect size in this improved version:
Mauchly’s test indicated that the assumption of sphericity had been violated, χ2(5) = 13.12, p < .05; therefore degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .85). The results show that the number of women eyed-up was significantly affected by the amount of alcohol drunk, F(2.55, 48.40) = 4.73, p < .05, r = .40.
The post hoc comparisons need to be reported next, and as we saw with the independent ANOVA we can either report a table of the confidence intervals or write a general account of which tests were significant. However, because only one test was significant we can report it fairly succinctly along with its confidence interval:
Bonferroni post hoc tests revealed a significant difference in the number of women eyed-up only between 2 and 3 pints, CI.95 = –6.85 (lower), –.15 (upper), p < .05. No other comparisons were significant (all ps > .05).
6.8 Two-Way Independent ANOVA
We saw earlier that the name of a particular ANOVA gives away the situation in which it is used. The ‘two-way’ part of the name tells us that two independent variables will be manipulated. The second half of the name tells us how these independent variables will be measured; this is an independent ANOVA and so different participants will take part in the different conditions. To sum up, then, two-way independent ANOVA is used when you intend to measure two independent variables and use different participants in each of the various groups (each person contributes only one score to the data).
Example: People’s musical taste tends to change as they get older (my parents, for example, after years of listening to relatively cool music when I was a kid in the 1970s, subsequently hit their mid-40s and developed a worrying obsession with country and western music – or maybe it was the stress of having me as a teenage son!). Anyway, this worries me immensely as the future seems incredibly bleak if it is spent listening to Garth Brooks and thinking ‘oh boy, did I underestimate Garth’s immense talent when I was in my 20s’.5 So, I thought I’d do some research to find out whether my fate really was sealed, or whether it’s possible to be old and like good music too. First, I got myself two groups of people (45 people in each group): one group contained young people (which I arbitrarily decided was under 40 years of age), and the other group contained more mature individuals (above 40 years of age – sorry Graham!). This is my first independent variable, age, and it has two levels (less than or more than 40 years old). I then split each of these groups of 45 into three smaller groups of 15 and assigned them to listen to either Fugazi (who everyone knows are the coolest band on the planet)6, ABBA, or Barf Grooks (who is a lesser known country and western musician not to be confused with anyone who has a similar name and produces music that makes you want to barf). This is my second independent variable, music, and has three levels (Fugazi, ABBA or Barf Grooks). There were different participants in all conditions, which means that of the 45 under-40s, 15 listened to Fugazi, 15 listened to
ABBA and 15 listened to Barf Grooks; likewise of the 45 over-40s, 15 listened to Fugazi, 15 listened to ABBA and 15 listened to Barf Grooks. After listening to the music I got each person to rate it on a scale ranging from –100 (I hate this foul music of Satan) through 0 (I am completely indifferent) to +100 (I love this music so much I’m going to explode).
SPSS Output for Two-Way Independent ANOVA
Figure 6.5 shows an error bar chart of the music data – as with the previous graphs I produced this using SPSS’s interactive graphs. The bars show the mean rating of the music played to each group, and you should now realize that the funny ‘I’ shapes show the range between which 95% of sample means would fall (see pages 135 and 176). It’s clear from this chart that when people listened to Fugazi the two age groups were divided: the older group rated it very low, but the younger people rated it very highly. A reverse trend is found if you look at the ratings for Barf Grooks: the youngsters give it low ratings while the wrinkly-ones love it. For ABBA the groups agreed: both old and young rated them highly.
Figure 6.5 Error bar chart of the mean ratings of different types of music for two different age groups
SPSS Output 6.13 shows the table of descriptive statistics from the two-way ANOVA; the table splits the means for the different types of music and within these groups it separates the older group from the younger. As ever, we’re told the means and standard deviations for each experimental condition. The means should correspond to those plotted in Figure 6.5.
SPSS Output 6.13
As with one-way independent ANOVA, we have to check the homogeneity of variance assumption (see page 159). The next part of the output shows Levene’s test, which we’ve encountered before (see page 165). For these data the significance value is .322, which is greater than the criterion of .05. This means that the variances in the different experimental groups are roughly equal (i.e. not significantly different), and that the assumption has been met. As such, we can continue safe in the knowledge that the final test statistics will be accurate.
SPSS Output 6.14
SPSS Output 6.15 shows the main ANOVA summary table, and the first thing to note is that it’s a fair bit more complex than for the one-way independent ANOVA. That’s because we now have an effect for each independent variable (these are known as main effects) and also the combined effect of the independent variables (this is known as the interaction between the variables). We should look at these effects in turn.
The main effect of music is shown by the F-ratio in the row labelled Music, and as in previous examples, we’re interested in the significance value of the observed F. The experimental effect, or SSM, is 81864.07 and this compares with the unsystematic variation in the data, or SSR, which is only 32553.47. These values are converted into average effects, or mean squares, by dividing by the degrees of freedom, which are 2 for the effect of music, and 84 for the unexplained variation. The F-ratio for the effect of music is simply the mean square for music divided by the mean square error (which in this case is 40932.03/387.54 = 105.62). Finally, SPSS tells us the probability of getting an F-ratio of this magnitude by chance alone; in this case the probability is .000, which is lower than the usual cut-off point of .05. Hence, we can say that there was a significant effect of the type of music on the ratings. To understand what this actually means, we need to look at the mean ratings for each type of music when we ignore whether the person giving the rating was old or young. In fact, these overall means can be found in SPSS Output 6.13 and I’ve plotted them in Figure 6.6. The first thing to note about this graph is that the variation in ratings is huge (as shown by the error bars) in the Fugazi and Barf Grooks groups. This is because the two age groups gave such disparate ratings. However, what this graph shows is that the significant main effect of music is likely to reflect the fact that ABBA were rated (overall) much more positively than the other two artists.
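The arithmetic in that row can be checked by hand; a short sketch using only the figures quoted above (small rounding differences aside):

```python
# Checking the main effect of music from SPSS Output 6.15.
ss_music, df_music = 81864.07, 2
ss_error, df_error = 32553.47, 84

ms_music = ss_music / df_music    # about 40932.03 (mean square for music)
ms_error = ss_error / df_error    # about 387.54 (mean square error)

f_music = ms_music / ms_error     # about 105.62, as SPSS reports
print(round(f_music, 2))
```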
SPSS Output 6.15
Figure 6.6 Mean ratings of different types of music when you ignore whether the rating came from an old or young person
The main effect of age is shown by the F-ratio in the row in SPSS Output 6.15 labelled Age, and the variance explained by this variable, SSM, is .711, which is an incredibly small amount (especially when you consider that there are 32553.47 units of unsystematic variation, SSR). This effect has 1 degree of freedom and so the resulting mean square is the same as the sum of squares (.711). The F-ratio for the effect of age is the mean square for age divided by the mean square error (which in this case is .711/387.54 = .002). The fact that this value is less than 1 automatically tells us that there was more unexplained variance than there was variance that could be explained by age. In other words, age accounted for less variance than the error. In these cases we need only report that the F-ratio was less than 1 and everyone will know that it was nonsignificant (F cannot be significant if the independent variable explains less variance than the error). In fact, the probability associated with this F-ratio is .966, which is so close to 1 that it is a virtual certainty that this F could occur by chance alone. Again, to interpret the effect we need to look at the mean ratings for the two age groups ignoring the type of music to which they listened (i.e. calculate the mean score of the 45 people over 40 and the mean score of the 45 people under 40). I’ve plotted these two means in Figure 6.7. Again, the variation in ratings is huge (as shown by the error bars) in both groups because the ratings within each age group varied a lot across the different types of music. This graph shows that when you ignore the type of music that was being rated, older people, on average, gave almost identical ratings to younger people (i.e. the mean ratings in the two groups are virtually the same).
Figure 6.7 Mean ratings of old and young people when you ignore the type of music they were rating
I hope that it has become apparent that main effects are not always that interesting. For example, here we’ve already seen that on the face of it ratings of Fugazi are similar to ratings of Barf Grooks, and yet we know that this isn’t true – it depends on which age group you ask. Similarly, we’ve seen that ratings from older people are of the same magnitude as ratings from younger people, yet we know that actually the ratings depend on which type of music is being rated. This is where interactions come into play. An interaction is the combined effect of two or more variables, and sometimes interactions tell us the most interesting things about our data. The interaction effect is shown by the F-ratio in the row in SPSS Output 6.15 labelled Music * Age, and it explains 310790.16 units of variation. This is a large amount compared to the 32553.47 units of unsystematic variation, SSR. This effect has 2 degrees of freedom and so the resulting mean square is half of the sum of squares (155395.08). The F-ratio for the interaction effect is the mean square for the interaction divided by the mean square error (155395.08/387.54 = 400.98). This tells us that the interaction explains about 400 times more variance than is left unexplained (that’s a lot!). The associated significance value is understandably small (.000) and is less than the criterion of .05. Therefore, we can say that there is a significant interaction between age and the type of music rated. To interpret this effect we need to look at the mean ratings in all conditions, and these means were originally plotted in Figure 6.5. If there were no interaction, then we’d expect the old and young people to agree on their ratings for different types of music. So, old and young would give the same ratings of Fugazi, the same ratings for ABBA and the same ratings for Barf Grooks. The ratings might be different for each of these artists, but within a given artist the two age groups would agree (see Box 6.1).
The fact there is a significant interaction tells us that for certain types of music the different age groups gave different ratings. In this case, although they agree on ABBA, there are large disagreements in ratings of Fugazi and Barf Grooks.
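The same recipe reproduces the other two F-ratios in SPSS Output 6.15; as a quick check (allowing for rounding in the quoted sums of squares):

```python
# Checking the remaining rows of SPSS Output 6.15.
ms_error = 32553.47 / 84              # error mean square, about 387.54

# Main effect of age: 1 df, so the mean square equals the sum of squares.
f_age = 0.711 / ms_error              # about .002 -> well below 1, nonsignificant

# Music * Age interaction: 2 df.
ms_interaction = 310790.16 / 2        # 155395.08
f_interaction = ms_interaction / ms_error   # about 400.98
print(round(f_age, 3), round(f_interaction, 2))
```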
Box 6.1: Interpreting Interaction Graphs (Bar Charts)
The first step to understanding interactions is to know how to interpret interaction graphs. In the data for how different age groups rated different types of music we found a significant interaction. Based on
the graph (Figure 6.5) we could conclude that for certain types of music, the different age groups gave different ratings. Specifically, they gave similar ratings for ABBA, but the young group gave higher ratings for Fugazi than the old, and the old group gave higher ratings of Barf Grooks than the young. The graph below shows a different scenario – do you think this would result in a significant interaction?