by Andy Field
This profile of data probably would also give rise to a significant interaction term because although the old and young now agree that Fugazi are great and give similar ratings, and agree that ABBA are great also, they still disagree on Barf Grooks. So, the age of the participant does still have an influence over how different types of music are rated. Put another way, the effect of age for certain types of music (Fugazi and ABBA) is different to the effect of age for other types (Barf Grooks). Age has no effect when Fugazi and ABBA are rated but does when Barf Grooks is rated.
Let’s try another example. Is there an interaction now? For these data there is unlikely to be a significant interaction because the effect of age is the same for all types of music (in this case there is no effect of age for any type of music). So, if we look at ratings for Fugazi, the difference between age group ratings is virtually nonexistent, the ratings for ABBA are also identical for the two age groups, and for Barf Grooks the ratings are now similar for the age groups (both groups hate him). For every level of the music variable, there is the same effect of age (in this case age has no effect for any type of music).
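The logic of spotting an interaction can be sketched as a 'difference of differences' check. The mean ratings below are invented for illustration (they are not the book's data): the first profile mirrors the scenario where the age groups agree on Fugazi and ABBA but split on Barf Grooks; the second mirrors the no-interaction profile just described.

```python
# Invented mean ratings, NOT the book's data: one profile per age group,
# keyed by music type.
profile_with_interaction = {
    "young": {"Fugazi": 60, "ABBA": 60, "Barf Grooks": -70},
    "old":   {"Fugazi": 60, "ABBA": 60, "Barf Grooks": 70},
}
profile_without_interaction = {
    "young": {"Fugazi": 10, "ABBA": 10, "Barf Grooks": -70},
    "old":   {"Fugazi": 10, "ABBA": 10, "Barf Grooks": -70},
}

def age_effects(profile):
    # The effect of age at each level of music: young mean minus old mean.
    return {m: profile["young"][m] - profile["old"][m]
            for m in profile["young"]}

def has_interaction(profile):
    # An interaction means the age effect is NOT the same at every
    # level of the music variable.
    return len(set(age_effects(profile).values())) > 1
```

In the first profile the age effect is 0 for Fugazi and ABBA but large for Barf Grooks, so the effects differ across levels (an interaction); in the second profile the age effect is identical (zero) at every level, so there is none.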
In Box 6.2 we’ll look at other ways to illustrate interactions.
SPSS Output 6.16
Given that we found a main effect of music, and of the interaction between music and age, we can look at some of the post hoc tests to establish where the difference lies (see page 178). SPSS Output 6.16 shows the result of Games-Howell post hoc tests. First, ratings of Fugazi are compared to ABBA, which reveals a significant difference (the value in the column labelled Sig. is less than .05), and then Barf Grooks, which reveals no difference (the significance value is greater than .05). In the next part of the table, ratings of ABBA are compared first to Fugazi (which just repeats the finding in the previous part of the table) and then to Barf Grooks, which reveals a significant difference (the significance value is below .05). The final part of the table compares Barf Grooks to Fugazi and ABBA but these results repeat findings from the previous sections of the table.
Calculating the Effect Size for Two-Way Independent ANOVA
SPSS Output 6.15 provides us with the measures of variance that we need to calculate omega-squared (see page 181). The experimental effect (MSM) depends on which effect we’re looking at, but the mean error variance (MSR) is the same for all effects: 387.54. We can calculate ω2 and subsequently ω using this value and the mean squares for the effect we’re interested in, using the equation on page 181.
For the effect of music, SPSS Output 6.15 tells us that the mean squares for the effect is 40932.03, hence:
Using the benchmarks for effect sizes this represents a huge effect (it is close to 1). Therefore, the ratings given depended heavily on the music being evaluated.
For the effect of age (above) the value of the squared effect size is negative (because the error actually explained more variance than the experimental effect!). It is impossible to compute the square root of a negative number and so we can’t compute an effect size using this equation. In fact, if the F-ratio is less than 1 (indicating that the mean error variance is larger than the mean variance explained by the effect) we’ll always be unable to compute an un-squared effect size. If the F-ratio is exactly 1 then this effect size measure will give us a result of zero. We could use our more crude measure of the model sum of squares divided by the total sum of squares (see Box 5.1) and see what happens. These sums of squares can both be found in SPSS Output 6.15.
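The sign problem described here can be sketched numerically. The formula below is one common form of omega-squared for a between-subjects effect, which I'm assuming matches the equation on page 181; the sums of squares fed to it are illustrative, not the values from SPSS Output 6.15 (only the error mean square, 387.54, is quoted in the text).

```python
def omega_squared(ss_m, df_m, ms_r, ss_t):
    # One common form: w2 = (SS_M - df_M * MS_R) / (SS_T + MS_R).
    # If MS_M (= SS_M / df_M) is smaller than MS_R -- i.e. F < 1 --
    # the numerator is negative, so omega (the square root) has no
    # real value.
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

# Illustrative case: an effect whose mean square (50) is below the
# error mean square of 387.54 quoted in the text, so F < 1 and the
# squared effect size comes out negative.
w2 = omega_squared(ss_m=50.0, df_m=1, ms_r=387.54, ss_t=10000.0)
```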
The result is very close to zero, and so zero is a good enough approximation of the size of this effect. Age has a truly unsubstantive effect on the preference ratings.
Moving onto the interaction term:
Like the main effect of music, this finding is huge (it’s very close to 1).
Writing the Result for Two-Way Independent ANOVA
As with the other ANOVAs we’ve encountered we have to report the details of the F-ratio and the degrees of freedom from which it was calculated. For the various effects in these data the F-ratios are based on different degrees of freedom: each was derived by dividing the mean squares for the effect by the mean squares for the residual. For the effects of music and the music × age interaction, the model degrees of freedom was 2 (dfM = 2), but for the effect of age the degrees of freedom was only 1 (dfM = 1). For all effects, the degrees of freedom for the residuals were 84 (dfR = 84). We can, therefore, report the three effects from this analysis as follows:
The results show that the main effect of the type of music listened to significantly affected the ratings of that music, F(2, 84) = 105.62, p < .001, r = .94. Games-Howell post hoc tests revealed that ABBA were rated significantly higher than both Fugazi and Barf Grooks (both ps < .01).
The main effect of age on the ratings of the music was nonsignificant, F(1, 84) < 1, r = .00.
The music × age interaction was significant, F(2, 84) = 400.98, p < .001, r = .98, indicating that different types of music were rated differently by the two age groups. Specifically, Fugazi were rated more positively by the young group (M = 66.20, SD = 19.90) than the old (M = −75.87, SD = 14.37); ABBA were rated fairly equally in the young (M = 64.13, SD = 16.99) and old groups (M = 59.93, SD = 19.98); Barf Grooks was rated less positively by the young group (M = −71.47, SD = 23.17) compared to the old (M = 74.27, SD = 22.29). These findings indicate that there is no hope for me, the minute I hit 40 I will suddenly start to love country and western music and will burn all of my Fugazi CDs (it will never happen . . . arghhhh!!!).
6.9 Two-Way Mixed ANOVA
Now imagine that rather than measuring both independent variables with different people we decided to become a bit more efficient, and try measuring one independent variable using the same participants. So, we’d still have two independent variables but now one will be measured using the same participants and one will be measured using different participants. This design is known as mixed, because it is a blend of independent measures and repeated measures. So, the ‘two-way’ part tells us that if we want to use a two-way mixed ANOVA then we have to manipulate two independent variables, and the ‘mixed’ part tells us that one of these should be manipulated using different participants but for the other variable we should use the same participants.
Example: Text messaging is very popular amongst mobile phone owners, to the point that books have been published on how to write in text speak (BTW, hope u kno wat I mean by txt spk). One concern is that children may use this form of communication so much that it will hinder their ability to learn correct written English. One concerned researcher conducted an experiment in which one group of children were encouraged to send text messages on their mobile phones over a six month period. A second group was forbidden from sending text messages for the same period. To ensure that kids in this latter group didn’t use their phones, this group were given armbands that administered painful shocks in the presence of microwaves (like those emitted from phones).7 There were 50 different participants: 25 were encouraged to send text messages, and 25 were forbidden. The outcome was a score on a grammatical test (as a percentage) that was measured both before and after the experiment. The first independent variable was, therefore, text message use (text messagers versus controls) and the second independent variable was the time at which grammatical ability was assessed (before or after the experiment).
SPSS Output for Two-Way Mixed ANOVA
Figure 6.8 shows a line chart (with error bars) of the grammar data. The dots show the mean grammar score before and after the experiment for the text message group and the controls. The means before and after are connected by a line for the two groups separately. It’s clear from this chart that in the text message group grammar scores went down dramatically over the 6 month period in which they used their mobile phone. For the controls, their grammar scores also fell but much less dramatically. The error bars on the graph represent the standard error (see page 134). Now, back on page 135 we saw that a 95% confidence interval was simply the mean ±2 standard errors. Therefore, these error bars are showing something similar to the error bars we’ve used before: that is, they show the variability in means from different samples. If we plot the standard error (rather than 2 standard errors) then this is actually the 68% confidence interval: the interval within which 68% of sample means will fall. Whether you choose to plot a single standard error either side of the mean, or the full 95% confidence interval (2 standard errors) is up to you because they provide similar information, but make sure you tell your reader what you’ve plotted!
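The relationship between the standard error and the two intervals can be sketched with the standard library; the sample scores here are made up purely for illustration.

```python
import math
import statistics

sample = [12.0, 15.0, 11.0, 14.0, 13.0]  # made-up scores
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean

# Error bars of +/- 1 SE approximate the 68% confidence interval;
# +/- 2 SE approximates the 95% interval described in the text.
ci_68 = (mean - se, mean + se)
ci_95 = (mean - 2 * se, mean + 2 * se)
```

Note that the 95% interval is always twice as wide as the 68% one, which is why the two conventions convey similar information as long as the reader is told which has been plotted.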
Figure 6.8 Line chart of the mean grammar scores (with error bars showing the standard error of the mean) before and after the experiment for text messagers and controls
SPSS Output 6.17 shows the table of descriptive statistics from the two-way mixed ANOVA; the table has means at time 1 split according to whether the people were in the text messaging group or the control group, then below we have the means for the two groups at time 2. These means correspond to those plotted in Figure 6.8.
As with all ANOVAs, there are assumptions that we have to check. We know that when we use repeated measures we have to check the assumption of sphericity (see page 183). We also know that for independent designs we need to check the homogeneity of variance assumption (see page 159). If the design is a mixed design then we have both repeated and independent measures, so we have to check both assumptions. SPSS Output 6.18 shows Mauchly’s sphericity test (see page 184) and Levene’s test (see page 176). You may recall from page 183 that the assumption of sphericity is basically that the variance of the differences between any two levels of the repeated measures variable is the same as the variance of the differences between any two other levels of that variable. In this case, we have only two levels of the repeated measure (before and after the experiment), therefore, there is only one pair of levels for which we could calculate the difference scores (and then the variance of those scores). In short, there would be no other pair of levels with which to compare the variances of the difference scores. So, the assumption of sphericity does not apply in this case. In fact, whenever you have only two levels of the repeated measures variable you don’t have to worry about the sphericity assumption – it will always be true. To prove it, look at Mauchly’s test in SPSS Output 6.18; the value is 1, the chi-square is 0, and the significance value is blank. When sphericity holds completely, SPSS does not print a significance value; in this case this indicates that sphericity cannot be tested because there are only two levels of the repeated measure. OK, that was easy enough, but we still have to worry about homogeneity of variance. SPSS Output 6.18 shows Levene’s test, and you should notice that it produces a different test for each level of the repeated measures variable.
In mixed designs, the homogeneity assumption has to hold for every level of the repeated measures variable. As before, we are looking for a significant value (in the column labelled Sig.) to tell us that the assumption has been violated. At both levels of time, Levene’s test is non-significant (p = .77 before the experiment and p = .069 after the experiment). This means the assumption has not been broken at all (but it was quite close to being a problem after the experiment).
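Levene's test is essentially a one-way ANOVA carried out on the absolute deviations of each score from its group's centre. A minimal sketch of that logic, using the group mean as the centre (SPSS's exact implementation may differ, and the data below are invented):

```python
def levene_W(groups):
    # One-way ANOVA on absolute deviations from each group's mean
    # (using the median instead gives the Brown-Forsythe variant).
    z = []
    for g in groups:
        centre = sum(g) / len(g)
        z.append([abs(x - centre) for x in g])
    k = len(groups)
    N = sum(len(zi) for zi in z)
    grand = sum(sum(zi) for zi in z) / N
    means = [sum(zi) / len(zi) for zi in z]
    ss_between = sum(len(zi) * (m - grand) ** 2 for zi, m in zip(z, means))
    ss_within = sum((x - m) ** 2 for zi, m in zip(z, means) for x in zi)
    df1, df2 = k - 1, N - k
    W = (ss_between / df1) / (ss_within / df2)
    return W, df1, df2
```

A large W (judged against an F(df1, df2) distribution) signals unequal variances, i.e. a violated homogeneity assumption; in a mixed design you would run this check once per level of the repeated measures variable, which is exactly what SPSS Output 6.18 reports.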
SPSS Output 6.17
SPSS Output 6.18
SPSS Output 6.19 shows the main ANOVA summary tables, and the first thing to note is that there are two of them! With all of the ANOVAs we’ve encountered so far the effects were summarized in a single table, but in a mixed design any results involving the repeated measure are placed in one table and the main effect of any independent measures are placed in a separate table. However, like any two-way ANOVA, we still have three effects to find: two main effects (one for each independent variable) and one interaction term.
SPSS Output 6.19
The main effect of time is shown by the F-ratio in the row labelled time, and we are interested in the significance value of the observed F. To calculate this F we look at the experimental effect of time (SSM(Time) = 1528.81) and compare this to the unsystematic variation for time (SSR(Time) = 4747.60). These values are converted into average effects, or mean squares, by dividing by the degrees of freedom, which are 1 for the effect of time, and 48 for the unexplained variation. The F-ratio for the effect of time is simply the mean square for time divided by the mean square error (1528.81/98.91 = 15.46). SPSS has also calculated the probability of getting an F-ratio this large by chance alone as .000 (i.e. p < .001), which is well below the usual cut-off point of .05. We can conclude that grammar scores were significantly affected by the time at which they were measured. The exact nature of this effect is easily determined because there were only two points in time (and so this main effect is comparing only two means). The overall means can be found in SPSS Output 6.17 and I’ve plotted them in Figure 6.9. The graph shows that grammar scores were higher before the experiment than after. So, before the experimental manipulation scores were higher than after, meaning that the manipulation had the net effect of significantly reducing grammar scores. This main effect seems rather interesting until you consider that these means include both text messagers and controls. There are three possible reasons for the drop in grammar scores: (1) the text messagers got worse and are dragging down the mean after the experiment, (2) the controls somehow got worse, or (3) the whole group just got worse and it had nothing to do with whether the children text messaged or not. Until we examine the interaction, we won’t see which of these is true.
Figure 6.9 Mean grammar score before and after the experiment when you ignore whether the participant was allowed to text message or not. Error bars show the standard error of the mean (see page 134)
The main effect of group is shown by the F-ratio in the second table in SPSS Output 6.19. The variance explained by this variable, SSM(Group), is 580.81 compared to 9334.08 units of unsystematic variation, SSR(Group). Note that in a mixed ANOVA the two main effects have different error terms: there is one for the repeated measures variable and one for the independent measures variable. These sums of squares are converted to mean squares by dividing by their respective degrees of freedom (given in the table). The effect of group is the mean square for the effect divided by the mean square error (580.81/194.46 = 2.99). The probability associated with this F-ratio is .09, which is above the critical value of .05. Therefore, we must conclude that there was no significant main effect on grammar scores of whether children text-messaged or not. Again, this effect seems interesting enough and mobile phone companies might certainly choose to cite it as evidence that text messaging does not affect your grammar ability. However, remember that this main effect ignores the time at which grammar ability is measured. It just means that if we took the average grammar score for text messagers (that’s including their score both before and after they started using their phone), and compared this to the mean of the controls (again including scores before and after) then these means would not be significantly different. I’ve plotted these two means in Figure 6.10. This graph shows that when you ignore the time at which grammar was measured, the controls have slightly better grammar than the text messagers – but not significantly so.
Figure 6.10 Mean grammar score for text messagers and controls when you ignore the time at which the grammar score was measured. Error bars show the standard error of the mean (see page 134)
As I’ve mentioned before, main effects are not always that interesting and should certainly be viewed in the context of any interaction effects. The interaction effect in this example is shown by the F-ratio in the row labelled Time*Group, and it explains 412.09 units of variation, SSM(Interaction). Note that the error term is the same as for the repeated measures main effect (SSR(Time) = 4747.60). This interaction has 1 degree of freedom and so the resulting mean square is the same as the sum of squares. As ever, the F-ratio for the interaction effect is the mean square for the interaction divided by the mean square error (412.09/98.91 = 4.17). SPSS tells us that the probability of obtaining a value this big by chance is .047, which is just less than the criterion of .05. Therefore, we can say that there is a significant interaction between the time at which grammar was measured and whether or not children were allowed to text message within that time. The mean ratings in all conditions (see Figure 6.8 and SPSS Output 6.17) help us to interpret this effect. If there were no interaction, then we would expect the same change in grammar scores in those using text messages and controls (see Box 6.2). The fact there is a significant interaction tells us that the change in grammar scores was significantly different in text messagers compared to controls. Looking at Figure 6.8 we can see that although grammar scores fell in controls, the drop was much more marked in the text messagers; so, text messaging does seem to ruin your ability at grammar compared to controls.8
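All three F-ratios quoted from SPSS Output 6.19 follow directly from the sums of squares and degrees of freedom given in the text, so they can be verified with a quick arithmetic check:

```python
# Values quoted from SPSS Output 6.19 in the text.
ms_err_time = 4747.60 / 48                  # error MS for repeated measures effects (~98.91)
f_time = (1528.81 / 1) / ms_err_time        # main effect of time
f_group = (580.81 / 1) / (9334.08 / 48)     # main effect of group (own error term)
f_interaction = (412.09 / 1) / ms_err_time  # time x group interaction
```

Notice that time and the interaction share one error term while group has its own, which is why two separate summary tables appear in the output.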
SPSS Output 6.20
Although we don’t need post hoc tests as such because all of our variables had only two levels, we could do some further tests on the interaction term. One thing we could do is to look at the change in grammar for the text messagers and controls separately using dependent t-tests (to compare scores before with scores afterwards). However, if we do this we should make some correction for the number of tests that we do. I mentioned before that the easiest correction to use is known as Bonferroni correction (see page 173). This correction means that rather than use the standard critical probability value of .05, we instead use this value divided by the number of tests that we’ve done. In this case we’ve done 2 tests, so rather than accept these tests as significant if there is less than a .05 probability that the test statistic could occur by chance alone, we accept them as genuine results only if they are significant at .05/2 = .025. Now, we came across the dependent t-test on page 168 and if we do this separately for the two groups we get the outputs in SPSS Output 6.20. From these outputs it’s clear that for the text messagers there is a significant drop in their grammar ability across the experiment (p = .002 which is less than the Bonferroni corrected value of .025) but there is not a significant drop in ability for the controls (p = .056, which is greater than the Bonferroni corrected value of .025). As such the interaction reflects the relatively greater decline in grammar ability in the text message group. This kind of analysis is known as simple effects analysis.
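A minimal sketch of this simple effects logic, using a hand-rolled dependent t-statistic. The grammar scores themselves aren't reproduced in the text, so the numbers below are invented; in practice you'd take the p-value from SPSS (or a t table) and compare it against the Bonferroni-corrected criterion.

```python
import math
import statistics

def dependent_t(before, after):
    # t = mean of the difference scores / standard error of those differences
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1  # (t, df)

alpha = 0.05 / 2  # Bonferroni correction: two tests, so criterion = .025

# Invented before/after scores for three children in one group.
t, df = dependent_t([65.0, 72.0, 58.0], [60.0, 68.0, 57.0])
```

Each group's test is then declared significant only if its p-value falls below `alpha` (.025), exactly as in the text: p = .002 for the text messagers clears this hurdle, p = .056 for the controls does not.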