How to Design and Report Experiments

by Andy Field


  With other types of independent variable, the categories are fixed, and your only choice is which of the existing categories to use. ‘Gender’ is a good example of this kind of independent variable: on this planet, you are stuck with male versus female and people come ready-assigned to one level of the independent variable or the other.

  So, you have to decide how many independent variables are present in the study. Once you have done this, you need to determine how many levels (or categories) of each independent variable there are (see page 37). In the examples above, we have three levels of memory test delay and two levels of gender.

  Question 3: What Kind of Design Will I Use?

  In psychology, there are usually a variety of ways to tackle the same issue. Sometimes we might use an experimental method: here, we have different conditions in our study, corresponding to different levels of our independent variable(s), and we look for a difference between them in terms of performance on some dependent variable that we have chosen. Alternatively, we might use a correlational method: in this case, we look for a relationship between a person’s scores on one variable and their scores on another variable. Different statistical tests will be appropriate in each case, so you need to decide which kind of design you have.

  Suppose we were interested in age-changes in driving ability. One way to tackle this would be to use an experimental design, in which we looked for differences between different age-groups of drivers. Another way to address the same issue would be to use a correlational technique, and look for a relationship between age and some measure of driving ability. Most studies have either an experimental or a correlational design: you are looking either for differences between groups or conditions (for example differences in driving ability between young, middle-aged and elderly drivers), or you are looking for relationships between variables (a correlation between age and driving ability).

  Question 4: Independent Measures or Repeated Measures?

  If each participant participates in only one condition in your experiment, so that different groups of participants do different conditions, then your study has a wholly independent measures design. If each participant takes part in all of the conditions in your study, so that they give a score for each and every one of those conditions, you have a repeated-measures design. If participants participate in some, but not all, conditions, you have a ‘mixed’ design (so called because it’s a mixture of the other two kinds).
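  The three definitions above can be written as a small check. This is only a sketch: the function name `classify_design` and the participant and condition labels are made up for illustration, and it follows the wording of this paragraph rather than any standard library routine.

```python
def classify_design(assignments, conditions):
    """Classify a design from who took part in which conditions.

    assignments: dict mapping a participant label to the set of
                 conditions that participant took part in.
    conditions:  every condition in the study.
    """
    all_conditions = set(conditions)
    taken = [set(c) for c in assignments.values()]
    if all(len(c) == 1 for c in taken):
        # Each participant did exactly one condition.
        return "independent measures"
    if all(c == all_conditions for c in taken):
        # Each participant gave a score for every condition.
        return "repeated measures"
    # Participants did some, but not all, of the conditions.
    return "mixed"


# Hypothetical example: two lecturers, each measured in both situations.
lecturers = {"p1": {"seminar", "lecture"}, "p2": {"seminar", "lecture"}}
print(classify_design(lecturers, ["seminar", "lecture"]))
```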

  Note that this question only applies if your design is an experimental one: it doesn’t have any relevance to correlational studies or studies involving frequency data (and hence analysed by using Chi-Squared).

  Question 5: Are My Data Parametric or Non-Parametric?

  As we saw in Chapters 5 and 6, one distinction which you will encounter frequently in statistics is between parametric and non-parametric tests. Hopefully, you recall that parametric tests assume that your data have certain characteristics: specifically, they assume that your data are

  normally-distributed;

  measurements of a continuous variable, on an interval or ratio scale of measurement;

  roughly equal in variance across conditions or groups (the amount of variation amongst the scores in each condition or group is roughly comparable).

  Non-parametric tests make no such assumptions, but are generally less powerful than their parametric equivalents (see page 234). For the purpose of this book we call data that meet these assumptions parametric data, and data that do not non-parametric data.

  8.3 Specific Sources of Confusion in Deciding Which Test to Use

  * * *

  There are a couple of problems which seem to crop up frequently when students first start getting to grips with deciding which test to use.

  Should I use Chi-Squared or a Correlation?

  Sometimes students get confused about whether they should be using the Chi-Squared Test of Association, or a correlation. Both of these tests look for a relationship between two different variables, but they do so in quite different ways. Suppose we wanted to know whether there was a relationship between age and fearfulness. Which test should we use? The answer is actually quite straightforward, once you think about how each participant is contributing to your data. Does each participant provide a pair of scores, or do they just contribute to a ‘head count’? If it’s the latter, and each participant merely falls into a category (e.g. ‘old’ or ‘young’, ‘fearful’ or ‘fearless’) then the Chi-Squared Test of Association is the one to use. You have frequency data (i.e. the number of people in each combination of categories), and Chi-Squared will tell you if the frequencies in the categories of one of the variables (e.g. age) are non-randomly related to the frequencies of the categories of the other variable (e.g. fearfulness).

  If, on the other hand, each participant provides a pair of scores (in this case, a score for their age, and another score for their level of fearfulness), and you want to know if there is a relationship between the scores, then perform a correlation.
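  To make the ‘head count’ case concrete, here is a minimal sketch of the Chi-Squared statistic for a two-by-two table of age against fearfulness. The counts are invented purely for illustration, the function name `chi_squared` is our own, and no continuity correction is applied.

```python
def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical head counts.
# Rows: young, old. Columns: fearful, fearless.
counts = [[30, 20],
          [10, 40]]

stat, df = chi_squared(counts)
print(f"chi-squared = {stat:.2f}, df = {df}")
```

  Notice that each participant appears in the table exactly once, as one of the people counted in a cell. If you had a pair of scores per person instead, there would be nothing to count, and a correlation would be the test to reach for.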

  Should I Use a Correlation or a t-Test?

  If you have pairs of scores from each of a number of participants, how do you decide whether to use a correlation test or a repeated-measures t-test? After all, you have pairs of scores in both cases! (In fact, if you use SPSS to perform a repeated-measures t-test, it will also give you the results of a correlation test on the same data, just to confuse you). The answer is to think of what the data represent, and precisely what it is that you are trying to find out. You use a correlation if you want to find out if there is a relationship between two sets of scores, and a t-test if you want to see if there is a difference between them.

  Suppose we take a group of lecturers, and measure their level of anxiety under two conditions: while they are with a small seminar group, and while they lecture to a large audience. Each lecturer is giving us two scores, so should we run a correlation or a repeated-measures t-test?

  A correlation test would tell you whether there is a systematic relationship between the level of anxiety in small-group situations and the level of anxiety in large-group situations: as one increases, does the other increase (or decrease) too? A significant positive correlation would tell us that lecturers who are highly anxious in seminars are also highly anxious in lectures (and that lecturers who are not anxious in seminars are also not anxious in lectures). In most cases, in a correlational study, neither variable is being manipulated by the experimenter: we merely record what is given. In this example, each lecturer would come along with their own two levels of anxiety, and we would record them: we wouldn’t be able to say to them ‘hello, you’re going to be extremely anxious in seminars today, but not so bothered about lectures!’

  In the case of a repeated-measures t-test, the pairs of scores per participant correspond to measurements of the same dependent variable measured under different conditions. You look for differences between the two measurements that have been produced by your experimental manipulations. So, in this case, we might have pairs of scores from each lecturer, but these would have been produced because we have manipulated the conditions. For each lecturer, we would decide whether they were going to give a lecture or a seminar – this decision would represent our way of manipulating their anxiety level. The t-test would tell us if putting lecturers into different situations (seminars or lectures) produced changes in their anxiety levels.
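  The difference between the two questions shows up clearly if you run both calculations on the same pairs of scores. The sketch below uses invented anxiety scores for five lecturers; the function names `pearson_r` and `paired_t` are our own, and only the test statistics are computed, not their p-values.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(x, y):
    """Repeated-measures t statistic: mean difference over its standard error."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical anxiety scores, one pair per lecturer.
seminar = [10, 12, 14, 16, 18]
lecture = [14, 17, 18, 22, 24]

print("r =", round(pearson_r(seminar, lecture), 3))
print("t =", round(paired_t(seminar, lecture), 3))

# Take five points off every lecture score: the relationship between the
# two sets of scores is untouched, but the average difference disappears.
calmer_lecture = [score - 5 for score in lecture]
print("r =", round(pearson_r(seminar, calmer_lecture), 3))
print("t =", round(paired_t(seminar, calmer_lecture), 3))
```

  Shifting every lecture score by the same amount leaves the correlation as it was, because the lecturers keep the same relative ordering, but it changes the t statistic, because t is concerned with the size of the difference between the two conditions.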

  8.4 Examples of Using These Questions to Arrive at the Correct Test

  * * *

  Answering these five questions correctly should enable you to cope with the two situations described at the beginning of this chapter. You should be able to decide correctly which statistical test should be used in any particular set of circumstances. So, if you are faced with a description of a study (either in a journal article or as part of a statistics test) you should be able to work out which test is most appropriate. If you are designing a study, you can use these questions to help you decide what kind of data you should be aiming to obtain, and hence what kind of statistics test you will be able to use to analyse those data once you’ve got them.

  Here are a few fictitious studies, with demonstrations of how answering the five questions above enables you to arrive at the most appropriate test for those circumstances.

  Example 1: The Effectiveness of ‘Flooding’ as a Treatment for Different Phobias

  A psychologist wanted to know whether the effectiveness of ‘flooding’ as a treatment for phobias varied according to the particular phobia concerned. (The ‘flooding’ technique consists of taking the phobic and whatever it is that they are phobic about, throwing them into a room together, locking the door and waiting for the screams to subside. In other words, you confront them with their phobia in a big way!) Four groups of phobics (snake-phobics, spider-phobics, agoraphobics and claustrophobics) were given this treatment. (For simplicity’s sake, we’ll assume that each person has only one of these particular phobias.) Each participant provided a rating for their perceived level of improvement, on a scale from ‘1’ (phobia much worse) through ‘4’ (no change) to ‘7’ (phobia much improved).

  Question 1: What kind of data do I have?

  The dependent variable is perceived improvement. The data consist of each phobic providing a rating of how much they feel they have improved. Ratings are generally ordinal-scale data (you will see instances in the psychological literature of researchers using tests on such data which, strictly speaking, should be reserved for interval or ratio data, but at this stage you should play by the rules and be very principled about which test you pick!). We can be reasonably confident that a score of 6 represents more improvement than a score of 5, and that a score of 7 probably represents more improvement than a score of 6; however, we can’t know by how much someone scoring 7 is improved compared to someone scoring a 6 or a 5. A score of 6 might represent an enormous improvement over a score of 5, whereas a score of 7 might represent only a minor gain over a score of 6 – or vice versa. So, we’ll treat this as ordinal data.

  Question 2: How many independent variables do I have?

  There is one independent variable, type of phobia. It has four levels: snake-phobic, spider-phobic, agoraphobic or claustrophobic. This is one of those independent variables like gender – one in which the experimenter is not free to manipulate the independent variable completely, but is able to pick certain levels of it (i.e. the experimenter can’t randomly assign people to the different phobia categories because participants come ready-made with their particular type of phobia; but the experimenter can decide which levels of the independent variable to use in the study, for example choosing to examine snake-phobics rather than needle-phobics, etc.).

  Question 3: What kind of design do I have (experimental or correlational)?

  This is an experimental design. We are giving all of our phobics the same treatment (flooding) and then looking for differences in perceived improvement between the four different types of phobic.

  Question 4: Does my study use a repeated-measures or independent-measures design?

  This is a wholly independent-measures design. Each phobic can only be in one group, and gives just one score – their rating of perceived improvement.

  Question 5: Are my data parametric or non-parametric?

  Are the data that would be obtained in this study likely to satisfy the requirements for a parametric test? Consider each of the three requirements for a parametric test. First, are the data normally distributed? To some extent, we would have to see the actual data before making this decision. What we would be looking for is a roughly bell-shaped frequency distribution of scores in each group, around the mean of that group. Do the data show homogeneity of variance? Again, this is normally something you can only be certain about once you have collected the data. We would be looking for spreads of scores in each group that weren’t wildly dissimilar from each other. Are the data measurements on an interval or ratio scale? We have already discussed this, in answering Question 1. The answer is ‘no’, they are ordinal-level measurements. Failure on any one of these three questions means that the data do not win the prize of having a parametric test done on them: we have to find a suitable non-parametric test instead. That will teach them to be failures . . .

  Using the flow-chart in Box 8.2, together with our answers to the five questions, we can now decide what test we should conduct on our data.

  Starting at the top, we know that our data consist of scores; so that rules out Chi-Squared as an option. We have an experimental design, so that rules out using correlations. We have one independent variable, and we have an independent-measures design. There are three or more groups. We have now narrowed our choice of test down to either a one-way independent-measures ANOVA, or its non-parametric equivalent, the Kruskal-Wallis test. Since our data do not satisfy the requirements for a parametric test, we should use the Kruskal-Wallis test on our data.
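  For the curious, here is a rough sketch of how the Kruskal-Wallis statistic, H, is built from ranks. The ratings are invented for illustration, the function names are our own, and the usual correction for tied ranks is deliberately left out to keep the arithmetic visible, so a statistics package may give a slightly different value when there are many ties.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a list of independent groups."""
    pooled = [score for group in groups for score in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    position = 0
    for group in groups:
        rank_sum = sum(ranks[position:position + len(group)])
        total += rank_sum ** 2 / len(group)
        position += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical improvement ratings (1 to 7) for the four groups of phobics.
snake = [6, 7, 5, 6, 7]
spider = [5, 6, 6, 7, 5]
agoraphobic = [3, 4, 2, 4, 3]
claustrophobic = [4, 5, 3, 4, 4]

h = kruskal_wallis_h([snake, spider, agoraphobic, claustrophobic])
print(f"H = {h:.2f}")
```

  Because the test works on the rank order of the ratings rather than on their actual values, it only relies on the property we were confident about in Question 1: that a higher rating means more improvement.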

  Example 2: The Relationship Between Optimism and Watching ‘Star Trek’

  Are people who watch ‘Star Trek’ more optimistic about the future of humanity than people who don’t? 500 people will be interviewed and asked to provide (a) an estimate of how many ‘Star Trek’ episodes they have watched, and (b) a numerical estimate of how long the human race will survive.

  Question 1: What kind of data will I collect?

  There are two measures here. Participants’ estimates of how many ‘Star Trek’ episodes they have watched are measurements on a ratio scale: it is quite possible to have watched no episodes of ‘Star Trek’ at all, so that there is a true zero on the scale; and someone who has watched ten episodes has watched half as many as someone who has watched twenty, and twice as many as someone who has watched five. The numerical estimate of how long the human race will survive is a little less clear-cut, but can probably also be considered as ratio data, for the same reasons (intervals on the scale are equally spaced and there is a true zero point).

  Box 8.2: Which test do I use?

  Use this flow-chart in conjunction with the answers to the five questions in the text, to decide which statistical test is most appropriate for your data.

  Question 2: How many independent variables will I use?

  There are two independent variables here: number of ‘Star Trek’ episodes watched, and ‘level of optimism about the human race’.

  Question 3: What kind of design will I use (experimental or correlational)?

  We are not manipulating anything in this study, merely recording data on two independent variables. We are looking for a relationship between the two independent variables, so this is a correlational design.

  Question 4: Does my study use a repeated-measures or independent-measures design?

  This is a correlational design, so we don’t have to answer this question.

  Question 5: Are my data parametric or non-parametric?

  First, are the data normally-distributed? We have two independent variables and the scores on both would need to be roughly normally distributed. We would have to look at the data to be sure, but the frequency of watching ‘Star Trek’ is unlikely to be normally distributed. Fans tend to watch it avidly, whereas non-fans might only watch the occasional episode. Therefore the frequency distribution of scores is more likely to be bimodal than normal. It’s hard to make any firm predictions about the optimism scores, but they are likely to be skewed towards the upper end of the distribution of possible scores – only a few pessimists (realists?) are likely to give very short estimates.

  Secondly, are the scores measurements on an interval or ratio scale of measurement? We have already answered this question, and decided that both independent variables may be considered to be measurements on ratio scales.

  Finally, is the amount of variation amongst the scores in each condition or group roughly comparable? This doesn’t really apply in the case of correlational designs, so we’ll ignore it.

  Overall, it looks like we are safest treating these as non-parametric data, given our doubts about whether scores on the two variables are normally distributed. However, we should check the distribution of scores after the data have been collected, just to be sure.

  Working through the flowchart, we make the following decisions. The data are scores, so we can forget about using Chi-Squared. We have a correlational design, so we are faced with a choice between using either a parametric correlation test (Pearson’s r) or a non-parametric correlation test (Spearman’s rho, ρ). Once we have collected the data and looked at the distribution, we can make a final choice of test: if the distribution is not normal, we’ll use Spearman’s ρ, and if it is, we’ll use Pearson’s r.
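  The route taken through the questions in the two examples can be summarised as a small decision function. This is a sketch only: it covers just the branches actually walked through in this section, not the whole of the flow-chart, and the function name `choose_test` and its argument names are our own invention.

```python
def choose_test(data_kind, design, parametric,
                n_ivs=None, measures=None, n_groups=None):
    """Suggest a test from the answers to the five questions.

    Only the branches described in the two worked examples are covered;
    anything else returns None.
    """
    if data_kind == "frequencies":
        # Head counts in categories rather than scores.
        return "Chi-Squared"
    if design == "correlational":
        return "Pearson's r" if parametric else "Spearman's rho"
    if (n_ivs == 1 and measures == "independent"
            and n_groups is not None and n_groups >= 3):
        if parametric:
            return "one-way independent-measures ANOVA"
        return "Kruskal-Wallis"
    return None  # a branch not described in this section


# Example 1: scores, experimental, one IV, independent measures,
# four groups, ordinal (so non-parametric) data.
print(choose_test("scores", "experimental", parametric=False,
                  n_ivs=1, measures="independent", n_groups=4))

# Example 2: scores, correlational, distributions doubtful.
print(choose_test("scores", "correlational", parametric=False))
```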

  8.5 Summary

  When you design a study, at the very outset think about what type of data you will collect, and what statistical tests you will use on them – don’t defer thinking about these issues until after you have obtained the data!

 
