An R cookbook http://www.cookbook-r.com/Statistical_analysis/ANOVA/ has an example of using aov() for mixed design ANOVAs.
I’ll copy it here:
data <- read.table(header=TRUE, con <- textConnection('
 subject sex   age before after
       1   F   old    9.5   7.1
       2   M   old   10.3  11.0
       3   M   old    7.5   5.8
       4   F   old   12.4   8.8
       5   M   old   10.2   8.6
       6   M   old   11.0   8.0
       7   M young    9.1   3.0
       8   F young    7.9   5.2
       9   F   old    6.6   3.4
      10   M young    7.7   4.0
      11   M young    9.4   5.3
      12   M   old   11.6  11.3
      13   M young    9.9   4.6
      14   F young    8.6   6.4
      15   F young   14.3  13.5
      16   F   old    9.2   4.7
      17   M young    9.8   5.1
      18   F   old    9.9   7.3
      19   F young   13.0   9.5
      20   M young   10.2   5.4
      21   M young    9.0   3.7
      22   F young    7.9   6.2
      23   M   old   10.1  10.0
      24   M young    9.0   1.7
      25   M young    8.6   2.9
      26   M young    9.4   3.2
      27   M young    9.7   4.7
      28   M young    9.3   4.9
      29   F young   10.7   9.8
      30   M   old    9.3   9.4
'))
close(con)
Then reshape it:
library(reshape2)

# Make sure subject column is a factor
data$subject <- factor(data$subject)

# Convert it to long format
data.long <- melt(data,
                  id = c("subject","sex","age"),   # keep these columns the same
                  measure = c("before","after"),   # Put these two columns into a new column
                  variable.name="time")            # Name of the new column

#  subject sex age   time value
#        1   F old before   9.5
#        2   M old before  10.3
# ...
Now analyze using a mixed anova:
aov.after.age.time <- aov(value ~ age*time + Error(subject/time), data=data.long)
summary(aov.after.age.time)
But when there are more than two predictor variables, the R examples show the between-subject factors added again after the error term:
# e.g., from R cookbook
# aov.bww <- aov(y ~ b1*b2*w1 + Error(subject/(w1)) + b1*b2, data=data.long)

# which would translate in our case as:
aov.bww <- aov(value ~ sex*age*time + Error(subject/time) + sex*age, data=data.long)
summary(aov.bww)
But why is b1*b2, or in our case sex*age, specified twice? It doesn’t seem to make a difference when we remove them after the Error() term:
aov.bww2 <- aov(value ~ sex*age*time + Error(subject/time), data=data.long)
summary(aov.bww2)
Can anyone explain why the examples have those extra terms? The R manual just has this example, where the between factors are not specified twice:
# fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
I have checked the references from the R Cookbook and found other web sites also specify the terms twice in their mixed design examples. See here:
where they have the example:
aov.ex5 = aov(Recall ~ (Task*Valence*Gender*Dosage) + Error(Subject/(Task*Valence)) + (Gender*Dosage), data.example5)
and see here
with their example:
# Two Within Factors W1 W2, Two Between Factors B1 B2
fit <- aov(y ~ (W1*W2*B1*B2) + Error(Subject/(W1*W2)) + (B1*B2), data=mydataframe)
Which is presumably where the cookbook got their info from.
No, it is not necessary to specify those terms twice. I suspect it was either a copy/paste typo, or that the author wanted to denote separately the terms that use the subject term for the denominator in the F test and the terms that use the subject/time term. As you note, though, running the code without the repeated terms gives the same results, so they are unnecessary.
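One way to see why the repeated terms change nothing is to look at how R expands the two formulas. The following is a minimal sketch, not taken from any of the cited pages; the names `f_twice`, `f_once`, `labels_twice` and `labels_once` are just illustrative. It only parses the formulas with `terms()`, so no data is needed.

```r
# Two formulas: with and without the repeated between-subject terms.
f_twice <- value ~ sex * age * time + Error(subject / time) + sex * age
f_once  <- value ~ sex * age * time + Error(subject / time)

# terms() expands a formula into its set of model terms;
# a term that appears more than once is kept only once.
labels_twice <- attr(terms(f_twice), "term.labels")
labels_once  <- attr(terms(f_once),  "term.labels")

print(labels_twice)
print(labels_once)

# Both formulas expand to the same term labels.
identical(labels_twice, labels_once)
```

Since `sex`, `age` and `sex:age` are already produced by `sex*age*time`, writing `+ sex*age` again adds nothing to the expanded term list.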
In this case, also notice that the /time part of the Error call is unnecessary; the subject:time interaction is the lowest level, which is always included in the model. So using Error(subject) and Error(subject/time) gives the same result; the only difference is that in the output, that level of results is called “Within” for the first and is called “subject:time” for the second.
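The equivalence of the two Error() specifications can be checked on a small example. This is a rough sketch on made-up data, not the data from the question: the data frame `d`, the seed, and the names `fit_short`, `fit_full`, `f_short` and `f_full` are all assumptions for illustration. It uses one between-subject factor (age) and one within-subject factor (time), with one observation per subject and time point.

```r
# Hypothetical balanced data: 8 subjects, each measured before and after.
set.seed(1)
n <- 8
d <- data.frame(
  subject = factor(rep(1:n, times = 2)),
  age     = factor(rep(rep(c("old", "young"), each = n / 2), times = 2)),
  time    = factor(rep(c("before", "after"), each = n),
                   levels = c("before", "after")),
  value   = rnorm(2 * n, mean = 10)
)

# Same fixed effects, two ways of writing the error term.
fit_short <- aov(value ~ age * time + Error(subject),        data = d)
fit_full  <- aov(value ~ age * time + Error(subject / time), data = d)

# Only the label of the lowest stratum differs.
names(fit_short)  # "(Intercept)" "subject" "Within"
names(fit_full)   # "(Intercept)" "subject" "subject:time"

# F values for time and age:time in the lowest stratum of each fit.
f_short <- summary(fit_short)[["Error: Within"]][[1]][["F value"]]
f_full  <- summary(fit_full)[["Error: subject:time"]][[1]][["F value"]]
all.equal(f_short, f_full)
```

Printing `summary(fit_short)` and `summary(fit_full)` side by side shows the same tables under different stratum headings.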