# Why does a mixed design using R’s aov() need the between-subject factors specified more than once?

The R Cookbook (http://www.cookbook-r.com/Statistical_analysis/ANOVA/) has an example of using aov() for mixed-design ANOVAs.

I’ll copy it here:

``````
data <- read.table(header=TRUE, con <- textConnection('
subject sex   age before after
1   F   old    9.5   7.1
2   M   old   10.3  11.0
3   M   old    7.5   5.8
4   F   old   12.4   8.8
5   M   old   10.2   8.6
6   M   old   11.0   8.0
7   M young    9.1   3.0
8   F young    7.9   5.2
9   F   old    6.6   3.4
10   M young    7.7   4.0
11   M young    9.4   5.3
12   M   old   11.6  11.3
13   M young    9.9   4.6
14   F young    8.6   6.4
15   F young   14.3  13.5
16   F   old    9.2   4.7
17   M young    9.8   5.1
18   F   old    9.9   7.3
19   F young   13.0   9.5
20   M young   10.2   5.4
21   M young    9.0   3.7
22   F young    7.9   6.2
23   M   old   10.1  10.0
24   M young    9.0   1.7
25   M young    8.6   2.9
26   M young    9.4   3.2
27   M young    9.7   4.7
28   M young    9.3   4.9
29   F young   10.7   9.8
30   M   old    9.3   9.4
'))
close(con)
``````

Then reshape it:

``````
library(reshape2)

# Make sure subject column is a factor
data$subject <- factor(data$subject)

# Convert it to long format
data.long <- melt(data, id = c("subject","sex","age"), # keep these columns the same
                  measure = c("before","after"),       # Put these two columns into a new column
                  variable.name="time")                # Name of the new column

# subject sex   age   time value
#       1   F   old before   9.5
#       2   M   old before  10.3
#...
``````

Now analyze using a mixed ANOVA:

``````
aov.after.age.time <- aov(value ~ age*time + Error(subject/time), data=data.long)
summary(aov.after.age.time)
``````

But when there are more than two predictor variables, the R examples add the between-subject factors again after the error term:

``````
# e.g., from R cookbook
# aov.bww <- aov(y ~ b1*b2*w1 + Error(subject/(w1)) + b1*b2, data=data.long)

# which would translate in our case as:
aov.bww <- aov(value ~ sex*age*time + Error(subject/time) + sex*age, data=data.long)
summary(aov.bww)
``````

But why is b1*b2, or in our case sex*age, specified twice? It doesn’t seem to make a difference when we remove them after the Error() term:

``````
aov.bww2 <- aov(value ~ sex*age*time + Error(subject/time), data=data.long)
summary(aov.bww2)
``````
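A minimal sketch of one way to check this without fitting anything: `aov()` builds its model from `terms()` of the formula, and duplicated terms are collapsed there, so comparing the term labels of the two formulas shows whether the repeated `sex*age` contributes anything new. The names `f_twice`, `f_once`, `labels_twice`, `labels_once`, and `same_terms` are illustrative and do not come from the cookbook.

```r
# Formulas from the question, with and without the repeated sex*age
f_twice <- value ~ sex*age*time + Error(subject/time) + sex*age
f_once  <- value ~ sex*age*time + Error(subject/time)

# aov() calls terms() with "Error" registered as a special term
labels_twice <- attr(terms(f_twice, specials = "Error"), "term.labels")
labels_once  <- attr(terms(f_once,  specials = "Error"), "term.labels")

# Show the expanded terms of the formula that repeats sex*age
print(labels_twice)

# TRUE when both formulas expand to the same terms, with no extras in either
same_terms <- setequal(labels_twice, labels_once) &&
  length(labels_twice) == length(labels_once)
print(same_terms)
```

If `same_terms` is `TRUE`, the two `aov()` calls describe the same model, which would be consistent with the identical summaries observed above. It does not explain why the tutorials write the terms twice.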

Can anyone explain why the examples have those extra terms? The R manual just has this example, where the between factors are not specified twice:

``````
# fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
``````

Edit:

I have checked the references from the R Cookbook and found other web sites also specify the terms twice in their mixed design examples. See here:
http://www.personality-project.org/R/r.anova.html
where they have the example:

``````
aov.ex5 = aov(Recall ~ (Task*Valence*Gender*Dosage) +
              Error(Subject/(Task*Valence)) + (Gender*Dosage), data.example5)
``````

and see here
http://www.statmethods.net/stats/anova.html
with their example:

``````
# Two Within Factors W1 W2, Two Between Factors B1 B2
fit <- aov(y ~ (W1*W2*B1*B2) + Error(Subject/(W1*W2)) + (B1*B2),
           data=mydataframe)
``````

These sites are presumably where the cookbook got its information.

Note also that in this case the `/time` part of the `Error()` call is unnecessary: the `subject:time` interaction is the lowest level, which is always included in the model. `Error(subject)` and `Error(subject/time)` therefore give the same result; the only difference is that the output labels that level “Within” for the former and “subject:time” for the latter.
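The equivalence of the two `Error()` specifications can be checked directly. The sketch below rebuilds the question's data in base R (so it does not need `reshape2`), fits both versions, and compares the F values of the lowest level. The object names (`long`, `fit_nested`, `fit_simple`, `last_table`, `same_F`) are illustrative assumptions, not part of the original example.

```r
# Data copied from the question (30 subjects, two measurements each)
sex <- strsplit("FMMFMMMFFMMMMFFFMFFMMFMMMMMMFM", "")[[1]]
age <- ifelse(1:30 %in% c(1:6, 9, 12, 16, 18, 23, 30), "old", "young")
before <- c(9.5, 10.3, 7.5, 12.4, 10.2, 11.0, 9.1, 7.9, 6.6, 7.7,
            9.4, 11.6, 9.9, 8.6, 14.3, 9.2, 9.8, 9.9, 13.0, 10.2,
            9.0, 7.9, 10.1, 9.0, 8.6, 9.4, 9.7, 9.3, 10.7, 9.3)
after  <- c(7.1, 11.0, 5.8, 8.8, 8.6, 8.0, 3.0, 5.2, 3.4, 4.0,
            5.3, 11.3, 4.6, 6.4, 13.5, 4.7, 5.1, 7.3, 9.5, 5.4,
            3.7, 6.2, 10.0, 1.7, 2.9, 3.2, 4.7, 4.9, 9.8, 9.4)

# Stack to long format: one row per subject per time point
long <- data.frame(
  subject = factor(rep(1:30, times = 2)),
  sex     = factor(rep(sex, times = 2)),
  age     = factor(rep(age, times = 2)),
  time    = factor(rep(c("before", "after"), each = 30),
                   levels = c("before", "after")),
  value   = c(before, after)
)

fit_nested <- aov(value ~ sex*age*time + Error(subject/time), data = long)
fit_simple <- aov(value ~ sex*age*time + Error(subject),      data = long)

# Stratum names of each fit; only the label of the last one should differ
print(names(fit_nested))
print(names(fit_simple))

# Helper: ANOVA table of the last (lowest) stratum in the summary
last_table <- function(fit) {
  s <- summary(fit)
  s[[length(s)]][[1]]
}

# Compare the F values of the lowest stratum across the two fits
same_F <- isTRUE(all.equal(last_table(fit_nested)[["F value"]],
                           last_table(fit_simple)[["F value"]]))
print(same_F)
```

Comparing only the last stratum is deliberate: the `subject` stratum is specified identically in both calls, so the lowest level is where any difference would show up.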