I developed the ez package for R as a means to help folks transition from stats packages like SPSS to R. This is (hopefully) achieved by simplifying the specification of various flavours of ANOVA, and providing SPSS-like output (including effect sizes and assumption tests), among other features. The ezANOVA() function mostly serves as a wrapper to car::Anova(), but the current version of ezANOVA() implements only type-II sums of squares, whereas car::Anova() permits specification of either type-II or type-III sums of squares. As I possibly should have expected, several users have requested that I provide an argument in ezANOVA() that lets the user request type-II or type-III. I have been reluctant to do so and outline my reasoning below, but I would appreciate the community’s input on my or any other reasoning that bears on the issue.
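As a rough illustration of the choice that car::Anova() exposes, the sketch below fits one unbalanced two-factor model and requests both types of test. The data are simulated, and the names d, A, B, and y are made up for the example. Sum-to-zero contrasts are set on the factors because type-III tests from car::Anova() are only sensible under that kind of coding.

```r
library(car)  # provides Anova()

set.seed(42)
# Hypothetical unbalanced 2 x 2 between-subjects design (cell sizes 12, 4, 6, 8)
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(12, 4, 6, 8))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(12, 4, 6, 8)))
)
d$y <- rnorm(nrow(d)) + 0.5 * (d$A == "a2")

# Sum-to-zero contrasts so that the type-III main-effect tests are interpretable
fit <- lm(y ~ A * B, data = d,
          contrasts = list(A = "contr.sum", B = "contr.sum"))

type2 <- Anova(fit, type = 2)
type3 <- Anova(fit, type = 3)

print(type2)
print(type3)
```

With unbalanced data like these, the A and B rows should differ between the two tables, while the A:B row should match.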
Reasons for not including a “SS_type” argument in ezANOVA():
1. The difference between type I, II, and III sums of squares only crops up when data are unbalanced, in which case I’d say that more benefit is derived from ameliorating imbalance by further data collection than from fiddling with the ANOVA computation.
2. The difference between type II and III applies to lower-order effects that are qualified by higher-order effects, in which case I consider the lower-order effects scientifically uninteresting. (But see below for a possible complication of this argument.)
3. For those rare circumstances when (1) and (2) don’t apply (when further data collection is impossible and the researcher has a valid scientific interest in a qualified main effect that I can’t currently imagine), one can relatively easily modify the ezANOVA() source or employ car::Anova() itself to achieve type III tests. In this way, I see the extra effort/understanding required to obtain type III tests as a means by which I can ensure that only those who really know what they’re doing go that route.
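The first point, that the type of sums of squares only matters under imbalance, can be checked with a small base-R sketch. The data are simulated and all names (balanced, unbalanced, ss_A) are invented for the example. It compares the sequential sum of squares for A when A is entered before B against when it is entered after B.

```r
set.seed(1)
# Balanced 2 x 2 design with 6 observations per cell (simulated data)
balanced <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), rep = 1:6)
balanced$y <- rnorm(nrow(balanced)) + 0.5 * (balanced$A == "a2")

# Unbalance the design by discarding most of one cell
discard <- balanced$A == "a1" & balanced$B == "b1" & balanced$rep > 2
unbalanced <- balanced[!discard, ]

# Sequential (type-I) sum of squares for A under a given term order
ss_A <- function(formula, data) anova(lm(formula, data = data))["A", "Sum Sq"]

# A entered first (unadjusted) minus A entered after B (adjusted for B)
balanced_gap   <- ss_A(y ~ A + B, balanced)   - ss_A(y ~ B + A, balanced)
unbalanced_gap <- ss_A(y ~ A + B, unbalanced) - ss_A(y ~ B + A, unbalanced)

print(c(balanced = balanced_gap, unbalanced = unbalanced_gap))
```

In the balanced design the two orderings agree up to floating-point error, so the gap is effectively zero. Once cells are unequal, A and B are correlated and the answer depends on what A is adjusted for.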
Now, the most recent type-III requestor pointed out that argument (2) is undermined by consideration of circumstances where extant but “non-significant” higher-order effects can bias computation of sums of squares for lower-order effects. In such cases it’s imaginable that a researcher would look to the higher-order effect and, seeing that it is “non-significant”, turn to attempting interpretation of the lower-order effects that, unbeknownst to the researcher, have been compromised. My initial reaction is that this is not a problem with sums of squares, but with p-values and the tradition of null hypothesis testing. I suspect that a more explicit measure of evidence, such as the likelihood ratio, might be more likely to yield a less ambiguous picture of the models consistent with the data. However, I haven’t done much thinking on the consequence of unbalanced data for the computation of likelihood ratios (which indeed involve sums of squares), so I’ll have to give this some further thought.
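For concreteness, here is one way a likelihood ratio for the interaction might be computed in base R. This is only a sketch under assumptions: the data are simulated, the names are hypothetical, and the AIC-style penalty is one of several possible complexity corrections, not a settled recommendation. It also shows the sense in which the ratio “involves sums of squares”: for Gaussian linear models it is a function of the two residual sums of squares.

```r
set.seed(7)
# Simulated unbalanced 2 x 2 data; all names are hypothetical
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(12, 4, 6, 8))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(12, 4, 6, 8)))
)
d$y <- rnorm(nrow(d)) + 0.4 * (d$A == "a2") * (d$B == "b2")

fit_add <- lm(y ~ A + B, data = d)  # main effects only
fit_int <- lm(y ~ A * B, data = d)  # main effects plus interaction

# Raw likelihood ratio favouring the interaction model
# (never below 1, because the models are nested)
lr <- exp(as.numeric(logLik(fit_int)) - as.numeric(logLik(fit_add)))

# The same quantity written in terms of residual sums of squares
n <- nrow(d)
lr_from_rss <- (deviance(fit_add) / deviance(fit_int))^(n / 2)

# One possible complexity correction: penalise the extra parameter via AIC
lr_aic <- exp((AIC(fit_add) - AIC(fit_int)) / 2)

print(c(raw = lr, from_rss = lr_from_rss, aic_corrected = lr_aic))
```

Whether comparing these two models is the right comparison for unbalanced data is exactly the open question; the sketch only shows the mechanics.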
Just to amplify – I am the most recent requestor, I believe.
In specific comment on Mike’s points:
It’s clearly true that the I/II/III difference only applies with correlated predictors (of which unbalanced designs are the most common example, certainly in factorial ANOVA) – but this seems to me to be an argument that dismisses the analysis of the unbalanced situation (and hence any Type I/II/III debate). It may be imperfect, but that’s the way things happen (and in many contexts the costs of further data collection outweigh the statistical problem, caveats notwithstanding).
This is completely fair and represents the meat of most of the “II versus III, favouring II” arguments I’ve come across. The best summary I’ve encountered is Langsrud (2003) “ANOVA for unbalanced data: Use Type II instead of Type III sums of squares”, Statistics and Computing 13: 163-167 (I have a PDF if the original is hard to find). He argues (taking the two-factor case as the basic example) that if there’s an interaction, there’s an interaction, so consideration of main effects is usually meaningless (an obviously fair point) – and if there’s no interaction, the Type II analysis of main effects is more powerful than the Type III (undoubtedly), so you should always go with Type II. I’ve seen other arguments (e.g. Venables, Fox) that emphasize the meaning (or lack of) of considering hypotheses about main effects in the presence of interactions, and/or/equivalently suggesting that the Type III assumptions about the null hypothesis are often not sensible (e.g. Langsrud).
And I agree with this: if you have an interaction but have some question about the main effect as well, then you’re probably into do-it-yourself territory.
Clearly there are those who just want Type III because SPSS does it, or some other reference to statistical Higher Authority. I am not wholly against this view, if it comes down to a choice of a lot of people sticking with SPSS (which I have some things against, namely time, money, and licence expiry conditions) and Type III SS, or a lot of people shifting to R and Type III SS. However, this argument is clearly a lame one statistically.
However, the argument that I found rather more substantial in favour of Type III is that made independently by Myers & Well (2003, “Research Design and Statistical Analysis”, pp. 323, 626-629) and Maxwell & Delaney (2004, “Designing Experiments and Analyzing Data: A Model Comparison Perspective”, pp. 324-328, 332-335). That is as follows:
- if there’s an interaction, all methods give the same result for the interaction sum of squares
- Type II assumes that there’s no interaction for its test of main effects; type III doesn’t
- Some (e.g. Langsrud) argue that if the interaction is not significant, then you’re justified in assuming that there isn’t one, and looking at the (more powerful) Type II main effects
- But if the test of the interaction is underpowered, yet there is an interaction, the interaction may come out “non-significant” yet still lead to a violation of the assumptions of the Type II main effects test, biasing those tests to be too liberal.
- Myers & Well cite Appelbaum/Cramer as the primary proponents of the Type II approach, and go on [p323]: “… More conservative criteria for nonsignificance of the interaction could be used, such as requiring that the interaction not be significant at the .25 level, but there is insufficient understanding of the consequences of even this approach. As a general rule, Type II sums of squares should not be calculated unless there is strong a priori reason to assume no interaction effects, and a clearly nonsignificant interaction sum of squares.” They cite [p629] Overall, Lee & Hornick 1981 as a demonstration that interactions that do not approach significance can bias tests of main effects. Maxwell & Delaney [p334] advocate the Type II approach if the population interaction is zero, for power, and the Type III approach if it isn’t [for the interpretability of means derived from this approach]. They too advocate using Type III in the real-life situation (when you’re making inferences about the presence of the interaction from the data) because of the problem of making a type 2 [underpowered] error in the interaction test and thus accidentally violating the assumptions of the Type II SS approach; they then make similar further points to Myers & Well, and note the long debate on this issue!
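The mechanism behind the last two points can be sketched in base R. The data are constructed for illustration and do not come from any of the cited texts; the cell sizes, means, and names are all hypothetical. The cell means form a pure crossover interaction, so the unweighted marginal means of A are exactly equal, yet the design is unbalanced. The Type II and Type III sums of squares for A are computed by hand as model comparisons.

```r
# Unbalanced 2 x 2 design (hypothetical cell sizes) with a pure crossover
# interaction: the unweighted marginal means of A and of B are all zero
cells <- data.frame(
  A  = c("a1", "a1", "a2", "a2"),
  B  = c("b1", "b2", "b1", "b2"),
  n  = c(12, 4, 4, 4),
  mu = c(1, -1, -1, 1)
)
d <- cells[rep(seq_len(nrow(cells)), times = cells$n), c("A", "B", "mu")]
d$A <- factor(d$A)
d$B <- factor(d$B)
# Residuals of -1/+1 that sum to zero within each cell,
# so the observed cell means equal mu exactly
d$y <- d$mu + rep(c(-1, 1), length.out = nrow(d))

# Type II SS for A: gain from adding A to a model with B and no interaction
ss2_A <- deviance(lm(y ~ B, data = d)) - deviance(lm(y ~ A + B, data = d))

# Type III SS for A: loss from dropping A's column from the full
# sum-coded model while keeping B and the interaction
X <- model.matrix(~ A * B, data = d,
                  contrasts.arg = list(A = "contr.sum", B = "contr.sum"))
is_A <- attr(X, "assign") == 1  # term 1 in ~ A * B is the A main effect
rss <- function(M) sum(lm.fit(M, d$y)$residuals^2)
ss3_A <- rss(X[, !is_A, drop = FALSE]) - rss(X)

print(c(type2 = ss2_A, type3 = ss3_A))
```

The Type III value is zero up to rounding, because there is no difference between the unweighted marginal means of A. The Type II value is positive: with unequal cell sizes, part of the interaction is attributed to the main effect of A. That is the assumption violation described above, and it occurs whether or not the interaction itself reaches significance.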
So my interpretation (and I’m no expert!) is that there’s plenty of Higher Statistical Authority on both sides of the argument; that the usual arguments put forward aren’t about the usual situation that would give rise to problems (that situation being the common one of interpreting main effects with a non-significant interaction); and that there are fair reasons to be concerned about the Type II approach in that situation (and it comes down to a power versus potential over-liberalism thing).
For me, that’s enough to wish for the Type III option in ezANOVA, as well as Type II, because (for my money) it’s a superb interface to R’s ANOVA systems. R is some way from being easy to use for novices, in my view, and the “ez” package, with ezANOVA and the rather lovely effect plotting functions, goes a long way towards making R accessible to a more general research audience. Some of my thoughts-in-progress (and a nasty hack for ezANOVA) are at http://www.psychol.cam.ac.uk/statistics/R/anova.html .
Would be interested to hear everyone’s thoughts!