# Conflicting results of summary() and anova() for a mixed model with interactions in lmer+lmerTest

I recently came across what I think may be a problem in how the `anova()` function from the `lmerTest` package computes its F-statistics and corresponding P-values for fixed effects in mixed-effects models. Let me start by saying that I am aware of the controversy surrounding P-values for mixed-effects models (for reasons discussed here). Nonetheless, many folks still want P-values, and a number of methods have been developed to accommodate this (see here). Here I want to show the results of a commonly used approach, namely the `anova()` function from the `lmerTest` package, in the hope that someone has an idea of why the results are not quite making sense.

First, here is my data; I had to link to it because of its size. Note that the biomass column has been standardized (mean = 0, sd = 1), hence the negative values. This does not alter the output. Once the file has been downloaded and the working directory set, it can be read in as follows:

```r
dat <- read.csv("StackOverflow_Data.csv", header = TRUE)
```
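For reference, the standardization mentioned above can be reproduced with base R's `scale()`. A sketch on toy numbers (the actual biomass values are in the linked file):

```r
# Toy illustration of standardizing a column to mean 0, sd 1, which is why
# some of the standardized biomass values in the file are negative.
x <- c(10, 12, 8, 14, 6)
z <- as.numeric(scale(x))   # (x - mean(x)) / sd(x)
round(mean(z), 10)          # 0
round(sd(z), 10)            # 1
```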

Below is my model, fit with the `lmer` function from `lme4`. Plant biomass is the response variable, and three factors (A, B, and C), each with two levels, are the predictors. Plant genotype and spatial block are included as random intercepts.

```r
model <- lmer(Biomass ~ A + B + C +
                A:B + A:C +
                B:C + A:B:C +
                (1 | Genotype) + (1 | Block),
              data = dat, REML = TRUE)
```
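As an aside, the fixed-effects part written out above is exactly what the shorthand `A * B * C` expands to, which base R can confirm:

```r
# The * operator expands to all main effects plus all interactions,
# matching the terms written out explicitly in the model formula.
attr(terms(~ A * B * C), "term.labels")
# "A" "B" "C" "A:B" "A:C" "B:C" "A:B:C"
```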

Summarizing the above model using `summary(model)` we get:

```
Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
approximations to degrees of freedom [lmerMod]
Formula: Biomass ~ A + B + C + A:B + A:C + B:C + A:B:C + (1 | Genotype) +
    (1 | Block)
   Data: dat

     AIC      BIC   logLik deviance df.resid
  1059.7   1111.0   -518.8   1037.7      776

Scaled residuals:
     Min       1Q   Median       3Q      Max
-3.04330 -0.63914  0.00315  0.69108  2.82368

Random effects:
 Groups   Name        Variance Std.Dev.
 Genotype (Intercept) 0.07509  0.2740
 Block    (Intercept) 0.01037  0.1018
 Residual             0.19038  0.4363
Number of obs: 787, groups:  Genotype, 50; Block, 6

Fixed effects:
                     Estimate Std. Error        df t value Pr(>|t|)
(Intercept)           2.27699    0.08162  47.50000  27.897  < 2e-16 ***
AYes                 -0.02308    0.09958  99.30000  -0.232  0.81719
BReduced             -0.11036    0.06232 733.00000  -1.771  0.07700 .
CSupp                -0.02152    0.06243 733.70000  -0.345  0.73039
AYes:BReduced         0.25113    0.08838 733.70000   2.841  0.00462 **
AYes:CSupp            0.02179    0.08854 734.50000   0.246  0.80567
BReduced:CSupp        0.19436    0.08838 733.10000   2.199  0.02817 *
AYes:BReduced:CSupp  -0.21746    0.12507 734.20000  -1.739  0.08251 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) AYes   BRedcd CSupp  AYs:BR AYs:CS BRd:CS
AYes        -0.607
BReduced    -0.379  0.311
CSupp       -0.379  0.311  0.498
AYes:BRedcd  0.269 -0.444 -0.706 -0.354
AYes:CSupp   0.268 -0.444 -0.352 -0.708  0.503
BRedcd:CSpp  0.267 -0.219 -0.706 -0.705  0.498  0.500
AYs:BRdc:CS -0.190  0.315  0.500  0.502 -0.709 -0.709 -0.708
```

The summary above uses the `lmerTest` package to compute P-values from the t-statistics using Satterthwaite’s approximation to the denominator degrees of freedom. From this we see that both the `A:B` and `B:C` interactions are significant at the p = 0.05 level. In theory, these results should be consistent, at least qualitatively, with those produced by the `anova()` function in the `lmerTest` package, which computes P-values in the same way. However, this isn’t the case. Here is the output from `anova(model, type = 3)` (note the request for type 3 sums of squares):

```
Analysis of Variance Table of type III  with  Satterthwaite
approximation for degrees of freedom
       Sum Sq Mean Sq NumDF  DenDF F.value  Pr(>F)
A     0.09492 0.09492     1  49.87  0.4986 0.48342
B     0.66040 0.66040     1 732.66  3.4688 0.06294 .
C     0.20207 0.20207     1 733.90  1.0614 0.30324
A:B   0.99470 0.99470     1 732.56  5.2247 0.02255 *
A:C   0.36903 0.36903     1 733.66  1.9383 0.16427
B:C   0.35867 0.35867     1 733.20  1.8839 0.17031
A:B:C 0.57552 0.57552     1 734.23  3.0230 0.08251 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

These results clearly differ: the `B:C` interaction is no longer significant, and the P-value for the `A:B` interaction is quite a bit higher. Both functions should be computing the P-values in similar ways, so it’s hard to imagine them being so different.

Why are they different?
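One sanity check worth noting: every term here has a single numerator df, so each `anova()` F-value should equal the square of the corresponding `summary()` t-value if the two functions were testing the same hypothesis. For the highest-order term A:B:C, whose test is invariant to the contrast coding, they do agree:

```r
# Values copied from the two outputs above: for A:B:C the summary() t and the
# anova() F satisfy F = t^2 and both report the identical P-value (0.08251),
# so the discrepancy is confined to the lower-order terms.
t_abc <- -1.739    # t value for AYes:BReduced:CSupp in summary(model)
F_abc <- 3.0230    # F value for A:B:C in anova(model, type = 3)
round(t_abc^2, 3)  # 3.024, i.e. F_abc up to rounding of the printed t
```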

### This was a part of the original question, but it can be misleading; see the answer below.

In fact, it seems that `anova(model, type = 3)` is actually using type 2 SS, which we can check by running `anova(model, type = 2)`:

```
Analysis of Variance Table of type II  with  Satterthwaite
approximation for degrees of freedom
       Sum Sq Mean Sq NumDF  DenDF F.value  Pr(>F)
A     0.09526 0.09526     1  49.87  0.5004 0.48263
B     0.65996 0.65996     1 732.66  3.4665 0.06302 .
C     0.19639 0.19639     1 733.91  1.0315 0.31013
A:B   0.99282 0.99282     1 732.56  5.2148 0.02268 *
A:C   0.37018 0.37018     1 733.65  1.9444 0.16362
B:C   0.35523 0.35523     1 733.20  1.8659 0.17237
A:B:C 0.57552 0.57552     1 734.23  3.0230 0.08251 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
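A caveat before reading too much into this similarity: when a design is (near-)balanced, type II and type III tests coincide even with interactions in the model, so similar tables alone do not prove that `anova()` is mislabelled. A toy sketch with an exactly balanced two-factor design (hypothetical simulated data, not the question's dataset):

```r
# In a perfectly balanced design the main-effect and interaction columns are
# orthogonal under sum contrasts, so type I, II and III tests all agree.
set.seed(42)
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2")),
                 rep = 1:10)
d$y <- rnorm(nrow(d))
fit <- lm(y ~ A * B, data = d,
          contrasts = list(A = "contr.sum", B = "contr.sum"))
anova(fit)                   # sequential (type I) table
drop1(fit, ~ ., test = "F")  # marginal tests (type III, given sum contrasts)
# If the question's 787 observations are spread nearly evenly across the
# eight A:B:C cells, the same near-agreement is expected there too.
```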

The results are very similar, which should not be the case given the presence of interactions in the model. To show that `lmerTest::anova()` is in fact using type 2 SS rather than the type 3 SS it displays in its output, we can use the `Anova()` function from the `car` package. `Anova(model, type = 2, test.statistic = 'F')` produces:

```
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)

Response: Biomass
           F Df Df.res  Pr(>F)
A     0.4857  1  48.28 0.48917
B     3.4537  1 726.63 0.06351 .
C     1.0337  1 727.77 0.30962
A:B   5.1456  1 726.54 0.02360 *
A:C   1.9302  1 727.55 0.16517
B:C   1.8776  1 727.12 0.17103
A:B:C 2.9915  1 728.06 0.08413 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Note that the use of Kenward-Roger denominator degrees of freedom does not change the results much for my data. What’s clear is that the type 2 SS results from the `car` package are analogous to the type 3 SS results from the `lmerTest` package. This suggests that the `lmerTest` package is in fact computing type 2 SS. I struggle to see why this would be the case unless there is a problem in the computation of P-values in the `lmerTest` package. Am I missing something?

Any suggestions or ideas are welcome. Thanks a bunch!

Edit: December 6 2016, 11:40 am

A few folks have indicated that this question is a duplicate of the one here. However, I don’t see how. That post aims to understand why `aov()` and `lme()` produce different F-statistics, which turns out to relate to how the variance components are calculated by the different functions. Here I am running only a single model using `lmer` and trying to understand why `lmerTest::anova(model)` and `summary(model)` produce different P-values, despite the fact that they should be computed in similar ways. `lmerTest::anova()` seems to be using type 2 SS rather than the reported type 3 SS, which should only matter in the presence of interactions, and none of the models in the other post contain interactions.

In what is undoubtedly a newbie mistake, I have solved the problem detailed above. The key to getting `lmerTest::anova()` and `summary()` to agree is to change R’s global contrasts:

```r
options(contrasts = c("contr.sum", "contr.poly"))
```

If these contrasts are not set, the coefficients tested by `summary()` are based on R’s default treatment contrasts, so its t-tests for lower-order terms do not correspond to type III tests (note that `car::Anova()` is also affected by the global contrasts). `lmerTest::anova()`, on the other hand, produces type III SS regardless of the default contrasts, suggesting it sets the appropriate coding internally. If the contrasts are set as above, then all three functions produce output using the desired type III SS.
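To see what that option actually changes, you can inspect the coding of a two-level factor before and after (base R only):

```r
# Under the default treatment contrasts (0/1 dummies), a "main effect"
# coefficient is really a simple effect at the reference level of the other
# factors; sum-to-zero coding (1/-1) makes lower-order terms marginal,
# which is what valid type III tests require.
f <- factor(c("No", "Yes"))
getOption("contrasts")  # "contr.treatment" "contr.poly" in a fresh session
contrasts(f)            # Yes column coded 0, 1
options(contrasts = c("contr.sum", "contr.poly"))
contrasts(f)            # now coded 1, -1
```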