I recently came across what I think may be a problem in how the `anova()` function from the `lmerTest` package computes its F-statistics and corresponding P-values for fixed effects in mixed-effects models. Let me start by saying that I know of the controversy surrounding the calculation of P-values from mixed-effects models (for reasons discussed here). Nonetheless, many folks still want P-values, and a number of ways have been developed to accommodate this (see here). Here I want to show the results of a commonly used approach, namely the `anova()` function from the `lmerTest` package, and I hope that someone has an idea of why the results are not quite making sense.

First, here is my data. I had to link to it because of its size. Note that the biomass column has been standardized (mean = 0, sd = 1), hence the negative values; this does not alter the output. Once downloaded and the working directory has been specified, the file can be read in as follows:

```
dat <- read.csv("StackOverflow_Data.csv", header = TRUE)
```

Below is my model using the `lmer` function from `lme4`. In this model I have plant biomass as the response variable and three factors, A, B, and C, each with two levels, as predictors. Plant genotype and spatial block are included as random effects.

```
model <- lmer(Biomass ~ A + B + C + A:B + A:C + B:C + A:B:C +
              (1 | Genotype) + (1 | Block), data = dat, REML = TRUE)
```

Summarizing the above model with `summary(model)`, we get:

```
Linear mixed model fit by maximum likelihood
t-tests use Satterthwaite approximations to degrees of freedom [lmerMod]
Formula: Biomass ~ A + B + C + A:B + A:C + B:C + A:B:C + (1 | Genotype) + (1 | Block)
   Data: dat

     AIC      BIC   logLik deviance df.resid
  1059.7   1111.0   -518.8   1037.7      776

Scaled residuals:
     Min       1Q   Median       3Q      Max
-3.04330 -0.63914  0.00315  0.69108  2.82368

Random effects:
 Groups   Name        Variance Std.Dev.
 Genotype (Intercept) 0.07509  0.2740
 Block    (Intercept) 0.01037  0.1018
 Residual             0.19038  0.4363
Number of obs: 787, groups:  Genotype, 50; Block, 6

Fixed effects:
                     Estimate Std. Error        df t value Pr(>|t|)
(Intercept)           2.27699    0.08162  47.50000  27.897  < 2e-16 ***
AYes                 -0.02308    0.09958  99.30000  -0.232  0.81719
BReduced             -0.11036    0.06232 733.00000  -1.771  0.07700 .
CSupp                -0.02152    0.06243 733.70000  -0.345  0.73039
AYes:BReduced         0.25113    0.08838 733.70000   2.841  0.00462 **
AYes:CSupp            0.02179    0.08854 734.50000   0.246  0.80567
BReduced:CSupp        0.19436    0.08838 733.10000   2.199  0.02817 *
AYes:BReduced:CSupp  -0.21746    0.12507 734.20000  -1.739  0.08251 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) AYes   BRedcd CSupp  AYs:BR AYs:CS BRd:CS
AYes        -0.607
BReduced    -0.379  0.311
CSupp       -0.379  0.311  0.498
AYes:BRedcd  0.269 -0.444 -0.706 -0.354
AYes:CSupp   0.268 -0.444 -0.352 -0.708  0.503
BRedcd:CSpp  0.267 -0.219 -0.706 -0.705  0.498  0.500
AYs:BRdc:CS -0.190  0.315  0.500  0.502 -0.709 -0.709 -0.708
```

The summary above uses the `lmerTest` package to compute P-values from the t-statistics using Satterthwaite's approximation to the denominator degrees of freedom. From this we see that both the `A:B` and `B:C` interactions are significant at the p = 0.05 level. In theory, these results should be consistent, at least qualitatively, with those produced by the `anova()` function in the `lmerTest` package, which computes P-values in the same way. However, this isn't the case. Here is the output from `anova(model, type = 3)` — notice the type 3 test for SS:

```
Analysis of Variance Table of type III with Satterthwaite approximation for degrees of freedom
       Sum Sq Mean Sq NumDF  DenDF F.value  Pr(>F)
A     0.09492 0.09492     1  49.87  0.4986 0.48342
B     0.66040 0.66040     1 732.66  3.4688 0.06294 .
C     0.20207 0.20207     1 733.90  1.0614 0.30324
A:B   0.99470 0.99470     1 732.56  5.2247 0.02255 *
A:C   0.36903 0.36903     1 733.66  1.9383 0.16427
B:C   0.35867 0.35867     1 733.20  1.8839 0.17031
A:B:C 0.57552 0.57552     1 734.23  3.0230 0.08251 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

These results clearly differ: the `B:C` interaction is no longer significant, and the P-value for the `A:B` interaction is quite a bit higher. Both approaches should be computing the P-values in similar ways, so it's hard to imagine them being this different. Why do they differ?

## The following was part of the original question, but it can be misleading; see the answer below.

In fact, it seems that `anova(model, type = 3)` is actually using type 2 SS, which we can verify by running `anova(model, type = 2)`:

```
Analysis of Variance Table of type II with Satterthwaite approximation for degrees of freedom
       Sum Sq Mean Sq NumDF  DenDF F.value  Pr(>F)
A     0.09526 0.09526     1  49.87  0.5004 0.48263
B     0.65996 0.65996     1 732.66  3.4665 0.06302 .
C     0.19639 0.19639     1 733.91  1.0315 0.31013
A:B   0.99282 0.99282     1 732.56  5.2148 0.02268 *
A:C   0.37018 0.37018     1 733.65  1.9444 0.16362
B:C   0.35523 0.35523     1 733.20  1.8659 0.17237
A:B:C 0.57552 0.57552     1 734.23  3.0230 0.08251 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The results are very similar, which should not be the case given the presence of interactions in the model. To show that `lmerTest::anova()` is in fact using type 2 SS rather than the type 3 SS it displays in its output, we can use the `Anova()` function from the `car` package. `Anova(model, type = 2, test.statistic = 'F')` produces:

```
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)

Response: Biomass
           F Df Df.res  Pr(>F)
A     0.4857  1  48.28 0.48917
B     3.4537  1 726.63 0.06351 .
C     1.0337  1 727.77 0.30962
A:B   5.1456  1 726.54 0.02360 *
A:C   1.9302  1 727.55 0.16517
B:C   1.8776  1 727.12 0.17103
A:B:C 2.9915  1 728.06 0.08413 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Note that the use of Kenward-Roger ddf does not change the results by much for my data. What's clear is that the type 2 SS results from the `car` package are analogous to the nominally type 3 results from the `lmerTest` package. This suggests that the `lmerTest` package is in fact computing type 2 SS. I struggle to figure out why this would be the case, unless there is a problem in how the `lmerTest` package computes its P-values. Am I missing something? Any suggestions or ideas are welcome. Thanks a bunch!
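As background for why the agreement above is suspicious: type II and type III SS should only coincide when no interactions are present. A minimal fixed-effects sketch with `car::Anova()` on toy data (not my dataset; variable names here are made up for illustration) shows the two types diverging once an interaction exists:

```
library(car)

set.seed(1)
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), rep = 1:10)
d$y <- rnorm(nrow(d)) + ifelse(d$A == "a2" & d$B == "b2", 1, 0)  # built-in A:B interaction

# Type II: main-effect tests ignore the interaction term
m <- lm(y ~ A * B, data = d)
Anova(m, type = 2)

# Type III: main effects adjusted for the interaction; requires
# sum-to-zero contrasts to be meaningful
options(contrasts = c("contr.sum", "contr.poly"))
m2 <- lm(y ~ A * B, data = d)
Anova(m2, type = 3)
```

With the interaction effect present, the main-effect SS and P-values differ between the two tables, which is exactly the behavior missing from the `lmerTest` output above.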

Edit (December 6, 2016, 11:40 am): A few folks have indicated that this question is a duplicate of this one. However, I don't see how. That post aims to understand why `aov()` and `lme()` produce different F-statistics, which turns out to relate to how the two functions calculate the variance components. Here I am running only a single model with `lmer` and trying to understand why `lmerTest::anova(model)` and `summary(model)` produce different P-values, despite the fact that they should be computed in similar ways. `lmerTest::anova()` seems to be using type 2 SS rather than the reported type 3 SS, which should only matter in the presence of interactions — and none of the models in the other post contain interactions.

**Answer**

In what is undoubtedly a newbie mistake, I have solved the problem detailed above. The key to getting `lmerTest::anova()` and `summary()` to agree is to change R's global contrasts by running

```
options(contrasts = c("contr.sum", "contr.poly"))
```

If these contrasts are not set, then `summary()` will produce tests based on the default type I sums of squares (note that `car::Anova()` will also use this default). On the other hand, `lmerTest::anova()` will produce type III SS regardless of the global contrasts, suggesting it sets them internally. If the global contrasts are changed as noted above, then all three functions will produce output using the desired type III SS.
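Putting the fix together, here is a minimal sketch of the corrected workflow (assuming `dat` and the model formula from the question; `A * B * C` expands to the same main effects and interactions written out above):

```
library(lme4)
library(lmerTest)  # masks lme4::lmer so that summary() reports P-values
library(car)

# Set sum-to-zero contrasts globally BEFORE fitting the model
options(contrasts = c("contr.sum", "contr.poly"))

model <- lmer(Biomass ~ A * B * C + (1 | Genotype) + (1 | Block),
              data = dat, REML = TRUE)

summary(model)                                # t-tests under sum contrasts
anova(model, type = 3)                        # type III F-tests from lmerTest
Anova(model, type = 3, test.statistic = "F")  # car's type III, now consistent
```

With sum-to-zero contrasts in place, each one-degree-of-freedom F-statistic in the `anova()` table should match the square of the corresponding t-statistic from `summary()`, so the two tables agree qualitatively as expected.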

**Attribution**
*Source: Link, Question Author: James S., Answer Author: amoeba*