# Is this an acceptable way to analyse mixed effect models with lme4 in R?

I have an unbalanced repeated measures data set to analyse, and I’ve read that the way most statistical packages handle this with ANOVA (i.e. type III sum of squares) is wrong. Therefore, I would like to use a mixed effects model to analyse these data. I have read a lot about mixed models in R, but I am still very new to R and to mixed effect models, and not very confident that I am doing things right. Note that I can’t yet entirely divorce myself from “traditional” methods, and still need $p$-values and post hoc tests.

I would like to know if the following approach makes sense, or if I am doing something horribly wrong. Here’s my code:

# load packages
library(lme4)
library(languageR)
library(LMERConvenienceFunctions)
library(coda)
library(pbkrtest)

# import data

# create separate data frames for each DV & remove NAs
region.data <- na.omit(data.frame(time=my.data$time, subject=my.data$subject, dv=my.data$dv1))

# output summary of data
data.summary <- summary(region.data)

# fit model; "time" is a factor with three levels ("t1", "t2", "t3")
region.lmer <- lmer(dv ~ time + (1|subject), data=region.data)

# check model assumptions
mcp.fnc(region.lmer)

# remove outliers (over 2.5 standard deviations)
rm.outliers <- romr.fnc(region.lmer, region.data, trim=2.5)
region.data <- rm.outliers$data
region.lmer <- update(region.lmer)

# re-check model assumptions
mcp.fnc(region.lmer)

# compare model to null model
region.lmer.null <- lmer(dv ~ 1 + (1|subject), data=region.data)
region.krtest <- KRmodcomp(region.lmer, region.lmer.null)

# output lmer summary
region.lmer.summary <- summary(region.lmer)

# run post hoc tests
t1.pvals <- pvals.fnc(region.lmer, ndigits=10, withMCMC=TRUE)

region.lmer <- lmer(dv ~ relevel(time,ref="t2") + (1|subject), data=region.data)
t2.pvals <- pvals.fnc(region.lmer, ndigits=10, withMCMC=TRUE)

region.lmer <- lmer(dv ~ relevel(time,ref="t3") + (1|subject), data=region.data)
t3.pvals <- pvals.fnc(region.lmer, ndigits=10, withMCMC=TRUE)

# Get mcmc mean and 50/95% HPD confidence intervals for graphs
# repeated three times and stored in a matrix (not shown here for brevity)
as.numeric(t1.pvals$fixed$MCMCmean)
as.numeric(t1.pvals$fixed$HPD95lower)
as.numeric(t1.pvals$fixed$HPD95upper)
HPDinterval(as.mcmc(t1.pvals$mcmc), prob=0.5)
HPDinterval(as.mcmc(t1.pvals$mcmc), prob=0.95)


Some specific questions I have:

1. Is this a valid way of analysing mixed effects models? If not, what
should I be doing instead?
2. Are the criticism plots output by mcp.fnc good enough for verifying
model assumptions, or should I be taking additional steps?
3. I understand that for mixed models to be valid, the data need to
respect the assumptions of normality and homoscedasticity. How do I
judge what is “approximately normal” and what is not by looking at
the criticism plots generated by mcp.fnc? Do I just need to get a
feel for this, or is there a prescribed way of doing things? How
robust are mixed models with respect to these assumptions?
4. I need to assess differences between the three time points for ~20
characteristics (biomarkers) of the subjects in my sample. Is
fitting and testing separate models for each acceptable so long as I
report all undertaken tests (significant or not), or do I need some
form of correction for multiple comparisons?

To be a little more precise about the experiment, here are some more details. We followed a number of participants longitudinally as they underwent a treatment. We measured a number of biomarkers before the start of the treatment and at two time points after. What I’d like to see is whether there are differences in these biomarkers between the three time points.

I am basing most of what I am doing here on this tutorial, but I made some changes based on my needs and things I have read. The changes I made are:

1. relevel the “time” factor to obtain t1-t2, t2-t3, and t1-t3 comparisons with pvals.fnc (from the languageR package)
2. compare my mixed model to the null model using an approximate F-test based on the Kenward-Roger approach (using the pbkrtest package) rather than a likelihood ratio test (because I read that the Kenward-Roger approach is currently better regarded)
3. Use the LMERConvenienceFunctions package to check assumptions and remove outliers (because I read that mixed models are very sensitive to outliers)

# Some background reading and useful packages

You should probably take a look at some of the tutorial introductions to using mixed models; I would recommend starting with Baayen et al. (2008) — Baayen is the author of languageR. Barr et al. (2013) discuss some issues with random-effects structure, and Ben Bolker is one of the current developers of lme4. Baayen et al. is especially relevant to your questions because they spend a lot of time discussing the use of bootstrapping / permutation tests (the stuff behind mcp.fnc).

Literature:

- Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. *Journal of Memory and Language*, 59(4), 390–412.
- Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. *Journal of Memory and Language*, 68(3), 255–278.

Florian Jaeger also has a bunch of stuff on the practical side of mixed models on his lab’s blog.

There are also a number of useful R packages for visualizing and testing mixed models, like lmerTest and effects. The effects package is especially nice because it lets you plot the linear regression and confidence intervals going on behind the scenes.
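To give a flavour of the effects package, here is a minimal sketch using the sleepstudy data that ships with lme4 as a stand-in for your data:

```r
# Sketch: the effects package plots model-based fitted values with
# confidence bands; sleepstudy is used here as a stand-in data set.
library(lme4)
library(effects)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
eff <- allEffects(m)
plot(eff)  # fitted effect of Days with a pointwise confidence band
```

In your case the plotted term would be time, so you would see the model-estimated mean at each of t1, t2 and t3 with its interval.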

# Goodness of fit and comparing models

$p$-values are a really lousy way to compare mixed models (and are generally not a great method for anything). There is a reason why Doug Bates (the original author of lme4) does not feel it is necessary to include them. If you really want to go that way (and I beg you not to), the aforementioned lmerTest provides some additional facilities for calculating and dealing with them.
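If you do insist on $p$-values, the usage is about as simple as it gets; a sketch on the built-in sleepstudy data (not your data):

```r
# Sketch only: loading lmerTest masks lme4::lmer with a version whose
# summary() adds denominator df and p-values for the fixed effects.
library(lmerTest)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(m)  # coefficient table now includes df and Pr(>|t|) columns
```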

Better ways of comparing models are the log-likelihood and the various information-theoretic criteria such as AIC and BIC. For AIC and BIC, the general rule is “smaller is better”, but you have to be careful because both penalize additional model degrees of freedom and there is no “absolute” good or bad value. To get a nice summary of the AIC, BIC and log-likelihood of several models, you can use the anova() function, which has been overloaded to accept mer objects. Note that the $\chi^2$ values given there are only valid for comparisons between nested models; nonetheless, the tabular output is convenient even for other comparisons. For nested models, you have a proper $\chi^2$ test and don’t need $p$-values to directly compare two models. The downside is that it’s not immediately clear just how good your fit is.
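A sketch of that comparison, again on sleepstudy rather than your data:

```r
# Compare nested models with anova(); fitting with REML = FALSE
# (maximum likelihood) so the likelihood-ratio test is meaningful.
library(lme4)

full <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
null <- lmer(Reaction ~ 1    + (1 | Subject), data = sleepstudy, REML = FALSE)

anova(null, full)  # AIC, BIC, logLik, and a chi-squared test for nested fits
```

This is the mixed-model analogue of the null-model comparison in your code, just with a likelihood ratio test instead of KRmodcomp.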

For looking at the individual effects (i.e. the stuff you would see in a traditional ANOVA), you should look at the $t$-values for the fixed effects in the models (which are part of the summary() output, if I’m not mistaken). Generally, anything $|t|>2$ is considered good (more details in Baayen et al.). You can also access the fixed effects directly with the helper function fixef().
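Concretely (sleepstudy as a stand-in once more):

```r
# Inspect fixed effects and their t-values for a fitted lmer model.
library(lme4)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
coef(summary(m))  # Estimate, Std. Error and t value for each fixed effect
fixef(m)          # just the point estimates
```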

You should also make sure that none of your fixed effects are too strongly correlated — multicollinearity violates model assumptions. Florian Jaeger has written a bit on this, as well as a few possible solutions.
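One quick check is the correlation matrix of the fixed-effect estimates, which summary() also prints as “Correlation of Fixed Effects”; a sketch:

```r
# Correlation of the fixed-effect estimates; off-diagonal values near
# +1 or -1 are a red flag for collinear predictors.
library(lme4)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
cov2cor(as.matrix(vcov(m)))
```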

# Multiple comparisons

I’ll address your question #4 a bit more directly. The answer is basically the same as good practice with traditional ANOVA; unfortunately this seems to be a spot where there is a great deal of uncertainty for most researchers. (It’s the same as traditional ANOVA because both linear mixed-effects models and ANOVA are based on the general linear model; mixed models simply add a term for the random effects.) If you’re assuming that the time windows make a difference and want to compare the effects of time, you should include time in your model. This, incidentally, will also provide a convenient way to judge whether time made a difference: is there a main (fixed) effect for time? The downside to going this route is that your model will become a lot more complex, and the single “super” model with time as a parameter will probably take longer to compute than three smaller models without it. Indeed, the classic tutorial example for mixed models is sleepstudy, which uses time as a parameter.

I couldn’t quite tell whether the different characteristics are meant to be predictors or dependent variables. If they’re meant to be predictors, you could throw them all into one model and see which has the largest $t$-value, but this model would be incredibly complex even without interactions, and it would take a long time to compute. The faster, and potentially easier, way is to compute a different model for each predictor. I would recommend using a foreach loop to parallelize this across as many cores as you have — lme4 doesn’t yet do its matrix operations in parallel, so you can compute a different model on each core. (See the short introduction here.) Then you can compare the AIC and BIC of each model to find which predictor is best. (They’re not nested in this case, so you can’t use the $\chi^2$ test.)
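A sketch of that per-variable loop with foreach/doParallel; the data here are simulated stand-ins for your biomarker columns, so the names are hypothetical:

```r
# Fit one model per biomarker in parallel. "sim", "dv1" and "dv2" are
# simulated/hypothetical stand-ins for the real data in the question.
library(lme4)
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)

set.seed(1)
sim <- data.frame(
  subject = factor(rep(1:20, each = 3)),
  time    = factor(rep(c("t1", "t2", "t3"), times = 20)),
  dv1     = rnorm(60),
  dv2     = rnorm(60)
)

dvs  <- c("dv1", "dv2")  # stand-ins for your ~20 biomarkers
fits <- foreach(dv = dvs, .packages = "lme4") %dopar% {
  lmer(reformulate("time + (1 | subject)", response = dv), data = sim)
}

sapply(fits, AIC)  # one AIC per fitted model
```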

If your characteristics are the dependent variable, then you will have to compute different models anyway, and then you can use the AIC and BIC to compare the results.