Consider an experiment with multiple human participants, each measured multiple times in two conditions. A mixed effects model can be formulated (using lme4 syntax) as:
fit = lmer( formula = measure ~ (1|participant) + condition )
Now, say I want to generate bootstrapped confidence intervals for the predictions of this model. I think I’ve come up with a simple and computationally efficient method, and I’m sure I’m not the first to think of it, but I’m having trouble finding any prior publications describing this approach. Here it is:
1. Fit the model (as above), call this the “original model”
2. Obtain predictions from the original model, call these the “original predictions”
3. Obtain residuals from the original model associated with each response from each participant
4. Resample the residuals by sampling participants with replacement
5. Fit a linear mixed effects model with gaussian error to the resampled residuals, call this the “interim model”
6. Compute predictions from the interim model for each condition (these predictions will be very close to zero), call these the “interim predictions”
7. Add the interim predictions to the original predictions, call the result the “resample predictions”
8. Repeat steps 4 through 7 many times, generating a distribution of resample predictions for each condition from which one can compute CIs.
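A minimal sketch of the steps above in R with lme4, under several assumptions that are not in the description: the data are simulated, the names `d`, `new_d`, `resample_once` and `n_boot` are hypothetical, the residuals are taken relative to the fixed-effects (population-level) predictions, each resampled participant is relabelled as a distinct unit, and the CIs are simple percentile intervals.

```r
library(lme4)

set.seed(1)

# Hypothetical simulated data: 20 participants, 2 conditions, 10 trials per cell
d <- expand.grid(participant = factor(1:20),
                 condition   = factor(c("a", "b")),
                 trial       = 1:10)
p_eff <- rnorm(20, sd = 1)  # participant-level random intercepts
d$measure <- 5 + 0.5 * (d$condition == "b") +
  p_eff[as.integer(d$participant)] + rnorm(nrow(d), sd = 1)

# Steps 1-2: original model and original predictions for each condition
fit <- lmer(measure ~ condition + (1 | participant), data = d)
new_d <- data.frame(condition = factor(c("a", "b"),
                                       levels = levels(d$condition)))
orig_pred <- predict(fit, newdata = new_d, re.form = NA)

# Step 3: residuals. ASSUMPTION: residuals relative to the fixed-effects
# predictions, so between-participant variation stays in the residuals.
# residuals(fit) would instead return conditional residuals.
d$res <- d$measure - predict(fit, re.form = NA)

# Steps 4-6: resample participants, fit the interim model, predict
resample_once <- function(d, new_d) {
  ids <- sample(levels(d$participant), replace = TRUE)
  boot <- do.call(rbind, lapply(seq_along(ids), function(i) {
    rows <- d[d$participant == ids[i], ]
    rows$boot_id <- i  # ASSUMPTION: a participant drawn twice counts as two units
    rows
  }))
  boot$boot_id <- factor(boot$boot_id)
  interim <- lmer(res ~ condition + (1 | boot_id), data = boot)
  predict(interim, newdata = new_d, re.form = NA)
}

# Step 8: repeat many times (kept small here so the sketch runs quickly)
n_boot <- 200
interim_pred <- replicate(n_boot, resample_once(d, new_d))  # 2 x n_boot matrix

# Step 7: add the original predictions to every column of interim predictions
resample_pred <- interim_pred + orig_pred

# Percentile CIs per condition (rows: 2.5% and 97.5%; columns: conditions)
ci <- apply(resample_pred, 1, quantile, probs = c(0.025, 0.975))
colnames(ci) <- levels(d$condition)
ci
```

Which residuals to use in step 3 matters: with conditional residuals the between-participant variation has already been removed, so resampling participants would probably give intervals that are too narrow. lme4 may also print singular-fit messages for some interim fits; the sketch does not handle those.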
I’ve seen “residual bootstrapping” procedures in the context of simple regression (i.e., not a mixed model), where the residuals themselves are the unit of resampling and are added to the predictions of the original model before a new model is fit on each iteration of the bootstrap. That seems rather different from the approach I describe, where individual residuals are never resampled, participants are, and the original model predictions come into play only after the interim model has been fit.
This last feature has a really nice side benefit: no matter how complex the original model, the interim model can always be fit as a gaussian linear mixed model, which can be substantially faster in some cases. For example, I recently had binomial data and 3 predictor variables, one of which I suspected would have strongly non-linear effects, so I had to use Generalized Additive Mixed Modelling with a binomial family. Fitting the original model in this case took over an hour, whereas fitting the gaussian LMM on each iteration took mere seconds.
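For contrast, the classic residual bootstrap for simple regression mentioned above can be sketched in base R as follows. The data are simulated and every name (`fit0`, `grid`, `boot_pred`, `ci_lm`) is hypothetical. This is only an illustration of resampling individual residuals and refitting, not the participant-level procedure from the list.

```r
set.seed(2)

# Hypothetical simulated data for a simple (non-mixed) linear regression
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

fit0    <- lm(y ~ x)
fitted0 <- fitted(fit0)
res0    <- residuals(fit0)
grid    <- data.frame(x = c(0.25, 0.75))  # points at which to predict

boot_pred <- replicate(1000, {
  # Resample individual residuals and add them to the original fitted values
  y_star <- fitted0 + sample(res0, replace = TRUE)
  # Refit the same model to the synthetic responses, then predict
  predict(lm(y_star ~ x), newdata = grid)
})

# Percentile intervals for the predicted mean at each grid point
ci_lm <- apply(boot_pred, 1, quantile, probs = c(0.025, 0.975))
ci_lm
```

Here the full model is refit on every iteration, which is the step the interim-model approach avoids.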
I really don’t want to claim priority on this if it’s already a known procedure, so I’d be very grateful if anyone can provide information on where this might have been described before. (Also, if there are any glaring problems with this approach, do let me know!)
My book Bootstrap Methods, 2nd Edition, has a massive bibliography running up to 2007, so even if I don’t cover the subject in the book, a reference might be in the bibliography. Of course, a Google search with the right keywords might work better. Freedman, Peters and Navidi did bootstrapping for prediction in linear regression and econometric models, but I am not sure what has been done for the mixed model case. Stine’s 1985 JASA paper, “Bootstrap prediction intervals for regression”, is something you will find very interesting if you haven’t already seen it.