I am about to dive into learning R, and my learning project will entail applying mixed- or random-effects regression to a dataset in order to develop a predictive equation. I share the concern of the writer of the post "How to choose nlme or lme4 R library for mixed effects models?" in wondering whether nlme or lme4 is the better package to familiarize myself with. A more basic question is: what’s the difference between linear and nonlinear mixed-effects modeling?
For background, I applied M-E modeling in my MS research (in MATLAB, not R), so I’m familiar with how fixed vs. random variables are treated. But I’m uncertain whether the work I did was considered linear or nonlinear M-E. Is it simply the functional form of the equation used, or something else?
There are several distinctions between linear and nonlinear regression models, but the primary mathematical one is that linear models are linear in the parameters, whereas nonlinear models are nonlinear in the parameters. Pinheiro and Bates (2000, pp. 284-285), authors of the nlme R package, elegantly described the more substantive considerations in model selection:
When choosing a regression model to describe how a response variable varies with covariates, one always has the option of using models, such as polynomial models, that are linear in the parameters. By increasing the order of a polynomial model, one can get increasingly accurate approximations to the true, usually nonlinear, regression function, within the observed range of the data. These empirical models are based only on the observed relationship between the response and the covariates and do not include any theoretical considerations about the underlying mechanism producing the data.
Nonlinear models, on the other hand, are often mechanistic, i.e., based on a model for the mechanism producing the response. As a consequence, the model parameters in a nonlinear model generally have a natural physical interpretation. Even when derived empirically, nonlinear models usually incorporate known, theoretical characteristics of the data, such as asymptotes and monotonicity, and in these cases, can be considered as semi-mechanistic models. A nonlinear model generally uses fewer parameters than a competitor linear model, such as a polynomial, giving a more parsimonious description of the data. Nonlinear models also provide more reliable predictions for the response variable outside the observed range of the data than, say, polynomial models would.
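To make "linear in the parameters" concrete, here is a small illustration. The two models and their symbols are my own illustrative choices, not taken from Pinheiro and Bates. A quadratic polynomial is curved in x but still linear in its coefficients, whereas an asymptotic growth curve is nonlinear in its rate parameter:

```latex
% Linear in the parameters: the mean is a weighted sum of known functions of x.
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,
\qquad
\frac{\partial\, \mathrm{E}[y_i]}{\partial \beta_2} = x_i^2
\quad \text{(free of the parameters)}

% Nonlinear in the parameters: the rate \phi_2 sits inside the exponential.
y_i = \phi_1 \left(1 - e^{-\phi_2 x_i}\right) + \varepsilon_i,
\qquad
\frac{\partial\, \mathrm{E}[y_i]}{\partial \phi_2} = \phi_1 x_i e^{-\phi_2 x_i}
\quad \text{(depends on } \phi_1, \phi_2 \text{)}
```

So the functional form with respect to the covariate is not what decides the question; what matters is whether the derivative of the mean with respect to each parameter is free of the parameters. In the second model, φ1 also has the kind of physical interpretation the quote mentions: it is the asymptote.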
There are also some big differences between the nlme and lme4 packages that go beyond the linearity issue. For example, using nlme you can fit linear or nonlinear models and, for either type, specify the variance and correlation structures for within-group errors (e.g., autoregressive); lme4 can’t do that. In addition, random effects can be nested or crossed in either package, but it’s much easier (and more computationally efficient) to specify and model crossed random effects in lme4.
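As a rough sketch of the two features just described, the snippet below fits a linear mixed model with AR(1) within-group errors in nlme and a model with crossed random effects in lme4. The data sets (Ovary from nlme, Penicillin from lme4) and the object names are my own choices for illustration, and I am assuming lme4 is installed from CRAN.

```r
library(nlme)   # provides lme(), corAR1(), and the Ovary data
library(lme4)   # provides lmer() and the Penicillin data (assumed installed)

# nlme: linear mixed model for follicle counts, random effects grouped by mare.
fm_base <- lme(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
               data = Ovary,
               random = list(Mare = pdDiag(~ sin(2 * pi * Time))))

# Same model, but with first-order autoregressive within-group errors.
fm_ar1 <- update(fm_base, correlation = corAR1(form = ~ 1 | Mare))

# Likelihood-based comparison of the two error structures.
print(anova(fm_base, fm_ar1))

# lme4: crossed random effects are just two separate terms in the formula;
# every sample is assayed on every plate, so neither factor nests in the other.
fm_crossed <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample),
                   data = Penicillin)
print(VarCorr(fm_crossed))
```

A heteroscedastic within-group variance would be specified in the same way through the `weights` argument of `lme()`, for example with `varIdent()`.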
I would advise first considering a) whether you will need a nonlinear model, and b) whether you will need to specify either the within-group variance or correlation structures. If the answer to either question is yes, then you have to use nlme (given that you’re sticking with R). If you work a lot with linear models that have crossed random effects, or complicated combinations of nested and crossed random effects, then lme4 is probably a better choice. You may need to learn to use both packages. I learned lme4 first and then realized I had to use nlme because I almost always work with autoregressive error structures. However, I still prefer lme4 when I analyze data from experiments with crossed factors. The good news is that a great deal of what I learned about lme4 transferred well to nlme. Either way, Pinheiro and Bates (2000) is a great reference for mixed-effects models, and I’d say it’s indispensable if you’re using nlme.
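For case (a), a nonlinear mixed-effects fit in nlme might look like the sketch below. It follows the example in the `nlme()` help page, using the Loblolly pine data that ships with R; the object name `fm_nl` is my own, and the starting values are simply the ones used in that example.

```r
library(nlme)

# Asymptotic growth curve for tree height, with a random asymptote per seed
# source. SSasymp() is a self-starting model from the stats package with
# parameters Asym (asymptote), R0 (response at age 0), and lrc (log rate).
fm_nl <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
              data = Loblolly,
              fixed = Asym + R0 + lrc ~ 1,   # one population value per parameter
              random = Asym ~ 1,             # only the asymptote varies by group
              start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

# Population-level estimates of the three curve parameters.
print(fixef(fm_nl))
```

Unlike a polynomial fit, each estimate here maps onto a feature of the curve, which is the interpretability advantage described in the quoted passage.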
Pinheiro, J.C., & Bates, D.M. (2000). Mixed-effects models in S and S-PLUS. New York: Springer-Verlag.