Not so uncommon occurrence when dealing with complex maximal mixed models (estimating all possible random effects for given data and model) is perfect (+1 or -1) or nearly perfect correlation among some random effects. For the purpose of the discussion, let’s observe the following model and model summary
Model: Y ~ X*Cond + (X*Cond|subj) # Y = logit variable # X = continuous variable # Condition = values A and B, dummy coded; the design is repeated # so all participants go through both Conditions # subject = random effects for different subjects Random effects: Groups Name Variance Std.Dev. Corr subject (Intercept) 0.85052 0.9222 X 0.08427 0.2903 -1.00 CondB 0.54367 0.7373 -0.37 0.37 X:CondB 0.14812 0.3849 0.26 -0.26 -0.56 Number of obs: 39401, groups: subject, 219 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.49686 0.06909 36.14 < 2e-16 *** X -1.03854 0.03812 -27.24 < 2e-16 *** CondB -0.19707 0.06382 -3.09 0.00202 ** X:CondB 0.22809 0.05356 4.26 2.06e-05 ***
The supposed reason behind these perfect correlations is that we have created a model which is too complex for the data that we have. The common advice that is given in these situations is (e.g., Matuschek et al., 2017; paper) to fix the overparameterized coefficients to 0, because such degenerate models tend to lower the power. If we observe a marked change in fixed effects in a reduced model, we should accept that one; if there is no change, then there is no problem in accepting the original one.
However, let’s presume that we are not interested only in fixed effects controlled for RE (random effects), but also in the RE structure. In the given case, it would be theoretically sound to assume that
Interceptand the slope
Xhave non-zero negative correlation. Several questions follow:
What to do in such situations? Should we report the perfect correlation and say that our data is not “good enough” to estimate the “real” correlation? Or should we report the 0 correlation model? Or should we maybe try to set some other correlation to 0 in hope that the “important” one will not be perfect anymore? I don’t think there are any 100% correct answers here, I would mostly like to hear your opinions.
How to write the code which fixes the correlation of 2 specific random effects to 0, without influencing the correlations between other parameters?
Singular random-effect covariance matrices
Obtaining a random effect correlation estimate of +1 or -1 means that the optimization algorithm hit “a boundary”: correlations cannot be higher than +1 or lower than -1. Even if there are no explicit convergence errors or warnings, this potentially indicates some problems with convergence because we do not expect true correlations to lie on the boundary. As you said, this usually means that there are not enough data to estimate all the parameters reliably. Matuschek et al. 2017 say that in this situation the power can be compromised.
Another way to hit a boundary is to get a variance estimate of 0: Why do I get zero variance of a random effect in my mixed model, despite some variation in the data?
Both situations can be seen as obtaining a degenerate covariance matrix of random effects (in your example output covariance matrix is $4\times 4$); a zero variance or a perfect correlation means that the covariance matrix is not full rank and [at least] one of its eigenvalues is zero. This observation immediately suggests that there are other, more complex ways to get a degenerate covariance matrix: one can have a $4\times 4$ covariance matrix without any zeros or perfect correlations but nevertheless rank-deficient (singular). Bates et al. 2015 Parsimonious Mixed Models (unpublished preprint) recommend using principal component analysis (PCA) to check if the obtained covariance matrix is singular. If it is, they suggest to treat this situation the same way as the above singular situations.
So what to do?
If there is not enough data to estimate all the parameters of a model reliably, then we should consider simplifying the model. Taking your example model,
X*Cond + (X*Cond|subj), there are various possible ways to simplify it:
Remove one of the random effects, usually the highest-order correlation:
X*Cond + (X+Cond|subj)
Get rid of all the correlation parameters:
X*Cond + (X*Cond||subj)
Update: as @Henrik notes, the
||syntax will only remove correlations if all variables to the left of it are numerical. If categorical variables (such as
Cond) are involved, one should rather use his convenient
afexpackage (or cumbersome manual workarounds). See his answer for more details.
Get rid of some of the correlations parameters by breaking the term into several, e.g.:
X*Cond + (X+Cond|subj) + (0+X:Cond|subj)
- Constrain the covariance matrix in some specific way, e.g. by setting one specific correlation (the one that hit the boundary) to zero, as you suggest. There is no built-in way in
lme4to achieve this. See @BenBolker’s answer on SO for a demonstration of how to achieve this via some smart hacking.
Contrary to what you said, I don’t think Matuschek et al. 2017 specifically recommend #4. The gist of Matuschek et al. 2017 and Bates et al. 2015 seems to be that one starts with the maximal model a la Barr et al. 2013 and then decreases the complexity until the covariance matrix is full rank. (Moreover, they would often recommend to reduce the complexity even further, in order to increase the power.) Update: In contrast, Barr et al. recommend to reduce complexity ONLY if the model did not converge; they are willing to tolerate singular covariance matrices. See @Henrik’s answer.
If one agrees with Bates/Matuschek, then I think it is fine to try out different ways of decreasing the complexity in order to find the one that does the job while doing “the least damage”. Looking at my list above, the original covariance matrix has 10 parameters; #1 has 6 parameters, #2 has 4 parameters, #3 has 7 parameters. Which model will get rid of the perfect correlations is impossible to say without fitting them.
But what if you are interested in this parameter?
The above discussion treats random effect covariance matrix as a nuisance parameter. You raise an interesting question of what to do if you are specifically interested in a correlation parameter that you have to “give up” in order to get a meaningful full-rank solution.
Note that fixing correlation parameter at zero will not necessarily yield BLUPs (
ranef) that are uncorrelated; in fact, they might not even be affected that much at all (see @Placidia’s answer for a demonstration). So one option would be to look at the correlations of BLUPs and report that.
Another, perhaps less attractive, option would be to use treat
subject as a fixed effect
Y~X*cond*subj, get the estimates for each subject and compute correlation between them. This is equivalent to running separate
Y~X*cond regressions for each subject separately and get the correlation estimates from them.
See also the section on singular models in Ben Bolker’s mixed model FAQ:
It is very common for overfitted mixed models to result in singular fits. Technically, singularity means that some of the $\theta$ (variance-covariance Cholesky decomposition) parameters corresponding to diagonal elements of the Cholesky factor are exactly zero, which is the edge of the feasible space, or equivalently that the variance-covariance matrix has some zero eigenvalues (i.e. is positive semidefinite rather than positive definite), or (almost equivalently) that some of the variances are estimated as zero or some of the correlations are estimated as +/-1.