Suppose I have N participants, each of whom gives a response Y 20 times, 10 in one condition and 10 in another. I fit a linear mixed effects model comparing Y in each condition. Here’s a reproducible example simulating this situation using the
library(lme4) fml <- "~ condition + (condition | participant_id)" d <- expand.grid(participant_id=1:40, trial_num=1:10) d <- rbind(cbind(d, condition="control"), cbind(d, condition="experimental")) set.seed(23432) d <- cbind(d, simulate(formula(fml), newparams=list(beta=c(0, .5), theta=c(.5, 0, 0), sigma=1), family=gaussian, newdata=d)) m <- lmer(paste("sim_1 ", fml), data=d) summary(m)
myields two fixed effects (an intercept and slope for condition), and three random effects (a by-participant random intercept, a by-participant random slope for condition, and an intercept-slope correlation).
I would like to statistically compare the size of the by-participant random intercept variance across the groups defined by
condition(i.e., compute the variance component highlighted in red separately within the control and experimental conditions, then test whether the difference in the size of the components is non-zero). How would I do this (preferably in R)?
Let’s say the model is slightly more complicated: The participants each experience 10 stimuli 20 times each, 10 in one condition and 10 in another. Thus, there are two sets of crossed random effects: random effects for participant and random effects for stimulus. Here’s a reproducible example:
library(lme4) fml <- "~ condition + (condition | participant_id) + (condition | stimulus_id)" d <- expand.grid(participant_id=1:40, stimulus_id=1:10, trial_num=1:10) d <- rbind(cbind(d, condition="control"), cbind(d, condition="experimental")) set.seed(23432) d <- cbind(d, simulate(formula(fml), newparams=list(beta=c(0, .5), theta=c(.5, 0, 0, .5, 0, 0), sigma=1), family=gaussian, newdata=d)) m <- lmer(paste("sim_1 ", fml), data=d) summary(m)
I would like to statistically compare the magnitude of the random by-participant intercept variance across the groups defined by
condition. How would I do that, and is the process any different from the one in the situation described above?
To be a bit more specific about what I’m looking for, I want to know:
- Is the question, “are the conditional mean responses within each condition (i.e., random intercept values in each condition) substantially different from each other, beyond what we would expect due to sampling error” a well-defined question (i.e., is this question even theoretically answerable)? If not, why not?
- If the answer to question (1) is yes, how would I answer it? I would prefer an
Rimplementation, but I’m not tied to the
lme4package — for example, it seems as though the
OpenMxpackage has the capability to accommodate multi-group and multi-level analyses (https://openmx.ssri.psu.edu/openmx-features), and this seems like the sort of question that ought to be answerable in an SEM framework.
There’s more than one way to test this hypothesis. For example, the procedure outlined by @amoeba should work. But it seems to me that the simplest, most expedient way to test it is using a good old likelihood ratio test comparing two nested models. The only potentially tricky part of this approach is in knowing how to set up the pair of models so that dropping out a single parameter will cleanly test the desired hypothesis of unequal variances. Below I explain how to do that.
Switch to contrast (sum to zero) coding for your independent variable and then do a likelihood ratio test comparing your full model to a model that forces the correlation between random slopes and random intercepts to be 0:
# switch to numeric (not factor) contrast codes d$contrast <- 2*(d$condition == 'experimental') - 1 # reduced model without correlation parameter mod1 <- lmer(sim_1 ~ contrast + (contrast || participant_id), data=d) # full model with correlation parameter mod2 <- lmer(sim_1 ~ contrast + (contrast | participant_id), data=d) # likelihood ratio test anova(mod1, mod2)
Visual explanation / intuition
In order for this answer to make sense, you need to have an intuitive understanding of what different values of the correlation parameter imply for the observed data. Consider the (randomly varying) subject-specific regression lines. Basically, the correlation parameter controls whether the participant regression lines “fan out to the right” (positive correlation) or “fan out to the left” (negative correlation) relative to the point X=0, where X is your contrast-coded independent variable. Either of these imply unequal variance in participants’ conditional mean responses. This is illustrated below:
In this plot, we ignore the multiple observations that we have for each subject in each condition and instead just plot each subject’s two random means, with a line connecting them, representing that subject’s random slope. (This is made up data from 10 hypothetical subjects, not the data posted in the OP.)
In the column on the left, where there’s a strong negative slope-intercept correlation, the regression lines fan out to the left relative to the point X=0. As you can see clearly in the figure, this leads to a greater variance in the subjects’ random means in condition X=−1 than in condition X=1.
The column on the right shows the reverse, mirror image of this pattern. In this case there is greater variance in the subjects’ random means in condition X=1 than in condition X=−1.
The column in the middle shows what happens when the random slopes and random intercepts are uncorrelated. This means that the regression lines fan out to the left exactly as much as they fan out to the right, relative to the point X=0. This implies that the variances of the subjects’ means in the two conditions are equal.
It’s crucial here that we’ve used a sum-to-zero contrast coding scheme, not dummy codes (that is, not setting the groups at X=0 vs. X=1). It is only under the contrast coding scheme that we have this relationship wherein the variances are equal if and only if the slope-intercept correlation is 0. The figure below tries to build that intuition:
What this figure shows is the same exact dataset in both columns, but with the independent variable coded two different ways. In the column on the left we use contrast codes — this is exactly the situation from the first figure. In the column on the right we use dummy codes. This alters the meaning of the intercepts — now the intercepts represent the subjects’ predicted responses in the control group. The bottom panel shows the consequence of this change, namely, that the slope-intercept correlation is no longer anywhere close to 0, even though the data are the same in a deep sense and the conditional variances are equal in both cases. If this still doesn’t seem to make much sense, studying this previous answer of mine where I talk more about this phenomenon may help.
Let yijk be the jth response of the ith subject under condition k. (We have only two conditions here, so k is just either 1 or 2.) Then the mixed model can be written
where αi are the subjects’ random intercepts and have variance σ2α, βi are the subjects’ random slope and have variance σ2β, eijk is the observation-level error term, and cov(αi,βi)=σαβ.
We wish to show that
Beginning with the left hand side of this implication, we have
Sum-to-zero contrast codes imply that x1+x2=0 and x21=x22=x2. Then we can further reduce the last line of the above to
which is what we wanted to prove. (To establish the other direction of the implication, we can just follow these same steps in reverse.)
To reiterate, this shows that if the independent variable is contrast (sum to zero) coded, then the variances of the subjects’ random means in each condition are equal if and only if the correlation between random slopes and random intercepts is 0. The key take-away point from all this is that testing the null hypothesis that σαβ=0 will test the null hypothesis of equal variances described by the OP.
This does NOT work if the independent variable is, say, dummy coded. Specifically, if we plug the values x1=0 and x2=1 into the equations above, we find that