I have read many times that random effects (BLUPs/conditional modes for, say, subjects) are not parameters of a linear mixed effects model but instead can be derived from the estimated variance/covariance parameters. E.g. Reinhold Kliegl et al. (2011) state:
Random effects are subjects’ deviations from the grand mean RT and
subjects’ deviations from the fixed-effect parameters. They are
assumed to be independently and normally distributed with a mean of 0.
It is important to recognize that these random effects are not
parameters of the LMM – only their variances and covariances are.
[…] LMM parameters in combination with subjects’ data can be used to
generate “predictions” (conditional modes) of random effects for each
Can someone give an intuitive explanation how the (co)variance parameters of the random effects can be estimated without actually using/estimating the random effects?
Consider a simple linear mixed model, e.g. a random intercept model where we estimate the dependency of y on x in different subjects, and assume that each subject has their own random intercept:y=a+bx+ci+ϵ. Here intercepts ci are modeled as coming from a Gaussian distribution ci∼N(0,τ2) and random noise is also Gaussian ϵ∼N(0,σ2). In the
lme4 syntax this model would be written as
y ~ x + (1|subject).
It is instructive to rewrite the above as follows:
This is a more formal way to specify the same probabilistic model. From this formulation we can directly see that the random effects ci are not “parameters”: they are unobserved random variables. So how can we estimate the variance parameters without knowing the values of c?
Note that the first equation above describes the conditional distribution of y given c. If we know the distribution of c and of y∣c, then we can work out the unconditional distribution of y by integrating over c. You might know it as the Law of total probability. If both distributions are Gaussian, then the resulting unconditional distribution is also Gaussian.
In this case the unconditional distribution is simply N(a+bx,σ2+τ2), but our observations are not i.i.d. samples from it because there are multiple measurements per subject. In order to proceed, we need to consider the distribution of the whole n-dimensional vector y of all observations: \mathbf y \sim \mathcal N(a+b\mathbf x, \boldsymbol\Sigma) where \boldsymbol\Sigma=\sigma^2 \mathbf I_n + \tau^2 \mathbf I_N \otimes \mathbf 1_M is a block-diagonal matrix composed of \sigma^2 and \tau^2. You asked for intuition so I want to avoid the math. The important point is that this equation does not have c anymore! This is what one actually fits to the observed data, and that is why one says that c_i are not the parameters of the model.
When the parameters a, b, \tau^2, and \sigma^2 are fit, one can work out the conditional distribution of c_i for each i. What you see in the mixed model output are the modes of these distributions, aka the conditional modes.