The CV QuestionI’m trying to give (a) detailed and concise mathematical representation(s) of a mixed effects model. I am using the

`lme4`

package in R. What is the correct mathematical representation for my model?

The Data, Science Question, and R CodeMy data set consists of species in different regions. I’m testing if a species’ prevalence changes in the time leading up to an extinction (extinctions aren’t necessarily permanent; it can recolonize), or following a colonization.

`lmer(prevalence ~ time + time:type + (1 + time + type:time | reg) + (1 + time + type:time | reg:spp))`

Prevalenceis the proportion of strata occupied by a species in a region-yearTimeis a continuous variable that indicates the time to either extinction or colonization; it is always positiveTypeis a categorical variable with two levels. These two levels are “-” and “+”. When type is -, it’s a colonization (default level). When type is +, it’s an extinction.Regis a categorical variable with nine levels, indicating the regionSppis a categorical variable; the number of levels varies among regions, and varies between 48 levels and 144 levels.In words: response variable is prevalence (proportion of strata occupied). Fixed effects included 1) and intercept, 2) time from event, and 3) the interaction between time to event and the type of event (colonization or extinction). Each of these 3 fixed effects varied randomly among regions. Within a region, each of the effects varied randomly among species.

I’m trying to figure out how to write the mathematical equation for the model.I think I understand what’s going on in the R code (although, I’m sure I have some knowledge gaps, and hopefully writing out the formal mathematical expression will improve my understanding).I have searched through the web and through these forums quite a bit. I found tons of useful information, to be sure (and maybe I’ll link to some of these in an edit to this question). However, I couldn’t quite find that “Rosetta Stone” of R-code translated to math (I’m more comfortable with code) that would really help me confirm I’ve got these equations right. In fact, I know there are some gaps already, but we’ll get to that.

My AttemptThe basic form of a mixed effects model, in matrix notation is (to my understanding): Y=Xβ+Zγ+ϵ

X=[1ΔtΔt+⋮⋮⋮1ΔtnΔt+,n]

β′=[β0β1β2]

Z=[1I(r1)ΔtI(r1)Δt+I(r1)…1I(r9)ΔtI(r9)Δt+I(r9)⋮⋮⋮⋱⋮⋮⋮1I(r1,n)ΔtnI(r1,n)Δt+,nI(r1,n)…1I(r9,n)ΔtI(r9,n)Δt+,nI(r9,n)]

γ′=[γ0,1γ1,1γ2,1…γ0,9γ1,9γ2,9]

ϵ∼N(0,Σ)

- X is the design matrix for the fixed effects, Δt is the time after colonization (
`time`

), and Δt+ is the time after extinction (`time:type`

)- Z is the design matrix for the random effects (level 1?), I() is the indicator function giving 1 if the sample belongs to the designated region and 0 otherwise, r is indexed to indicates one of the nine regions.
- β and γ contain parameters
- ϵ is errors; I’m not entirely sure how to explain Σ, though I realize one of these variance/covariance matrices will express covariances among slopes and intercepts, e.g.
Assuming things so far are ~correct, that means I’m good at the top level. However, explaining the species-specific variation on the parameters, which is nested within each region, stumped me even more.

But I took a crack at something that maybe makes sense …

Each of the parameters in γ is derived from a linear combination of species-specific predictors and parameters within a region. For each region , there are 3 rows of , corresponding to the 3 predictor variables. Each γ can be individually expressed as

- γp,r=Up,rbp,r+ηp,r

- where Up,r is a design matrix specific to region r and predictor p, bp,r is a 1 by S matrix of parameters for the region (richness in the region = S, e.g. 48 or 144), and ηp,r is a matrix of error terms
Specifically, for a given region, each of the γp,r would be:

γ0,r=U0,rb0,r+η0,r

γ0,r=[1I(s1)…1I(sS)]+[b0,1⋮b0,S]+η0,r

γ1,r=U1,rb1,r+η1,r

γ1,r=[ΔtI(s1)…ΔtI(sS)]+[b1,1⋮b1,S]+η1,r

γ2,r=U2,rb2,r+η2,r

γ2,r=[Δt+I(s1)…Δt+I(sS)]+[b2,1⋮b2,S]+η2,rThat would be repeated for each region. Then, η∼N(0,Ση), like ϵ. Although, perhaps instead of Σ, there is another letter, like G, that is commonly used.

Edit: other Q/A’s that were somewhat helpful

**Answer**

If I understood the code correctly, why not simply writing something like

yi=(α+ν(α)j[i]+η(α)k[i])+(β+ν(β)j[i]+η(β)k[i])Ti+(δ+ν(δ)j[i]+η(δ)k[i])(Ti∗Zi)+ϵi

with

\begin{aligned}

\Big[\nu_{j}^{(\alpha)}, \nu_j^{(\beta)}, \nu_j^{(\delta)}\Big] &\sim \text{Multi-Normal}(\mathbf 0, \boldsymbol \Sigma_\nu) \\

\Big[\eta_{j}^{(\alpha)}, \eta_j^{(\beta)}, \eta_j^{(\delta)}\Big] &\sim \text{Multi-Normal}(\mathbf 0, \boldsymbol \Sigma_\eta)\\

\epsilon_i & \sim \text{Normal}(0, \sigma_\epsilon)

\end{aligned}

or, if the first equation is too long, something like

y_{i} = \alpha_{j[i],k[i]} + \beta_{j[i],k[i]}T_{i} + \delta_{j[i],k[i]}(T_i * Z_i) + \epsilon_i

and

\begin{aligned}

\alpha_{j[i],k[i]} &= \alpha + \nu_{j}^{(\alpha)} + \eta_{k}^{(\alpha)} \\

\beta_{j[i],k[i]}&=\beta + \nu_{j}^{(\beta)} + \eta_{k}^{(\beta)}\\

\delta_{j[i],k[i]}&=\delta + \nu_{j}^{(\delta)} + \eta_{k}^{(\delta)}\\

\end{aligned}

with the same covariance structure as above? It shows the nested structure of the data as well as which coefficients vary across which levels.

**Attribution***Source : Link , Question Author : rbatt , Answer Author : baruuum*