I have been reading about calculating R2 values in mixed models and after reading the R-sig FAQ, other posts on this forum (I would link a few but I don’t have enough reputation) and several other references I understand that using R2 values in the context of mixed models is complicated.

However, I recently came across the two papers below. While their methods look promising (to me), I am not a statistician, so I was wondering whether anyone has insight into the methods they propose and how they compare to other methods that have been suggested.

Nakagawa, Shinichi, and Holger Schielzeth. “A general and simple method for obtaining R2 from generalized linear mixed‐effects models.” Methods in Ecology and Evolution 4.2 (2013): 133-142.

Johnson, Paul CD. “Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models.” Methods in Ecology and Evolution (2014).

This method is also implemented in the r.squaredGLMM function in the MuMIn package, whose documentation gives the following description:

For mixed-effects models, R2 can be categorized into two types. Marginal R2 represents the variance explained by fixed factors, and is defined as:

$$R^2_{GLMM(m)} = \frac{\sigma_f^2}{\sigma_f^2 + \sum_{l}\sigma_l^2 + \sigma_e^2 + \sigma_d^2}$$

Conditional R^2 is interpreted as variance explained by both fixed and random factors (i.e. the entire model), and is calculated according to the equation:

$$R^2_{GLMM(c)} = \frac{\sigma_f^2 + \sum_{l}\sigma_l^2}{\sigma_f^2 + \sum_{l}\sigma_l^2 + \sigma_e^2 + \sigma_d^2}$$

where $\sigma_f^2$ is the variance of the fixed-effect components, $\sum_{l}\sigma_l^2$ is the sum of all random-effect variance components (group, individual, etc.), $\sigma_e^2$ is the variance due to additive dispersion, and $\sigma_d^2$ is the distribution-specific variance.

In my analysis I am looking at longitudinal data and am primarily interested in the variance explained by the fixed effects in the model:

```r
library(MuMIn)
library(lme4)

fm1 <- lmer(zglobcog ~ age_c + gender_R2 + ibphdtdep + iyeareducc + apoegeno +
              age_c * apoegeno + (age_c | pathid),
            data = dat, REML = FALSE,
            control = lmerControl(optimizer = "Nelder_Mead"))

# Jarret Byrnes (squared correlation between the fitted and the observed values)
r2.corr.mer <- function(m) {
  lmfit <- lm(model.response(model.frame(m)) ~ fitted(m))
  summary(lmfit)$r.squared
}
r2.corr.mer(fm1)
# [1] 0.8857005

# Xu (2003)
1 - var(residuals(fm1)) / var(model.response(model.frame(fm1)))
# [1] 0.8783479

# Nakagawa & Schielzeth (2013)
r.squaredGLMM(fm1)
#       R2m       R2c
# 0.1778225 0.8099395
```
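To see how the variance components in the formulas above map onto a fitted model, here is a rough sketch of computing the two ratios by hand from a Gaussian `lmer` fit. This is illustrative only: for a Gaussian LMM, $\sigma_d^2 = 0$ and $\sigma_e^2$ is the residual variance, and summing the diagonal of each `VarCorr` block is a simplification that ignores the covariate weighting Johnson (2014) introduces for random-slope terms, so the result may differ from `MuMIn::r.squaredGLMM` when random slopes are present. The helper name `r2_lmm` is my own.

```r
library(lme4)

# Sketch: marginal and conditional R2 for a Gaussian LMM,
# following the decomposition in Nakagawa & Schielzeth (2013).
r2_lmm <- function(model) {
  # sigma_f^2: variance of the fixed-effect predictions
  var_fixed <- var(as.vector(model.matrix(model) %*% fixef(model)))
  # sum of sigma_l^2: random-effect variances (diagonal of each block;
  # a simplification for random-slope models)
  var_random <- sum(sapply(VarCorr(model), function(vc) sum(diag(vc))))
  # sigma_e^2: residual variance ("sc" is the residual standard deviation)
  var_resid <- attr(VarCorr(model), "sc")^2
  total <- var_fixed + var_random + var_resid
  c(R2m = var_fixed / total,
    R2c = (var_fixed + var_random) / total)
}

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
r2_lmm(fm)
# compare against MuMIn::r.squaredGLMM(fm)
```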

**Answer**

I am answering by pasting Douglas Bates's reply on the R-Sig-ME mailing list, dated 17 Dec 2014, to the question of how to calculate an R^2 statistic for generalized linear mixed models, which I believe is required reading for anyone interested in such a thing. Bates is the original author of the `lme4` package for R and co-author of `nlme`, as well as co-author of a well-known book on mixed models, and CV will benefit from having the text in an answer, rather than just a link to it.

I must admit to getting a little twitchy when people speak of the "R2 for GLMMs". R2 for a linear model is well-defined and has many desirable properties. For other models one can define different quantities that reflect some but not all of these properties. But this is not calculating an R2 in the sense of obtaining a number having all the properties that the R2 for linear models does. Usually there are several different ways that such a quantity could be defined. Especially for GLMs and GLMMs, before you can define "proportion of response variance explained" you first need to define what you mean by "response variance". The whole point of GLMs and GLMMs is that a simple sum of squares of deviations does not meaningfully reflect the variability in the response, because the variance of an individual response depends on its mean.

Confusion about what constitutes R2, or degrees of freedom, or any of the other quantities associated with linear models as applied to other models, comes from confusing the formula with the concept. Although formulas are derived from models, the derivation often involves quite sophisticated mathematics. To avoid a potentially confusing derivation and just "cut to the chase" it is easier to present the formulas. But the formula is not the concept. Generalizing a formula is not equivalent to generalizing the concept. And those formulas are almost never used in practice, especially for generalized linear models, analysis of variance and random effects. I have a "meta-theorem" that the only quantity actually calculated according to the formulas given in introductory texts is the sample mean.

It may seem that I am being a grumpy old man about this, and perhaps I am, but the danger is that people expect an "R2-like" quantity to have all the properties of an R2 for linear models. It can't. There is no way to generalize all the properties to a much more complicated model like a GLMM.

I was once on the committee reviewing a thesis proposal for Ph.D. candidacy. The proposal was to examine, I think, 9 different formulas that could be considered ways of computing an R2 for a nonlinear regression model to decide which one was "best". Of course, this would be done through a simulation study with only a couple of different models and only a few different sets of parameter values for each. My suggestion that this was an entirely meaningless exercise was not greeted warmly.

**Attribution**
*Source : Link , Question Author : Andrews , Answer Author : Robert Long*