# Why does one have to use REML (instead of ML) for choosing among nested var-covar models?

Various descriptions on model selection on random effects of Linear Mixed Models instruct to use REML. I know difference between REML and ML at some level, but I don’t understand why REML should be used because ML is biased. For example, is it wrong to conduct a LRT on a variance parameter of a normal distribution model using ML (see the code below)? I don’t understand why it is more important to be unbiased than to be ML, in model selection. I think the ultimate answer has to be “because model selection works better with REML than with ML” but I would like to know a little more than that. I did not read the derivations of LRT and AIC (I am not good enough to understand them thoroughly), but if REML is explicitly used in the derivations, just knowing that will be actually sufficient (e.g., Chi-square approximation of the test statistic in LRT does not work if a MLE is biased).

n <- 100
a <- 10
b <- 1
alpha <- 5
beta <- 1
x <- runif(n,0,10)
y <- rnorm(n,a+b*x,alpha+beta*x)

loglik1 <- function(p,x,y){
a <- p[1]
b <- p[2]
alpha <- p[3]
-sum(dnorm(y,a+b*x,alpha,log=T))
}

loglik2 <- function(p,x,y){
a <- p[1]
b <- p[2]
alpha <- p[3]
beta <- p[4]
-sum(dnorm(y,a+b*x,alpha+beta*x,log=T))
}

m1 <- optim(c(a,b,alpha),loglik1,x=x,y=y)$value m2 <- optim(c(a,b,alpha,beta),loglik2,x=x,y=y)$value
D <- 2*(m1-m2)
1-pchisq(D,df=1) # p-value

## Answer

A very short answer: the REML is a ML, so the test based on REML is correct anyway. As the estimation of the variance parameters with REML is better, it is natural to use it.

Why is REML a ML? Consider e.g. a model

with $X\in\R^{n\times p}$, $Z\in\R^{n\times q}$, and $\beta \in \R^p$ is the vector of the fixed effects, $u \sim \mathcal N(0, \tau I_q)$ is the vector of random effects, and $e \sim \mathcal N(0, \sigma^2 I_n)$. The Restricted Likelihood can be obtained by considering $n-p$ contrasts to “remove” the fixed effects. More precisely, let $C \in \R^{(n-p)\times n}$, such that $CX = 0$ and $CC' = I_{n-p}$ (that is, the columns of $C'$ are an orthonormal basis of the vector space orthognal to the space generated by the columns of $X$) ; then with $\epsilon \sim \mathcal N(0, \sigma^2 I_{n-p})$, and the likelihood for $\tau, \sigma^2$ given $CY$ is the Restricted Likelihood.

Attribution
Source : Link , Question Author : quibble , Answer Author : Elvis