# Does a Bayesian interpretation exist for REML?

Is a Bayesian interpretation of REML available? To my intuition, REML bears a strong likeness to so-called empirical Bayes estimation procedures, and I wonder if some kind of asymptotic equivalence (under some suitable class of priors, say) has been demonstrated. Both empirical Bayes and REML seem like ‘compromised’ estimation approaches undertaken in the face of nuisance parameters, for example.

Mainly, what I seek by this question is the high-level insight that these sorts of arguments tend to yield. Of course, if an argument of this nature for some reason cannot usefully be pursued for REML, an explanation for why this is so would also deliver welcome insight!

Bayesian interpretations exist only within the framework of Bayesian analysis, for estimators that relate to a posterior distribution. Hence, the only way that the REML estimator could be given a Bayesian interpretation (i.e., an interpretation as an estimator taken from the posterior) is if we take the restricted log-likelihood in the REML analysis to be the log-posterior in a corresponding Bayes analysis; in this case the REML estimator would be a MAP estimator from Bayesian theory, with its corresponding Bayesian interpretation.

Setting the REML estimator to be a MAP estimator: It is relatively simple to see how to set the restricted log-likelihood in the REML analysis to be the log-posterior in a Bayes analysis. To do this, we require the log-prior to be the negative of the part of the log-likelihood that is removed by the REML process. Suppose we have log-likelihood $\ell_\mathbf{x}(\theta, \nu) = \ell_*(\theta, \nu) + \ell_{\text{RE}}(\theta)$ where $\ell_{\text{RE}}(\theta)$ is the residual log-likelihood and $\theta$ is the parameter of interest (with $\nu$ being our nuisance parameter). Setting the prior to $\pi(\theta, \nu) \propto \exp(-\ell_*(\theta, \nu))$ gives corresponding posterior:

\begin{aligned} \pi(\theta|\mathbf{x}) &\propto \int L_\mathbf{x}(\theta, \nu) \pi(\theta, \nu) d\nu \\[6pt] &\propto \int \exp(\ell_\mathbf{x}(\theta, \nu)) \exp(-\ell_*(\theta, \nu)) d\nu \\[6pt] &= \int \exp(\ell_\mathbf{x}(\theta, \nu) – \ell_*(\theta, \nu)) d\nu \\[6pt] &= \int \exp(\ell_*(\theta, \nu) + \ell_{\text{RE}}(\theta) – \ell_*(\theta, \nu)) d\nu \\[6pt] &= \int \exp(\ell_{\text{RE}}(\theta)) d\nu \\[6pt] &= \int L_{\text{RE}}(\theta) d\nu \\[6pt] &\propto L_{\text{RE}}(\theta). \\[6pt] \end{aligned}

This gives us:

$$\hat{\theta}_\text{MAP} = \arg \max_\theta \pi(\theta|\mathbf{x}) = \arg \max_\theta L_{\text{RE}}(\theta) = \hat{\theta}_\text{REML}.$$

This result allows us to interpret the REML estimator as a MAP estimator, so the proper Bayesian interpretation of the REML estimator is that it is the estimator that maximises the posterior density under the above prior.

Having illustrated the method to give a Bayesian interpretation to the REML estimator, we now note that there are some big problems with this approach. One problem is that the prior is formed using the log-likelihood component $\ell_*(\theta, \nu)$, which depends on the data. Hence, the “prior” necessary to obtain this interpretation is not a real prior, in the sense of being a function that can be formed prior to seeing the data. Another problem is that the prior will often be improper (i.e., it does not integrate to one) and it may actually increase in weight as the parameter values become extreme. (We will show an example of this below.)

Based on these problems, one could argue that there is no reasonable Bayesian interpretation for the REML estimator. Alternatively, one could argue that the REML estimator still maintains the above Bayesian interpretation, being a maximum a posteriori estimator under a “prior” that must coincidentally align with the observed data in the specified form, and may be extremely improper.

Illustration with normal data: The classic example of REML estimation is for the case of normal data $x_1,…,x_n \sim \text{N}(\nu, 1/\theta)$ where you are interested in the precision $\theta$ and the mean $\nu$ is a nuisance parameter. In this case you have the log-likelihood function:

$$\ell_\mathbf{x}(\nu,\theta) = – \frac{n}{2} \ln \theta – \frac{\theta}{2} \sum_{i=1}^n (x_i-\nu)^2.$$

In REML we split this log-likelihood into the two components:

\begin{aligned} \ell_*(\nu,\theta) &= – \frac{n}{2} \ln \theta – \frac{\theta}{2} \sum_{i=1}^n (x_i-\nu)^2 \\[10pt] \ell_\text{RE}(\theta) &= – \frac{n-1}{2} \ln \theta – \frac{\theta}{2} \sum_{i=1}^n (x_i-\bar{x})^2. \end{aligned}

We obtain the REML estimator for the precision parameter by maximising the residual likelihood, which gives an unbiased estimator for the variance:

$$\frac{1}{\hat{\theta}_\text{REML}} = \frac{1}{n-1} \sum_{i=1}^n (x_i-\bar{x})^2.$$

In this case, the REML estimator will correspond to a MAP estimator for the “prior” density:

$$\pi(\theta) \propto \theta^{n/2} \exp \Bigg( \frac{\theta}{2} \sum_{i=1}^n (x_i-\nu)^2 \Bigg).$$

As you can see, this “prior” actually depends on the observed data values, so it cannot actually be formed prior to seeing the data. Moreover, we can see that it is clearly an “improper” prior that puts more and more weight on extreme values of $\theta$ and $\nu$. (Actually, this prior is pretty bonkers.) If by “coincidence” you were to form a prior that happened to correspond to this outcome then the REML estimator would be a MAP estimator under that prior, and hence would have a Bayesian interpretation as the estimator that maximises the posterior under that prior.