Posterior Predictive Distribution as Expectation of Likelihood

Say we have a posterior predictive density:

$$p(\tilde{y}|\mathbf{y}) = \int p(\tilde{y}|\theta)p(\theta|\mathbf{y})d\theta$$

In Hoff’s Bayesian Statistical Methods text, he suggests approximating $p(\tilde{y}|\mathbf{y})$ by sampling $\theta^{(1)}, \dots, \theta^{(S)}$ from the posterior distribution and computing $\frac{1}{S}\sum_{s=1}^S p(\tilde{y}|\theta^{(s)})$.

He justifies this by stating $p(\tilde{y} | \mathbf{y})$ is the posterior expectation of $p(\tilde{y}|\theta)$, but I actually can’t see the equivalence. How does one derive $p(\tilde{y} | \mathbf{y})$ from $p(\tilde{y}|\theta)$?


$\newcommand{\y}{\mathbf y}$We have

$$E_{\theta|\y}\left[f(\theta)\right] = \int f(\theta)\, p(\theta | \y)\,\text d\theta$$

just by definition of expectation (and you could cite the LOTUS as well), and since $p(\theta|\y)$ is the posterior density this is the posterior expectation of $f(\theta)$. Now choose
$$f(\theta) = p(\tilde y | \theta)$$

and then
$$E_{\theta|\y}\left[p(\tilde y | \theta)\right] = \int p(\tilde y | \theta)\, p(\theta | \y)\,\text d\theta.$$

I’m not sure if you are also wondering about the justification of this integral in the first place, but typically the data are assumed independent given the generating parameters, so for a new point $\tilde y$ you’d have $\tilde y \perp \y \mid \theta$, which means
$$\begin{aligned}
p(\tilde y | \y) &= \int p(\tilde y , \theta | \y)\,\text d\theta \\
&= \int p(\tilde y | \theta , \y)\, p(\theta | \y)\,\text d\theta \\
&= \int p(\tilde y | \theta)\, p(\theta | \y)\,\text d\theta \\
&= E_{\theta|\y}\left[p(\tilde y | \theta)\right]
\end{aligned}$$

so you can use the law of large numbers with posterior samples to produce an estimator of this density.
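To make the estimator concrete, here is a minimal sketch in a hypothetical conjugate setting (not from Hoff's text): $y_i \sim N(\theta, \sigma^2)$ with $\sigma$ known and prior $\theta \sim N(\mu_0, \tau_0^2)$, so both the posterior and the exact posterior predictive $N(\mu_n, \tau_n^2 + \sigma^2)$ are available in closed form and we can check the Monte Carlo average $\frac{1}{S}\sum_s p(\tilde y \mid \theta^{(s)})$ against the truth. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Hypothetical model: y_i ~ N(theta, sigma^2), sigma known,
# with conjugate prior theta ~ N(mu0, tau0^2).
sigma, mu0, tau0 = 1.0, 0.0, 2.0
y = rng.normal(1.5, sigma, size=20)          # simulated data

# Closed-form conjugate posterior: theta | y ~ N(mu_n, tau_n2).
tau_n2 = 1.0 / (1.0 / tau0**2 + len(y) / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + y.sum() / sigma**2)

# Monte Carlo estimate of p(y_tilde | y): draw theta^(s) from the
# posterior and average the likelihood p(y_tilde | theta^(s)).
S = 100_000
theta_s = rng.normal(mu_n, np.sqrt(tau_n2), size=S)
y_tilde = 1.0
p_hat = normal_pdf(y_tilde, theta_s, sigma).mean()

# Exact posterior predictive density: N(mu_n, tau_n2 + sigma^2).
p_exact = normal_pdf(y_tilde, mu_n, np.sqrt(tau_n2 + sigma**2))
print(p_hat, p_exact)  # the two should agree closely
```

By the law of large numbers, `p_hat` converges to the posterior expectation $E_{\theta|\y}[p(\tilde y \mid \theta)]$, which the derivation above shows equals $p(\tilde y \mid \y)$.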

Source: Link, Question Author: fny, Answer Author: jld