Say we have a posterior predictive density:

$$p(\tilde{y}|\mathbf{y}) = \int p(\tilde{y}|\theta)p(\theta|\mathbf{y})d\theta$$

In Hoff’s Bayesian Statistical Methods text, he suggests obtaining an approximation of $p(\tilde{y}|\mathbf{y})$ by sampling $\theta^{(1)},\dots,\theta^{(S)}$ from the posterior distribution and computing $\frac{1}{S}\sum_{s=1}^Sp(\tilde{y}|\theta^{(s)})$.

He justifies this by stating that $p(\tilde{y} | \mathbf{y})$ is the posterior expectation of $p(\tilde{y}|\theta)$, but I can’t see the equivalence. How does one derive $p(\tilde{y} | \mathbf{y})$ from $p(\tilde{y}|\theta)$?

**Answer**

$\newcommand{\y}{\mathbf y}$We have

$$

E_{\theta|\y}\left[f(\theta)\right] = \int f(\theta) p(\theta | \y)\,\text d\theta

$$

just by the definition of expectation (you could also cite the LOTUS), and since $p(\theta|\y)$ is the posterior density, this is the posterior expectation of $f(\theta)$. Now choose

$$

f(\theta) = p(\tilde y | \theta)

$$

and then

$$

E_{\theta|\y}\left[p(\tilde y | \theta)\right] = \int p(\tilde y | \theta) p(\theta | \y)\,\text d\theta.

$$
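This "posterior expectation as an integral" identity is exactly what makes Monte Carlo averaging work. As a quick numerical sketch (the Beta posterior and parameter values here are my own illustration, not from Hoff's text), take a posterior $\theta|\y \sim \text{Beta}(a,b)$ and $f(\theta)=\theta$, so the exact posterior expectation $a/(a+b)$ can be checked against the sample average of posterior draws:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical Beta posterior for theta (illustrative values, not from the text):
# when theta | y ~ Beta(a, b), the posterior mean E[theta | y] is a / (a + b),
# so the Monte Carlo average of posterior draws can be checked exactly.
a, b = 3.0, 7.0
S = 200_000
theta_s = rng.beta(a, b, size=S)   # draws theta^(1), ..., theta^(S) from p(theta | y)

mc_mean = theta_s.mean()           # (1/S) * sum_s f(theta^(s)) with f(theta) = theta
exact_mean = a / (a + b)           # closed-form Beta mean = 0.3
print(mc_mean, exact_mean)
```

The same average works for any integrable $f$, which is what the next step exploits with $f(\theta) = p(\tilde y|\theta)$.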

I’m not sure if you are also wondering about the justification of this integral in the first place, but typically the data are assumed independent given the generating parameter, so for a new point $\tilde y$ you have $\tilde y \perp \y \mid \theta$, which means

$$

\begin{aligned}
p(\tilde y | \y) &= \int p(\tilde y , \theta | \y)\,\text d\theta \\
&= \int p(\tilde y | \theta , \y) p(\theta | \y)\,\text d\theta \\
&= \int p(\tilde y | \theta) p(\theta | \y)\,\text d\theta \\
&= E_{\theta|\y}\left[p(\tilde y | \theta)\right]
\end{aligned}

$$

so by the law of large numbers, the average $\frac{1}{S}\sum_{s=1}^S p(\tilde y|\theta^{(s)})$ over posterior samples $\theta^{(1)},\dots,\theta^{(S)}$ is a consistent estimator of this density.
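To see the estimator at work, here is a minimal sketch using a conjugate normal–normal model (the model, parameter values, and variable names are my own illustration, not from Hoff's text). In this model the posterior predictive is available in closed form, so the Monte Carlo average $\frac{1}{S}\sum_s p(\tilde y|\theta^{(s)})$ can be checked against the exact density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conjugate model (an assumption for illustration):
# y_i ~ N(theta, sigma^2) with sigma known, prior theta ~ N(mu0, tau0^2).
sigma, mu0, tau0 = 1.0, 0.0, 2.0
y = rng.normal(1.5, sigma, size=20)            # observed data

# Exact posterior: theta | y ~ N(mu_n, tau_n^2)
tau_n2 = 1.0 / (1.0 / tau0**2 + len(y) / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + y.sum() / sigma**2)

def norm_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Monte Carlo approximation: (1/S) * sum_s p(y_tilde | theta^(s))
S = 100_000
theta_s = rng.normal(mu_n, np.sqrt(tau_n2), size=S)   # posterior draws
y_tilde = 1.0
mc_estimate = norm_pdf(y_tilde, theta_s, sigma**2).mean()

# Exact posterior predictive: y_tilde | y ~ N(mu_n, tau_n^2 + sigma^2)
exact = norm_pdf(y_tilde, mu_n, tau_n2 + sigma**2)
print(mc_estimate, exact)   # the two values should be close
```

The average converges to the exact predictive density because each term $p(\tilde y|\theta^{(s)})$ is an unbiased draw of the integrand under the posterior, which is exactly the expectation derived above.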

**Attribution**
*Source: Link, Question Author: fny, Answer Author: jld*