I have a fairly basic question that has been troubling me for a while. Most of my reading in Bayesian statistics states matter-of-factly that the marginal likelihood is often intractable or difficult to estimate. Why?

Commonly stated reasons include the high-dimensional nature of the integral/summation to be evaluated, or that the space of possible models is infinite.

I would like this community to point me to something that digs into why, and explains the issue in simple language.

Links to resources would also be appreciated. I have googled these terms in search of resources that explain this clearly, but most of them just state the issue without explanation. I also have Pattern Recognition and Machine Learning and Kevin Murphy's machine learning book, but I am not satisfied with the explanations in these texts, so I am looking for something clear and simple.

**Answer**

Here is an answer by example. Suppose you have the following hierarchical model

$$ Y_{ig} \stackrel{ind}{\sim} N(\theta_g,1) \quad \theta_g \stackrel{ind}{\sim} N(\mu,\tau^2) \quad \mu|\tau^2 \sim N(m,\tau^2/k) \quad \tau^2 \sim IG(a,b) $$

for groups $g=1,\ldots,G$ and observations within a group $i=1,\ldots,n_g$ and known values $m,k,a,$ and $b$. With

$$y = (y_{1,1},\ldots,y_{n_1,1},y_{1,2},\ldots,y_{n_2,2},\ldots,y_{1,G},\ldots,y_{n_G,G}),$$

the marginal likelihood is

$$ p(y) = \int \cdots \int \prod_{g=1}^G \left[\prod_{i=1}^{n_g} N(y_{ig};\theta_g,1) \right] N(\theta_g; \mu,\tau^2) \, N(\mu; m,\tau^2/k) \, IG(\tau^2; a,b) \, d\theta_1 \cdots d\theta_G \, d\mu \, d\tau^2.$$

The dimension of the integral is $G+2$, and if $G$ is large, then this is a high-dimensional integral. Most numerical integration techniques will need an extreme number of samples or iterations to obtain a reasonable approximation to this integral.
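To make this concrete, here is a minimal sketch (not from the answer) of the naive Monte Carlo estimator $p(y) \approx \frac{1}{S}\sum_s p(y \mid \theta^{(s)})$ with $\theta^{(s)}$ drawn from the prior. It makes the simplifying assumption that $\mu$ and $\tau^2$ are known, so only the $G$-dimensional integral over $\theta_1,\ldots,\theta_G$ remains; the function name and settings are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_marginal_mc(y, n_draws=10_000, mu=0.0, tau2=1.0):
    """Naive Monte Carlo estimate of log p(y) for the simplified model
    Y_ig ~ N(theta_g, 1), theta_g ~ N(mu, tau2), with mu and tau2 KNOWN
    (a simplifying assumption; the answer's model also integrates over
    mu and tau2). y is an (n, G) array. Draws theta from the prior and
    averages the likelihood: p(y) ~ (1/S) sum_s p(y | theta^(s))."""
    n, G = y.shape
    theta = rng.normal(mu, np.sqrt(tau2), size=(n_draws, 1, G))  # prior draws
    # log-likelihood of all n*G observations under each draw
    loglik = -0.5 * ((y[None, :, :] - theta) ** 2
                     + np.log(2 * np.pi)).sum(axis=(1, 2))
    m = loglik.max()  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(loglik - m)))

y = rng.normal(size=(5, 3))  # n=5 observations in each of G=3 groups
print(log_marginal_mc(y, n_draws=100_000))
```

The estimator is unbiased for $p(y)$, but its variance is driven by how well prior draws of the full $G$-dimensional $\theta$ happen to land where the likelihood is large, which becomes exponentially less likely as $G$ grows.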

This integral happens to have a closed form, so you can evaluate how well a numerical integration technique estimates the marginal likelihood. To understand why calculating the marginal likelihood is difficult, you could start simple, e.g. with a single observation, a single group, $\mu$ and $\tau^2$ known, etc. You can then slowly make the problem more and more difficult and see how the numerical integration techniques fare relative to the truth. You will notice that they get worse and worse, i.e. they need more and more samples or iterations to obtain the same accuracy, as the dimension of the problem, i.e. $G$, increases.

Finally, let $Y_{ig} \stackrel{ind}{\sim} Po(e^{\theta_g})$ and now you have a marginal likelihood with no closed form. Based on your experience when you knew the truth, how much are you going to believe a numerical estimate when you don't know the truth? I'm guessing you aren't going to have much confidence in the numerical estimate.
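The suggested experiment can be sketched as follows, again under the simplifying assumption that $\mu$ and $\tau^2$ are known, so the exact marginal likelihood is a product over groups of multivariate normal densities $y_g \sim N(\mu \mathbf{1}, I + \tau^2 \mathbf{1}\mathbf{1}^\top)$; all names and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_marginal_exact(y, mu=0.0, tau2=1.0):
    """Closed form under the simplification (mu, tau2 known): within each
    group, y_g ~ N(mu * 1, I + tau2 * 1 1^T), and groups are independent."""
    n, G = y.shape
    cov = np.eye(n) + tau2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    total = 0.0
    for g in range(G):
        r = y[:, g] - mu
        total += -0.5 * (n * np.log(2 * np.pi) + logdet
                         + r @ np.linalg.solve(cov, r))
    return total

def log_marginal_mc(y, n_draws, mu=0.0, tau2=1.0):
    """Naive Monte Carlo: draw all G thetas jointly from the prior and
    average the joint likelihood. Each per-group integral is easy, but
    the variance of the joint estimator grows rapidly with G."""
    n, G = y.shape
    theta = rng.normal(mu, np.sqrt(tau2), size=(n_draws, 1, G))
    loglik = -0.5 * ((y[None] - theta) ** 2
                     + np.log(2 * np.pi)).sum(axis=(1, 2))
    m = loglik.max()  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(loglik - m)))

# Fixed budget of draws: accuracy deteriorates as the dimension G grows.
for G in (1, 5, 25):
    theta_true = rng.normal(size=G)            # simulate data from the model
    y = theta_true + rng.normal(size=(5, G))   # n=5 observations per group
    err = abs(log_marginal_mc(y, 50_000) - log_marginal_exact(y))
    print(f"G={G:2d}  |log p_hat - log p| = {err:.3f}")
```

With the same 50,000 draws, the estimate is very accurate for $G=1$ and progressively worse as $G$ grows, which is the pattern the answer describes.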

*Attribution — Source: Link, Question Author: user1556364, Answer Author: jaradniemi*