I have read something like six articles on Markov chain Monte Carlo methods, and there are a couple of basic points I can’t seem to wrap my head around.

How can you “draw samples from the posterior distribution” without first knowing the properties of said distribution?

Again, how can you determine which parameter estimate “fits your data better” without first knowing your posterior distribution?

If you already know the properties of your posterior distribution (as is indicated by 1) and 2)), then what’s the point of using this method in the first place?

This just seems like circular reasoning to me.

**Answer**

*If this were not a clear conflict of interest, I would suggest you invest more time in the topic of MCMC algorithms and read a whole book, rather than a few (6?) articles that can only provide a partial perspective.*

> How can you “draw samples from the posterior distribution” without first knowing the properties of said distribution?

MCMC is based on the assumption that the product $\pi(\theta)\,f(x_{\text{obs}}|\theta)$ can be numerically computed (hence is known) for a given $\theta$, where $x_{\text{obs}}$ denotes the observation, $\pi(\cdot)$ the prior, and $f(x_{\text{obs}}|\theta)$ the likelihood. This does not imply an in-depth knowledge of this function of $\theta$. Still, from a mathematical perspective the posterior density is completely and entirely determined by

$$\pi(\theta|x_{\text{obs}})=\frac{\pi(\theta)\,f(x_{\text{obs}}|\theta)}{\int_\Theta \pi(\theta)\,f(x_{\text{obs}}|\theta)\,\mathrm{d}\theta}\tag{1}$$
Thus, it is not particularly surprising that simulation methods can be devised using solely the product $\pi(\theta)\times f(x_{\text{obs}}|\theta)$ as input. The amazing feature of some Monte Carlo methods, like Markov chain Monte Carlo (MCMC) algorithms, is that they formally require nothing beyond the pointwise computation of this product, when compared with accept-reject algorithms for instance, which also call for an upper bound. Software like Stan operates on this input alone and still delivers high-end performance with tools like HMC and NUTS, relying on automatic differentiation.
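The point can be made concrete with a minimal random-walk Metropolis sampler. The toy model below (a hypothetical Normal-Normal example, chosen only so the exact posterior is known for checking) evaluates nothing but the unnormalised product prior × likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: one observation x_obs ~ N(theta, 1) with a
# N(0, 1) prior on theta, so the exact posterior is N(x_obs/2, 1/2).
# The sampler only ever evaluates the *unnormalised* log-posterior.
x_obs = 1.5

def log_unnorm_post(theta):
    log_prior = -0.5 * theta**2          # N(0, 1) prior, up to a constant
    log_lik = -0.5 * (x_obs - theta)**2  # N(theta, 1) likelihood, up to a constant
    return log_prior + log_lik

def metropolis(n_iter=50_000, scale=1.0):
    """Random-walk Metropolis: accept/reject from a *ratio* of
    unnormalised posterior values, so the constant Z never appears."""
    theta = 0.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + scale * rng.normal()
        if np.log(rng.uniform()) < log_unnorm_post(prop) - log_unnorm_post(theta):
            theta = prop
        chain[t] = theta
    return chain

chain = metropolis()
print(chain[1000:].mean())  # close to the exact posterior mean x_obs/2 = 0.75
```

Discarding the first iterations as burn-in, the empirical mean of the chain approximates the exact posterior mean, even though the sampler never saw the normalised density.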

A side comment, written later in the light of some of the other answers: the normalising constant
$$Z=\int_\Theta \pi(\theta)\,f(x_{\text{obs}}|\theta)\,\mathrm{d}\theta$$
is not particularly useful for conducting Bayesian inference, in that, were I to “know” its exact numerical value in addition to the function in the numerator of (1), say $Z=3.17232\times 10^{-23}$, I would not have made any progress towards finding Bayes estimates or credible regions. (The only exception where this constant matters is in conducting Bayesian model comparison.)
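Why knowing $Z$ buys nothing is visible in the Metropolis acceptance ratio: it is a ratio of posterior values, so any constant cancels exactly. A two-line sketch (with an arbitrary made-up value for $Z$):

```python
import numpy as np

# The Metropolis acceptance ratio compares two posterior values, so a
# normalising constant Z (hypothetical value here) cancels exactly.
def unnorm(theta):
    return np.exp(-0.5 * theta**2)  # unnormalised density, for illustration

Z = 3.17232e-23                      # pretend we "knew" the constant
theta, prop = 0.3, 1.1
ratio_without_Z = unnorm(prop) / unnorm(theta)
ratio_with_Z = (unnorm(prop) / Z) / (unnorm(theta) / Z)
print(np.isclose(ratio_without_Z, ratio_with_Z))  # True: Z is irrelevant
```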

When teaching about MCMC algorithms, my analogy is that in a videogame we have a complete map (the posterior) and a moving player that can only illuminate a portion of the map at once. Visualising the entire map and spotting the highest regions is possible with enough attempts (and a perfect remembrance of things past!). A local and primitive knowledge of the posterior density (up to a constant) is therefore sufficient to learn about the distribution.

> Again, how can you determine which parameter estimate “fits your data better” without first knowing your posterior distribution?

Again, the distribution is *known* in a mathematical or numerical sense. The Bayes parameter estimates provided by MCMC, if needed, are based on the same principle as most simulation methods, *the law of large numbers*. More generally, Monte Carlo based (Bayesian) inference replaces the exact posterior distribution with an empirical version. Hence, once more, a numerical approach to the posterior, one value at a time, is sufficient to build a convergent representation of the associated estimator. The only restriction is the available computing time, i.e., the number of terms one can include in the law of large numbers approximation.
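The law of large numbers at work can be sketched in a few lines. Here the draws come directly from a known Normal (a stand-in for posterior draws; with MCMC they would be correlated, but the averaging principle is identical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Posterior expectations are approximated by empirical averages over
# posterior draws. Stand-in example: exact draws from N(0.75, 0.5).
true_mean = 0.75
draws = rng.normal(true_mean, np.sqrt(0.5), size=100_000)

for n in (100, 10_000, 100_000):
    print(n, draws[:n].mean())  # empirical averages approach 0.75 as n grows
```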

> If you already know the properties of your posterior distribution (as is indicated by 1) and 2)), then what’s the point of using this method in the first place?

It is the very paradox of (1) that, while it is a perfectly well-defined mathematical object, most integrals related to (1), including its denominator, may be out of reach of analytical and numerical methods. Exploiting the stochastic nature of the object by simulation methods (Monte Carlo integration) is a natural and manageable alternative that has proven immensely helpful.
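As a minimal illustration of Monte Carlo integration, the denominator of (1) can be approximated by averaging the likelihood over prior draws. The toy Normal-Normal model here is a hypothetical choice where the exact value happens to be available for comparison; in realistic models it would not be:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo integration of Z = E_prior[ f(x_obs | theta) ]:
# average the likelihood over draws from the prior.
x_obs = 1.5
theta = rng.normal(size=1_000_000)  # draws from the N(0, 1) prior
lik = np.exp(-0.5 * (x_obs - theta)**2) / np.sqrt(2 * np.pi)
Z_hat = lik.mean()

# Exact value for this toy model: the N(0, 2) density at x_obs.
Z_exact = np.exp(-x_obs**2 / 4) / np.sqrt(4 * np.pi)
print(Z_hat, Z_exact)  # the two agree up to Monte Carlo error
```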

**Connected X validated questions:**

- Confusion related to MCMC technique
- What are Monte Carlo simulations?
- Is Markov chain based sampling the “best” for Monte Carlo sampling? Are there alternative schemes available?
- MCMC: Can we be sure that we have a “pure” and “large enough” sample from the posterior? How can it work if we are not?
- How would you explain Markov Chain Monte Carlo (MCMC) to a layperson?
- How to do MC integration from Gibbs sampling of posterior?

**Attribution:** *Source: Link, Question Author: Magnus, Answer Author: Xi’an*