# Posterior distribution and MCMC [duplicate]

I have read something like six articles on Markov chain Monte Carlo methods, but there are a couple of basic points I can’t seem to wrap my head around.

1. How can you “draw samples from the posterior distribution” without first knowing the properties of said distribution?

2. Again, how can you determine which parameter estimate “fits your data better” without first knowing your posterior distribution?

3. If you already know the properties of your posterior distribution (as is indicated by 1) and 2)), then what’s the point of using this method in the first place?

This just seems like circular reasoning to me.

If it were not a clear conflict of interest, I would suggest you invest more time in the topic of MCMC algorithms and read a whole book rather than a few (6?) articles that can only provide a partial perspective.

> How can you “draw samples from the posterior distribution” without first knowing the properties of said distribution?

MCMC is based on the assumption that the product $\pi(\theta)\,f(x^\text{obs}|\theta)$ can be numerically computed (hence is known) for a given $\theta$, where $x^\text{obs}$ denotes the observation, $\pi(\cdot)$ the prior, and $f(x^\text{obs}|\theta)$ the likelihood. This does not imply an in-depth knowledge of this function of $\theta$. Still, from a mathematical perspective the posterior density is completely and entirely determined by
$$\pi(\theta|x^\text{obs}) = \frac{\pi(\theta)\,f(x^\text{obs}|\theta)}{\int_\Theta \pi(\theta)\,f(x^\text{obs}|\theta)\,\text{d}\theta}\,.\tag{1}$$
Thus, it is not particularly surprising that simulation methods can be found using solely the input of the product $\pi(\theta)\,f(x^\text{obs}|\theta)$. The amazing feature of Monte Carlo methods is that some of them, like Markov chain Monte Carlo (MCMC) algorithms, do not formally require anything further than the computation of this product, when compared with accept-reject algorithms, for instance, which call for an upper bound. Software like Stan operates on this input and still delivers high-end performance with tools like NUTS and HMC, including automatic differentiation.
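To make this concrete, here is a minimal random-walk Metropolis sketch (my own illustration, not part of the answer above): the target is taken to be $\exp(-\theta^2/2)$, i.e., a standard normal posterior known only up to its normalising constant, and the algorithm never uses that constant.

```python
import math
import random

def unnormalised_posterior(theta):
    # stand-in for prior(theta) * likelihood(x_obs | theta),
    # assumed computable pointwise; the constant 1/sqrt(2*pi) is never needed
    return math.exp(-0.5 * theta * theta)

def metropolis(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        # the acceptance probability is a ratio of target values,
        # so any multiplicative constant cancels
        if rng.random() < unnormalised_posterior(proposal) / unnormalised_posterior(theta):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(50_000)
mean = sum(samples) / len(samples)  # should be close to 0, the true posterior mean
```

The chain only ever evaluates the unnormalised density at its current and proposed points, yet its empirical distribution converges to the full posterior.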

A side comment, written later in the light of some of the other answers, is that the normalising constant $\mathfrak{Z}=\int_\Theta \pi(\theta)\,f(x^\text{obs}|\theta)\,\text{d}\theta$ is not particularly useful for conducting Bayesian inference in that, were I to “know” its exact numerical value in addition to the function in the numerator of (1), say $\mathfrak{Z}=3.17232\times10^{-23}$, I would not have made any progress towards finding Bayes estimates or credible regions. (The only exception where this constant matters is in conducting Bayesian model comparison.)
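A two-line check of this point (an illustration of mine, with an arbitrary toy target): the Metropolis acceptance ratio involves only a ratio of target values, so multiplying the target by any constant, including the normalising constant, changes nothing.

```python
import math

def unnormalised_target(theta):
    # stand-in for prior(theta) * likelihood(x_obs | theta)
    return math.exp(-0.5 * theta ** 2)

Z = 3.17232e-23  # pretend we somehow "knew" the normalising constant

theta_current, theta_proposed = 0.3, 1.1
ratio_unscaled = unnormalised_target(theta_proposed) / unnormalised_target(theta_current)
ratio_scaled = (Z * unnormalised_target(theta_proposed)) / (Z * unnormalised_target(theta_current))
# ratio_scaled equals ratio_unscaled (up to floating point): knowing Z buys nothing
```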

When teaching about MCMC algorithms, my analogy is that of a videogame: there is a complete map (the posterior), but the moving player can only illuminate a portion of the map at once. Visualising the entire map and spotting the highest regions is possible with enough attempts (and a perfect remembrance of things past!). A local and primitive knowledge of the posterior density (up to a constant) is therefore sufficient to learn about the distribution.

> Again, how can you determine which parameter estimate “fits your data better” without first knowing your posterior distribution?

Again, the distribution is known in a mathematical or numerical sense. The Bayes parameter estimates provided by MCMC, if needed, are based on the same principle as most simulation methods, the law of large numbers. More generally, Monte Carlo based (Bayesian) inference replaces the exact posterior distribution with an empirical version. Hence, once more, a numerical approach to the posterior, one value at a time, is sufficient to build a convergent representation of the associated estimator. The only restriction is the available computing time, i.e., the number of terms one can include in the law-of-large-numbers average.
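As a hedged sketch of this last point (mine, not the answer's): once draws from the posterior are available, any posterior quantity becomes a sample average or sample quantile. Here the "MCMC output" is simulated as i.i.d. normal draws purely for illustration.

```python
import random

# pretend these are MCMC draws from the posterior of theta;
# for illustration we simulate i.i.d. draws from N(2, 0.5^2)
rng = random.Random(42)
draws = [rng.gauss(2.0, 0.5) for _ in range(100_000)]

post_mean = sum(draws) / len(draws)                       # estimates E[theta | x]
prob_positive = sum(d > 2.5 for d in draws) / len(draws)  # estimates P(theta > 2.5 | x)

draws.sort()
# empirical 2.5% and 97.5% quantiles give an approximate 95% credible interval
ci = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
```

Each estimator converges as the number of draws grows, which is exactly the law-of-large-numbers trade-off against computing time mentioned above.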