My basic question is: how would you sample from an improper distribution? Does it even make sense to sample from an improper distribution?
Xi’an’s comment here kind of addresses the question, but I was looking for some more details on this.
More specific to MCMC:
In discussions of MCMC and in papers, authors stress having obtained proper posterior distributions. There is the famous Geyer (1992) paper where the author forgot to check whether the posterior was proper (otherwise an excellent paper).
But suppose we have a likelihood f(x|θ) and an improper prior distribution on θ such that the resulting posterior is also improper, and MCMC is used to sample from the distribution. In this case, what does the sample indicate? Is there any useful information in this sample? I am aware that the Markov chain here is then either transient or null-recurrent. Are there any positive take-aways if it is null-recurrent?
Finally, in Neil G’s answer here, he mentions
you can typically sample (using MCMC) from the posterior even if it’s
He mentions such sampling is common in deep learning. If this is true, how does this make sense?
Sampling from an improper posterior (density) f does not make sense from a probabilistic/theoretical point of view. The reason for this is that the function f does not have a finite integral over the parameter space and, consequently, cannot be linked to a (finite measure) probability model (Ω,σ,P) (space, sigma-algebra, probability measure).
If you have a model with an improper prior that leads to an improper posterior, in many cases you can still run MCMC on it, for instance Metropolis-Hastings, and the “posterior samples” may look reasonable. This looks intriguing and paradoxical at first glance. However, the reason is that MCMC methods are subject to the numerical limitations of computers in practice, so for a computer all supports are bounded (and discrete!). Under those restrictions (boundedness and discreteness), the posterior is actually proper in most cases.
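To make this concrete, here is a minimal sketch, assuming a toy setting of my own choosing: a random-walk Metropolis sampler run on a flat (improper) density over the real line and, for comparison, on a standard normal. The function names, step size, and run lengths are illustrative assumptions, not taken from any of the references.

```python
import numpy as np

def rw_metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis with Gaussian proposals.

    Returns the chain and the fraction of accepted proposals.
    """
    rng = np.random.default_rng(seed)
    x, lp = x0, log_target(x0)
    chain = np.empty(n_steps)
    accepted = 0
    for i in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_target(prop)
        # Accept with probability min(1, f(prop) / f(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        chain[i] = x
    return chain, accepted / n_steps

def log_flat(theta):
    # Improper toy target: constant density on the whole real line.
    return 0.0

def log_normal(theta):
    # Proper target: standard normal, up to an additive constant.
    return -0.5 * theta ** 2

flat_chain, flat_acc = rw_metropolis(log_flat, 0.0, 20_000)
norm_chain, norm_acc = rw_metropolis(log_normal, 0.0, 20_000)

# Largest |theta| visited so far, for increasing run lengths.
for n in (200, 2_000, 20_000):
    print(n, np.abs(flat_chain[:n]).max(), np.abs(norm_chain[:n]).max())
```

The sampler runs without complaint on both targets and returns finite numbers in both cases. With the flat target every proposal is accepted, so the chain is just a random walk: the range it covers keeps growing with the run length instead of settling down, whereas the range of the normal chain stabilizes. A histogram of a short run of the flat chain can still look like a plausible bump, which is the trap.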
There is a great reference by Hobert and Casella that presents an example (of a slightly different nature) where you can construct a Gibbs sampler for a posterior, the posterior samples look perfectly reasonable, but the posterior is improper!
A similar example has recently appeared here. In fact, Hobert and Casella warn the reader that MCMC methods cannot be used to detect impropriety of the posterior and that this has to be checked separately before implementing any MCMC methods.
- Some MCMC samplers, such as Metropolis-Hastings, can (but shouldn’t) be used to sample from an improper posterior, since the computer bounds and discretizes the parameter space. Only with huge samples may you be able to observe some strange things. How well you can detect these issues also depends on the “instrumental” distribution employed in your sampler. The latter point requires a more extensive discussion, so I prefer to leave it here.
- (Hobert and Casella). The fact that you can construct a Gibbs sampler (conditional model) for a model with an improper prior does not imply that the posterior (joint model) is proper.
- A formal probabilistic interpretation of the posterior samples requires the propriety of the posterior. Convergence results and proofs are established only for proper probability distributions/measures.
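The second point can be illustrated with a small sketch. The density below is a toy case I am assuming for illustration, not the example from Hobert and Casella: take f(x, y) ∝ exp(-xy) on x, y > 0. Both full conditionals are proper exponential distributions (x | y has rate y, and y | x has rate x), yet integrating out x leaves 1/y, which has an infinite integral over (0, ∞), so the joint is improper. The function name and run length are likewise hypothetical.

```python
import numpy as np

def gibbs_exp_xy(n_steps, x0=1.0, y0=1.0, seed=0):
    """Gibbs sampler built from the conditionals of f(x, y) ∝ exp(-x * y).

    Each conditional is a proper exponential distribution, but the joint
    density has an infinite integral over the positive quadrant.
    """
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    draws = np.empty((n_steps, 2))
    for i in range(n_steps):
        x = rng.exponential(scale=1.0 / y)  # x | y ~ Exponential(rate = y)
        y = rng.exponential(scale=1.0 / x)  # y | x ~ Exponential(rate = x)
        draws[i] = x, y
    return draws

draws = gibbs_exp_xy(2_000)
log_y = np.log(draws[:, 1])

# Spread of log(y) over the first 200 draws versus the whole run.
print(np.ptp(log_y[:200]), np.ptp(log_y))
```

Every individual update is a perfectly valid draw, and nothing in the output signals a problem. In this toy case, writing the updates as x = E1 / y and y_new = E2 / x with independent standard exponentials E1 and E2 gives y_new = y · E2 / E1, so log(y) performs a symmetric random walk and its spread keeps widening as the run gets longer. The run is kept short here because a much longer one could eventually drift outside floating-point range.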
P.S. (a bit tongue in cheek): Do not always believe what people do in Machine Learning. As Prof. Brian Ripley said: “machine learning is statistics minus any checking of models and assumptions”.