What are some well known improvements over textbook MCMC algorithms that people use for Bayesian inference?

When I’m coding a Monte Carlo simulation for some problem, and the model is simple enough, I use a very basic textbook Gibbs sampler. When Gibbs sampling isn’t possible, I code the textbook Metropolis-Hastings algorithm I learned years ago. The only thought I give it is choosing the jumping distribution or its parameters.

I know there are hundreds of specialized methods that improve on those textbook options, but I rarely think about using or learning them. It usually feels like too much effort for a small improvement over something that already works well.

But recently I’ve been wondering whether there are newer general-purpose methods that could improve on what I’ve been doing. It’s been many decades since those methods were invented. Maybe I’m really outdated!

Are there any well known alternatives to Metropolis-Hastings that are:

  • reasonably easy to implement,
  • as universally applicable as MH,
  • and that always improve on MH’s results in some sense (computational performance, accuracy, etc.)?

I know about some very specialized improvements for very specialized models, but is there some general-purpose machinery that everybody uses and I don’t know about?

Answer

I’m not an expert in any of these, but I thought I’d put them out there anyway to see what the community thought. Corrections are welcome.

One increasingly popular method, which is not terribly straightforward to implement, is called Hamiltonian Monte Carlo (or sometimes Hybrid Monte Carlo). It uses a physical model with potential and kinetic energy to simulate a ball rolling around the parameter space, as described in this paper by Radford Neal. The physics simulation takes a fair amount of computation, so you tend to get fewer updates, but the updates tend to be much less correlated. HMC is the engine behind the new Stan software that is being developed as a more efficient and flexible alternative to BUGS or JAGS for statistical modeling.
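To make the idea concrete, here is a minimal sketch of a single HMC update using leapfrog integration, in the spirit of Neal’s description rather than Stan’s actual implementation. The names `log_prob` and `grad_log_prob` are placeholders for a log target density and its gradient that you would supply.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20, rng=None):
    """One Hamiltonian Monte Carlo update.

    q             : current position (1-D numpy array of parameters)
    log_prob      : function returning the log target density at q
    grad_log_prob : function returning the gradient of log_prob at q
    """
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(q.shape)                 # fresh momentum ~ N(0, I)

    # Hamiltonian = potential energy (-log density) + kinetic energy
    current_H = -log_prob(q) + 0.5 * np.dot(p, p)

    # Leapfrog integration of the Hamiltonian dynamics
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # initial half step for momentum
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                   # full step for position
        p_new += step_size * grad_log_prob(q_new)    # full step for momentum
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # final half step for momentum

    # Metropolis accept/reject corrects for numerical integration error
    proposed_H = -log_prob(q_new) + 0.5 * np.dot(p_new, p_new)
    if np.log(rng.uniform()) < current_H - proposed_H:
        return q_new, True
    return q, False
```

The need for gradients is the main practical hurdle: you either derive them by hand or rely on automatic differentiation, which is essentially what Stan provides for you.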

There’s also a whole cluster of methods that involve “heating up” the Markov chain, which you can think of as introducing thermal noise to the model and increasing the chances of sampling low-probability states. At first glance, that seems like a bad idea, since you want the chain to sample in proportion to the posterior probability. But you actually only end up using the “hot” states to help the chain mix better. The actual samples are only collected when the chain is at its “normal” temperature. If you do it correctly, you can use the heated chains to find modes that an ordinary chain wouldn’t be able to get to because of large valleys of low probability blocking the transition from mode to mode. A few examples of these methods include Metropolis-coupled MCMC, tempered transitions, parallel tempering, and annealed importance sampling.
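As an illustration of perhaps the simplest member of that family, here is a rough sketch of parallel tempering (Metropolis-coupled MCMC): several Metropolis chains run at different temperatures, states are occasionally swapped between adjacent temperatures, and only the cold chain’s samples are kept. The temperature ladder and tuning constants are made up for the example.

```python
import numpy as np

def parallel_tempering(log_prob, x0, betas=(1.0, 0.5, 0.25, 0.1),
                       n_iter=10_000, prop_scale=0.5, rng=None):
    """Metropolis-coupled MCMC: one chain per temperature, plus swap moves.

    log_prob : log target density; chain k targets betas[k] * log_prob
    x0       : initial position (1-D numpy array), copied to every chain
    Only samples from the beta = 1 ("cold") chain are returned.
    """
    rng = rng or np.random.default_rng()
    chains = [np.array(x0, dtype=float) for _ in betas]
    logps = np.array([log_prob(c) for c in chains])
    samples = []

    for _ in range(n_iter):
        # Ordinary random-walk Metropolis update within each tempered chain
        for k, beta in enumerate(betas):
            prop = chains[k] + prop_scale * rng.standard_normal(chains[k].shape)
            lp_prop = log_prob(prop)
            if np.log(rng.uniform()) < beta * (lp_prop - logps[k]):
                chains[k], logps[k] = prop, lp_prop

        # Propose swapping states between a random pair of adjacent temperatures
        k = rng.integers(len(betas) - 1)
        log_accept = (betas[k] - betas[k + 1]) * (logps[k + 1] - logps[k])
        if np.log(rng.uniform()) < log_accept:
            chains[k], chains[k + 1] = chains[k + 1], chains[k]
            logps[k], logps[k + 1] = logps[k + 1], logps[k]

        samples.append(chains[0].copy())   # keep only the cold chain

    return np.array(samples)
```

Beyond ordinary Metropolis tuning, the main extra choice is the temperature ladder `betas`: if adjacent temperatures are too far apart, swaps are rarely accepted and the hot chains stop helping the cold one mix.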

Finally, you can use sequential Monte Carlo or particle filtering when the rejection rate would be so high that these other methods would all fail. I know the least about this family of methods, so my description may be incorrect here, but my understanding is that it works like this. You start out by running your favorite sampler, even though the chances of rejection are essentially one. Rather than rejecting all your samples, you pick the least objectionable ones, and initialize new samplers from there, repeating the process until you find some samples that you can actually accept. Then you go back and correct for the fact that your samples were nonrandom, because you didn’t initialize your samplers from random locations.
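With the caveat that the description above is loose, here is one concrete instance of the idea: a tempered SMC sampler that moves a population of particles from the prior toward the posterior through a sequence of intermediate distributions, alternating importance reweighting, resampling, and Metropolis rejuvenation moves. The helpers `sample_prior`, `log_prior`, and `log_lik` are placeholders you would supply.

```python
import numpy as np

def tempered_smc(sample_prior, log_prior, log_lik, n_particles=1000,
                 betas=np.linspace(0.0, 1.0, 21), mcmc_scale=0.5, rng=None):
    """Minimal tempered sequential Monte Carlo sampler.

    Particles start from the prior and are pushed through intermediate
    targets proportional to prior(x) * likelihood(x)**beta as beta goes
    from 0 to 1.
    """
    rng = rng or np.random.default_rng()
    particles = np.array([sample_prior(rng) for _ in range(n_particles)])
    loglik = np.array([log_lik(p) for p in particles])

    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Reweight: the incremental importance weight is likelihood^(beta - beta_prev)
        logw = (beta - beta_prev) * loglik
        w = np.exp(logw - logw.max())
        w /= w.sum()

        # Resample particles in proportion to their weights
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles, loglik = particles[idx], loglik[idx]

        # Rejuvenate with one Metropolis step targeting the current tempered posterior
        for i in range(n_particles):
            prop = particles[i] + mcmc_scale * rng.standard_normal(particles[i].shape)
            ll_prop = log_lik(prop)
            lp_prop = log_prior(prop) + beta * ll_prop
            lp_curr = log_prior(particles[i]) + beta * loglik[i]
            if np.log(rng.uniform()) < lp_prop - lp_curr:
                particles[i], loglik[i] = prop, ll_prop

    return particles   # approximately distributed according to the posterior
```

The reweighting step is the “go back and correct” part of the description above: because the particles were not drawn from the target directly, they are importance-weighted (and then resampled) so that the final population still targets the right distribution.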

Hope this helps.

Attribution
Source: Link, Question Author: Rafael S. Calsaverini, Answer Author: alberto