Have there been large-scale studies of MCMC methods that compare the performance of several different algorithms on a suite of test densities? I am thinking of something equivalent to Rios and Sahinidis’ paper (2013), which is a thorough comparison of a large number of derivative-free black-box optimizers on several classes of test functions.
For MCMC, performance can be measured as, e.g., effective sample size (ESS) per density evaluation, or some other appropriate metric.
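To make the metric concrete, here is a minimal sketch of how ESS per density evaluation could be computed for one sampler on one target. Everything in it is an assumption for illustration: the target is a 1-D standard normal, the sampler is plain random-walk Metropolis, and the ESS estimator simply truncates the autocorrelation sum at the first non-positive lag (more careful estimators exist). The function names are made up for this example.

```python
import numpy as np

def effective_sample_size(chain):
    """Crude ESS estimate for a 1-D chain: n / (1 + 2 * sum of autocorrelations),
    with the sum truncated at the first non-positive autocorrelation."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    # Autocovariance at lags 0..n-1 (biased estimator, divided by n).
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n):
        if rho[k] <= 0:
            break
        tau += 2.0 * rho[k]
    return n / tau

def random_walk_metropolis(log_density, x0, n_steps, step_size, rng):
    """1-D random-walk Metropolis that also counts density evaluations."""
    x = x0
    logp = log_density(x)
    n_evals = 1
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step_size * rng.normal()
        logp_prop = log_density(proposal)
        n_evals += 1
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[i] = x
    return samples, n_evals

def log_density(x):
    # Assumed test target: standard normal, up to an additive constant.
    return -0.5 * x ** 2

rng = np.random.default_rng(0)
samples, n_evals = random_walk_metropolis(log_density, 0.0, 5000, 2.4, rng)
ess = effective_sample_size(samples)
ess_per_eval = ess / n_evals
print(f"ESS = {ess:.0f}, evaluations = {n_evals}, ESS/eval = {ess_per_eval:.3f}")
```

A benchmark of the kind I am asking about would repeat this over many samplers and many targets, and in more than one dimension would have to decide how to summarize ESS across coordinates (e.g., the minimum).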
A few comments:
I appreciate that performance will strongly depend on details of the target pdf, but a similar (possibly not identical) argument holds for optimization, and nonetheless there is a plethora of benchmark functions, suites, competitions, papers, etc. that deal with benchmarking optimization algorithms.
Also, it is true that MCMC differs from optimization in that comparatively much more care and tuning is needed from the user. Nonetheless, there are now several MCMC methods that require little or no tuning: methods that adapt in the burn-in phase, methods that adapt during sampling, and multi-state (also called ensemble) methods (such as Emcee) that evolve multiple interacting chains and use information from other chains to guide the sampling.
I am particularly interested in the comparison between standard and multi-state (aka ensemble) methods. For the definition of multi-state, see Section 30.6 of MacKay’s book:
In a multi-state method, multiple parameter vectors x are maintained;
they evolve individually under moves such as Metropolis and Gibbs;
there are also interactions among the vectors.
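As an illustration of this definition, here is a minimal sketch of one well-known multi-state move, the affine-invariant "stretch move" of Goodman and Weare, which is the move Emcee is built around. This is not Emcee's code: the function name, the 2-D standard normal target, the number of walkers and the run lengths are all assumptions chosen for the example.

```python
import numpy as np

def stretch_move_sweep(walkers, log_probs, log_density, rng, a=2.0):
    """One sweep of the stretch move: each walker is updated in turn,
    using the current position of another randomly chosen walker."""
    n_walkers, dim = walkers.shape
    for k in range(n_walkers):
        # Pick a partner j != k uniformly at random (the interaction).
        j = rng.integers(n_walkers - 1)
        if j >= k:
            j += 1
        # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        logp_prop = log_density(proposal)
        # Acceptance ratio includes the z^(dim-1) factor.
        log_accept = (dim - 1) * np.log(z) + logp_prop - log_probs[k]
        if np.log(rng.uniform()) < log_accept:
            walkers[k] = proposal
            log_probs[k] = logp_prop
    return walkers, log_probs

def log_density(x):
    # Assumed test target: 2-D standard normal, up to an additive constant.
    return -0.5 * np.sum(x ** 2)

rng = np.random.default_rng(1)
n_walkers, dim, n_sweeps, burn_in = 20, 2, 2000, 500
walkers = rng.normal(size=(n_walkers, dim))
log_probs = np.array([log_density(w) for w in walkers])

kept = []
for sweep in range(n_sweeps):
    walkers, log_probs = stretch_move_sweep(walkers, log_probs, log_density, rng)
    if sweep >= burn_in:
        kept.append(walkers.copy())
kept = np.concatenate(kept)  # shape: ((n_sweeps - burn_in) * n_walkers, dim)
print("mean:", kept.mean(axis=0), "variance:", kept.var(axis=0))
```

The only tuning parameter is the scale `a`, and the proposal direction and length come from the other walkers, which is why such methods are often described as needing little tuning.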
- This question originated from here.
After some online searching, I have come to the impression that a comprehensive benchmark of established MCMC methods, analogous to what one can find in the optimization literature, does not exist. (I’d be happy to be wrong here.)
It is easy to find comparisons of a few MCMC methods on specific problems within an applied domain. This would be okay if we could pool this information. However, the quality of such benchmarks is often insufficient (e.g., because the reported metrics are incomplete or because of poor design choices).
In the following I will post what I believe are valuable contributions as I find them:
Nishihara, Murray and Adams, Parallel MCMC with Generalized Elliptical Slice Sampling, JMLR (2014). The authors propose a novel multi-state method, GESS, and perform a comparison with 6 other single-state and multi-state methods on 7 test functions. They evaluate performance as ESS (Effective Sample Size) per second and per function evaluation.
SamplerCompare is an R package with the goal of benchmarking MCMC algorithms, which is exactly what I was asking about in my original question. Unfortunately, the package contains only a few test functions; the accompanying paper reports no actual benchmarks (just a small example); and it seems there have been no follow-ups.
Thompson, Madeleine B. “Introduction to SamplerCompare.” Journal of Statistical Software 43.12 (2011): 1-10.