Have there been large-scale studies of MCMC methods that compare the performance of several different algorithms on a suite of test densities? I am thinking of something equivalent to Rios and Sahinidis' paper (2013), a thorough comparison of a large number of derivative-free black-box optimizers on several classes of test functions.
For MCMC, performance can be measured as, e.g., effective sample size (ESS) per density evaluation, or some other appropriate metric.
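To make the metric concrete, here is a minimal pure-Python sketch of how ESS is commonly estimated, via the truncated autocorrelation sum ESS = N / (1 + 2 Σ ρ_k); dividing this by the number of density evaluations gives the per-evaluation figure. The two example chains and the truncation-at-first-non-positive-lag rule are illustrative assumptions, not from the question above:

```python
import random

def autocorr(x, k):
    """Lag-k sample autocorrelation of a 1-D chain."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    cov = sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / n
    return cov / var

def ess(x):
    """Effective sample size: N / (1 + 2 * sum of autocorrelations),
    truncating the sum at the first non-positive lag."""
    n = len(x)
    s = 0.0
    for k in range(1, n // 2):
        rho = autocorr(x, k)
        if rho <= 0:  # truncate: later lags are mostly noise
            break
        s += rho
    return n / (1 + 2 * s)

random.seed(1)
# i.i.d. draws: ESS should be close to the chain length
iid = [random.gauss(0, 1) for _ in range(2000)]
# strongly autocorrelated AR(1) chain: ESS should be far smaller
ar = [0.0]
for _ in range(1999):
    ar.append(0.9 * ar[-1] + random.gauss(0, 1))

print(ess(iid))
print(ess(ar))
```

For an MCMC run, one would replace these toy chains with the sampler's output and divide `ess(chain)` by the number of target-density evaluations the sampler used.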
A few comments:
I appreciate that performance will depend strongly on the details of the target pdf, but a similar (if not identical) argument holds for optimization, and nonetheless there is a plethora of benchmark functions, suites, competitions, papers, etc. that deal with benchmarking optimization algorithms.
Also, it is true that MCMC differs from optimization in that comparatively much more care and tuning is needed from the user. Nonetheless, there are now several MCMC methods that require little or no tuning: methods that adapt during the burn-in phase or during sampling, and multi-state (also called ensemble) methods (such as emcee) that evolve multiple interacting chains and use information from the other chains to guide the sampling.
I am particularly interested in the comparison between standard and multi-state (aka ensemble) methods. For the definition of multi-state methods, see Section 30.6 of MacKay's book:
In a multi-state method, multiple parameter vectors x are maintained;
they evolve individually under moves such as Metropolis and Gibbs;
there are also interactions among the vectors.
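To make the third ingredient, the interaction among vectors, concrete, here is a minimal one-dimensional sketch of one such interaction rule: the "stretch move" of Goodman and Weare, which underlies emcee. The Gaussian target `log_gauss`, the number of walkers, and the scale parameter `a=2.0` are illustrative assumptions, not part of MacKay's definition:

```python
import math
import random

def log_gauss(x):
    """Hypothetical target: log-density of a standard 1-D Gaussian (up to a constant)."""
    return -0.5 * x * x

def stretch_move_sweep(walkers, log_prob, a=2.0):
    """One sweep of the Goodman & Weare stretch move: each walker proposes
    a point on the line through itself and another, randomly chosen, walker."""
    new = list(walkers)
    n = len(new)
    for i in range(n):
        j = random.choice([k for k in range(n) if k != i])
        # Draw z with density g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * random.random() + 1.0) ** 2 / a
        proposal = new[j] + z * (new[i] - new[j])
        # The acceptance ratio includes a z^(d-1) Jacobian factor; d = 1 here, so it is 1.
        log_accept = log_prob(proposal) - log_prob(new[i])
        if math.log(random.random()) < log_accept:
            new[i] = proposal
    return new

random.seed(0)
walkers = [random.uniform(-1.0, 1.0) for _ in range(10)]
for _ in range(500):
    walkers = stretch_move_sweep(walkers, log_gauss)
# After many sweeps the walkers are approximately draws from the target.
```

Because the proposal is built from the current spread of the ensemble, the move is affine invariant: the interacting walkers automatically adapt to the scale of the target, which is one reason such methods need little manual tuning.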
 This question originated from here.
Update
For an interesting take on multi-state (aka ensemble) methods, see this blog post by Bob Carpenter on Gelman's blog, and my comment referring to this CV post.
Answer
After some online searching, I have come away with the impression that a comprehensive benchmark of established MCMC methods, analogous to what one can find in the optimization literature, does not exist. (I'd be happy to be proven wrong here.)
It is easy to find comparisons of a few MCMC methods on specific problems within an applied domain. This would be fine if we could pool that information; however, the quality of such benchmarks is often insufficient (e.g., due to deficiencies in the reported metrics or poor design choices).
In the following I will post what I believe are valuable contributions as I find them:

Nishihara, Murray and Adams, Parallel MCMC with Generalized Elliptical Slice Sampling, JMLR (2014). The authors propose a novel multi-state method, generalized elliptical slice sampling (GESS), and compare it with six other single-state and multi-state methods on seven test functions. They evaluate performance as effective sample size (ESS) per second and per function evaluation.

SamplerCompare is an R package with the goal of benchmarking MCMC algorithms, which is exactly what I was asking about in my original question. Unfortunately, the package contains only a few test functions; the accompanying paper reports no actual benchmarks (just a small example); and there seem to have been no follow-ups.
Thompson, Madeleine B. “Introduction to SamplerCompare.” Journal of Statistical Software 43.12 (2011): 110 (link).
Attribution
Source : Link , Question Author : lacerbi , Answer Author : lacerbi