Effective Sample Size for posterior inference from MCMC sampling

When obtaining MCMC samples to make inference on a particular parameter, what are good guides for the minimum number of effective samples that one should aim for?

And, does this advice change as the model becomes more or less complex?

The question you are asking is different from “convergence diagnostics”. Let's say you have run all the convergence diagnostics (choose your favorites) and are now ready to start sampling from the posterior.

There are two options in terms of effective sample size (ESS): you can choose a univariate ESS or a multivariate ESS. A univariate ESS provides an effective sample size for each parameter separately, and the conservative approach is to take the smallest of these estimates. This method ignores all cross-correlations across components. It is probably what most people have been using for a while.

Recently, a multivariate definition of ESS was introduced. The multivariate ESS (mESS) returns a single number for the effective sample size of the quantities you want to estimate, and it does so by accounting for all the cross-correlations in the process. Personally, I far prefer the multivariate ESS. Suppose you are interested in the $p$-vector of means of the posterior distribution. The mESS is defined as

$$\text{mESS} = n \left(\frac{|\Lambda|}{|\Sigma|}\right)^{1/p}.$$

Here $n$ is the number of MCMC samples and

1. $\Lambda$ is the covariance structure of the posterior (also the asymptotic covariance in the CLT if you had independent samples)
2. $\Sigma$ is the asymptotic covariance matrix in the Markov chain CLT (different from $\Lambda$ since the samples are correlated).
3. $p$ is the number of quantities being estimated (in this case, the dimension of the posterior).
4. $|\cdot|$ is the determinant.

mESS can be estimated by using the sample covariance matrix to estimate $\Lambda$ and the batch means covariance matrix to estimate $\Sigma$. This has been coded in the function multiESS in the R package mcmcse.
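To make the estimation step concrete, here is a minimal sketch in plain Python of the estimate $n(|\hat\Lambda|/|\hat\Sigma|)^{1/p}$, with $\hat\Lambda$ the sample covariance and $\hat\Sigma$ a batch means estimate. It is not the mcmcse implementation: the helper names (`covariance`, `log_det`, `multi_ess`, `ar1_chain`), the default batch size $\lfloor\sqrt{n}\rfloor$, and the toy AR(1) chain are all assumptions made for illustration.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of p-vectors."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in row] for row in cov]


def log_det(matrix):
    """Log-determinant of a positive definite matrix by Gaussian elimination."""
    a = [row[:] for row in matrix]
    p = len(a)
    total = 0.0
    for k in range(p):
        if a[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        total += math.log(a[k][k])
        for i in range(k + 1, p):
            f = a[i][k] / a[k][k]
            for j in range(k, p):
                a[i][j] -= f * a[k][j]
    return total


def multi_ess(chain, batch_size=None):
    """Estimate mESS = n * (|Lambda| / |Sigma|)^(1/p) for a list of p-vectors."""
    p = len(chain[0])
    b = batch_size or int(math.sqrt(len(chain)))  # assumed default batch size
    a = len(chain) // b                           # number of batches
    used = chain[:a * b]                          # drop the incomplete last batch
    n = a * b
    batch_means = [
        [sum(row[j] for row in used[k * b:(k + 1) * b]) / b for j in range(p)]
        for k in range(a)
    ]
    lam = covariance(used)  # estimates Lambda
    # Batch means estimate of Sigma: b times the covariance of the batch means.
    sigma = [[b * v for v in row] for row in covariance(batch_means)]
    return n * math.exp((log_det(lam) - log_det(sigma)) / p)


def ar1_chain(n, p, rho, seed=0):
    """Toy correlated chain: p independent AR(1) components with coefficient rho."""
    rng = random.Random(seed)
    x = [0.0] * p
    chain = []
    for _ in range(n):
        x = [rho * v + rng.gauss(0.0, 1.0) for v in x]
        chain.append(x)
    return chain


if __name__ == "__main__":
    chain = ar1_chain(20000, p=2, rho=0.5)
    print(round(multi_ess(chain)))
```

For this toy chain the ratio of $\Lambda$ to $\Sigma$ in each component is $(1-\rho)/(1+\rho) = 1/3$, so the estimate should come out at roughly a third of the chain length. The number of batches must exceed $p$ for the batch means matrix to be invertible.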

This recent paper provides a theoretically valid lower bound on the number of effective samples required. Before simulation, you need to decide

1. $\epsilon$: the precision. $\epsilon$ is the size of the Monte Carlo error you are willing to tolerate, expressed as a fraction of the posterior error. This is similar to the margin of error idea in classical sample size calculations.
2. $\alpha$: the level for constructing confidence intervals.
3. $p$: the number of quantities you are estimating.

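With these three quantities, you will know how many effective samples you require. The paper asks you to stop the simulation the first time

$$\widehat{\text{mESS}} \geq \frac{2^{2/p}\pi}{(p\,\Gamma(p/2))^{2/p}} \frac{\chi^2_{1-\alpha, p}}{\epsilon^2},$$

where $\Gamma(\cdot)$ is the gamma function and $\chi^2_{1-\alpha, p}$ is the $1-\alpha$ quantile of the chi-squared distribution with $p$ degrees of freedom. This lower bound can be calculated by using minESS in the R package mcmcse.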

So suppose you have $p = 20$ parameters in the posterior, you want $95\%$ confidence in your estimate, and you want the Monte Carlo error to be $5\%$ of the posterior error ($\epsilon = .05$). Then you will need

> minESS(p = 20, alpha = .05, eps = .05)
[1] 8716
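As a cross-check, the bound can be evaluated by hand. The quantity being computed is $\frac{2^{2/p}\pi}{(p\,\Gamma(p/2))^{2/p}} \cdot \frac{\chi^2_{1-\alpha,p}}{\epsilon^2}$, where $\chi^2_{1-\alpha,p}$ is the chi-squared quantile. The sketch below uses only the Python standard library; the function names (`reg_lower_gamma`, `chi2_quantile`, `min_ess`) are made up for this example, and the chi-squared quantile is found by simple bisection rather than by whatever method mcmcse uses internally.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0
    k = 0
    while term > 1e-17 * total and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))


def chi2_quantile(q, df):
    """Quantile of the chi-squared distribution, found by bisection on the CDF."""
    lo, hi = 0.0, float(df) + 10.0
    while reg_lower_gamma(df / 2.0, hi / 2.0) < q:  # widen until the bracket holds
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(df / 2.0, mid / 2.0) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def min_ess(p, alpha=0.05, eps=0.05):
    """Lower bound on the effective sample size for p quantities."""
    # log of 2^(2/p) / (p * Gamma(p/2))^(2/p), computed on the log scale
    log_scale = (2.0 / p) * (math.log(2.0) - math.log(p) - math.lgamma(p / 2.0))
    return math.pi * math.exp(log_scale) * chi2_quantile(1.0 - alpha, p) / eps ** 2


print(round(min_ess(p=20, alpha=0.05, eps=0.05)))
```

Rounded to the nearest integer, this should agree with the 8716 reported by minESS. Note that the chain itself never enters the calculation: the bound depends only on $p$, $\alpha$, and $\epsilon$.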


This bound holds for any problem (under regularity conditions). The method adapts from problem to problem because a slowly mixing Markov chain takes longer to reach the lower bound, since its mESS is smaller. So you can check a few times with multiESS whether your Markov chain has reached the bound; if not, go and grab more samples.
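The check-and-extend loop can be sketched as follows. To keep the example self-contained it is reduced to a single parameter ($p = 1$), where the ESS is $n\lambda^2/\sigma^2$ with $\sigma^2$ estimated by batch means, and the target of 6146 is the lower bound for $p = 1$, $\alpha = .05$, $\epsilon = .05$. The names `batch_means_ess`, `sample_until`, and `make_ar1_extender`, the step size, and the AR(1) stand-in for a real sampler are all assumptions for illustration; in practice you would plug in your own sampler and a multivariate estimate such as the one multiESS provides.

```python
import math
import random


def batch_means_ess(x):
    """Univariate ESS = n * lambda^2 / sigma^2 with a batch means sigma^2."""
    b = int(math.sqrt(len(x)))  # batch size (assumed choice)
    a = len(x) // b             # number of batches
    x = x[:a * b]               # drop the incomplete last batch
    n = a * b
    mean = sum(x) / n
    lam2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    means = [sum(x[k * b:(k + 1) * b]) / b for k in range(a)]
    sig2 = b * sum((m - mean) ** 2 for m in means) / (a - 1)
    return n * lam2 / sig2


def sample_until(extend, estimate_ess, target, step=5000, max_draws=10**6):
    """Grow the chain in steps until the estimated ESS reaches the target."""
    chain = []
    ess = 0.0
    while len(chain) < max_draws:
        extend(chain, step)        # grab more samples
        ess = estimate_ess(chain)  # re-check the effective sample size
        if ess >= target:
            break
    return chain, ess


def make_ar1_extender(rho, seed=0):
    """Hypothetical sampler: appends AR(1) draws to the chain in place."""
    rng = random.Random(seed)

    def extend(chain, k):
        x = chain[-1] if chain else 0.0
        for _ in range(k):
            x = rho * x + rng.gauss(0.0, 1.0)
            chain.append(x)

    return extend


if __name__ == "__main__":
    target = 6146  # lower bound for p = 1, alpha = 0.05, eps = 0.05
    chain, ess = sample_until(make_ar1_extender(0.5), batch_means_ess, target)
    print(len(chain), round(ess))
```

A chain with stronger autocorrelation (larger `rho` here) needs more draws before the loop stops, which is exactly how the same fixed bound translates into different run lengths for different problems.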