In his widely cited paper

Prior distributions for variance parameters in hierarchical models (916 citations so far on Google Scholar), Gelman proposes that good non-informative prior distributions for the variance in a hierarchical Bayesian model are the uniform distribution and the half-t distribution. If I understand things right, this works well when the location parameter (e.g. the mean) is of main interest. Sometimes the variance parameter is of main interest, however. For example, when analyzing human response data from timing tasks, mean timing variability is often the measure of interest. In those cases it is not clear to me how variability could be modeled hierarchically with, for example, uniform distributions, since after the analysis I want to get the credibility of the mean variance both on the participant level and on the group level. In order to do that I need to use prior distributions that can be parametrized with the mean variance or similar, right?

My question is then:

What distributions are recommended when building a hierarchical Bayesian model when the variance of the data is of main interest?

I know that the gamma distribution can be reparametrized to be specified by mean and SD. For example, the hierarchical model below is from Kruschke’s book Doing Bayesian Data Analysis. But Gelman outlines some problems with the gamma distribution in his article, and I would be grateful for suggestions of alternatives, preferably alternatives that are not too difficult to get working in BUGS/JAGS.
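The mean/SD reparametrization of the gamma mentioned above can be sketched as follows. This shows only the conversion, not Kruschke's model. The function name `gamma_shape_rate` and the example values (mean 2.0, SD 0.5) are made up for illustration. The identities used are shape = mean² / SD² and rate = mean / SD².

```python
import random
import statistics


def gamma_shape_rate(mean, sd):
    """Convert a (mean, sd) pair to the shape and rate of a gamma distribution.

    Hypothetical helper for illustration; uses shape = mean^2 / sd^2 and
    rate = mean / sd^2.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    shape = mean ** 2 / sd ** 2
    rate = mean / sd ** 2
    return shape, rate


# Illustrative values only: a gamma with mean 2.0 and SD 0.5.
shape, rate = gamma_shape_rate(mean=2.0, sd=0.5)  # shape 16, rate 8

# Sanity check by simulation. random.gammavariate takes shape and *scale*,
# so the rate has to be inverted.
rng = random.Random(1)
draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(100_000)]
print(statistics.mean(draws), statistics.stdev(draws))
```

In BUGS/JAGS, `dgamma` takes shape and rate in that order, so the two values returned here would be passed directly.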

**Answer**

I disagree with the way you interpret Gelman concerning the choice of the gamma for scale parameters. The basis of hierarchical modeling is to relate individual parameters to a common one through a structure with unknown parameters (typically a mean and a variance). In this sense, using a gamma distribution for the individual variance (or a lognormal for a heavier tail), conditional on the mean variance and its dispersion, looks valid to me (at least with regard to Gelman's arguments).
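As a rough sketch of the structure described above, one possible way to write it down is the following. The symbols are assumptions introduced here: $y_{ij}$ is observation $j$ of participant $i$, $\sigma_i^2$ is that participant's variance, and $m$ and $s$ are the group-level mean and SD of the individual variances. The gamma is written in its shape/rate form, and the hyperpriors $p(m)$ and $p(s)$ are deliberately left unspecified.

$$
\begin{aligned}
y_{ij} \mid \mu_i, \sigma_i^2 &\sim \mathrm{Normal}(\mu_i, \sigma_i^2) \\
\sigma_i^2 \mid m, s &\sim \mathrm{Gamma}\!\left(\frac{m^2}{s^2}, \frac{m}{s^2}\right) \\
m &\sim p(m), \qquad s \sim p(s)
\end{aligned}
$$

Here the gamma is calibrated by $m$ and $s$, which are themselves estimated from the data, rather than fixed at extreme values to mimic a non-informative prior.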

Gelman's criticism of the gamma for scale parameters concerns its use as an approximation to the Jeffreys prior, obtained by setting its parameters to extreme values. The problem is that, depending on how extreme these values are (which is quite arbitrary), the posterior may be very different. This observation invalidates the use of this prior, at least when we don’t have information to put into the prior. In his discussion, it looks to me that the gamma or inverse-gamma is never calibrated in terms of a mean and variance coming from prior information or from a hierarchical structure. So his recommendation concerns a context that is quite different from yours, which, if I understand your purpose correctly, consists of using a hierarchical prior that relates the individual variances through a structure whose mean and variance parameters are also estimated.

**Attribution**

*Source: Link, Question Author: Rasmus Bååth, Answer Author: peuhp*