I’m interested in using the horseshoe prior (or the related hierarchical-shrinkage family of priors) for the regression coefficients of a traditional multilevel regression (e.g., random slopes/intercepts). Horseshoe priors are similar to the lasso and other regularization techniques, but have been found to perform better in many situations. A regression coefficient $\beta_i$, where $i \in \{1, \dots, D\}$ indexes predictors, has a horseshoe prior if its standard deviation is the product of a local ($\lambda_i$) and a global ($\tau$) scaling parameter.

$$\beta_i \sim \text{Normal}(0, \lambda_i), \qquad \lambda_i \sim \text{Cauchy}^+(0, \tau), \qquad \tau \sim \text{Cauchy}^+(0, 1)$$

I am uncertain as to the best way to extend this to a random-intercept framework. For example, group $j$'s $i$th coefficient is often modeled as normally distributed around a group-level mean ($\gamma_i$) with a group-level standard deviation ($\sigma_i$).

$$\beta_{i,j} \sim \text{Normal}(\gamma_i, \sigma_i), \qquad \gamma_i \sim \text{Normal}(0, \psi), \qquad \sigma_i \sim \text{Cauchy}^+(0, c)$$

This tends to shrink estimates of $\beta_{i,j}$ towards $\gamma_i$ based on the average dispersion around the coefficient mean. However, if only a small number of groups differ substantially from the mean, I’m concerned that the predictive or explanatory ability of the model may suffer. If I wanted to add a horseshoe prior to these coefficients, would it be appropriate to give each group’s coefficient its own independent $\lambda$?

$$\beta_{i,j} \sim \text{Normal}(\gamma_i, \lambda_{i,j}), \qquad \gamma_i \sim \text{Normal}(0, \lambda_{i,0}), \qquad \lambda_{i,j} \sim \text{Cauchy}^+(0, \tau), \qquad \tau \sim \text{Cauchy}^+(0, 1)$$
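A Stan sketch of this first variant, with one local scale per group-and-predictor combination, might look like the following. The group index `group`, design matrix `X`, and Gaussian likelihood are assumptions for illustration:

```stan
// Sketch: per-group horseshoe local scales lambda[i, j] around group-level means gamma[i].
data {
  int<lower=1> N;
  int<lower=1> D;
  int<lower=1> J;                       // number of groups
  array[N] int<lower=1, upper=J> group; // group membership (assumption)
  matrix[N, D] X;
  vector[N] y;
}
parameters {
  matrix[D, J] beta;
  vector[D] gamma;
  matrix<lower=0>[D, J] lambda;  // local scale per predictor and group
  vector<lower=0>[D] lambda0;    // scales for the group-level means
  real<lower=0> tau;
  real<lower=0> sigma;
}
model {
  tau ~ cauchy(0, 1);                  // tau ~ Cauchy+(0, 1)
  lambda0 ~ cauchy(0, tau);            // lambda_{i,0} ~ Cauchy+(0, tau)
  gamma ~ normal(0, lambda0);          // gamma_i ~ Normal(0, lambda_{i,0})
  to_vector(lambda) ~ cauchy(0, tau);  // lambda_{i,j} ~ Cauchy+(0, tau)
  for (j in 1:J)
    beta[, j] ~ normal(gamma, lambda[, j]);
  sigma ~ cauchy(0, 1);
  {
    vector[N] mu;
    for (n in 1:N)
      mu[n] = X[n] * beta[, group[n]]; // each row uses its group's coefficients
    y ~ normal(mu, sigma);
  }
}
```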

Would it be better for the $\lambda_{i,j}$'s to have an extra level of hierarchy that controls for dispersion around $\gamma_i$?

$$\beta_{i,j} \sim \text{Normal}(\gamma_i, \lambda_{i,j}), \qquad \gamma_i \sim \text{Normal}(0, \lambda_{i,0}), \qquad \lambda_{i,j} \sim \text{Cauchy}^+(0, \phi_i),$$
$$\lambda_{i,0} \sim \text{Cauchy}^+(0, \tau), \qquad \phi_i \sim \text{Cauchy}^+(0, \tau), \qquad \tau \sim \text{Cauchy}^+(0, 1)$$
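Only the parameter and model blocks change for this second variant: a per-predictor scale $\phi_i$ now governs how much the local scales $\lambda_{i,j}$ disperse around $\gamma_i$. A hedged Stan sketch of just those blocks (data block and likelihood as in the previous formulation):

```stan
// Sketch: extra hierarchy level -- phi[i] controls the dispersion of
// the per-group local scales lambda[i, j] for predictor i.
parameters {
  matrix[D, J] beta;
  vector[D] gamma;
  matrix<lower=0>[D, J] lambda;
  vector<lower=0>[D] lambda0;  // scales for gamma (lambda_{i,0})
  vector<lower=0>[D] phi;      // per-predictor dispersion of local scales
  real<lower=0> tau;
}
model {
  tau ~ cauchy(0, 1);            // tau ~ Cauchy+(0, 1)
  phi ~ cauchy(0, tau);          // phi_i ~ Cauchy+(0, tau)
  lambda0 ~ cauchy(0, tau);      // lambda_{i,0} ~ Cauchy+(0, tau)
  gamma ~ normal(0, lambda0);    // gamma_i ~ Normal(0, lambda_{i,0})
  for (j in 1:J) {
    lambda[, j] ~ cauchy(0, phi);          // lambda_{i,j} ~ Cauchy+(0, phi_i)
    beta[, j] ~ normal(gamma, lambda[, j]);
  }
}
```

Note that stacking several half-Cauchy levels like this can produce a very heavy-tailed, weakly identified posterior, so sampling diagnostics deserve extra attention.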

I’ve played around with modeling some of these options in Stan, but I would appreciate thoughts or advice on whether or not these formulations make statistical sense.

**Answer**

**Attribution**
*Source: Link, Question Author: C.R. Peterson, Answer Author: Community*