# Bayesian lasso vs spike and slab

Question: What are the advantages/disadvantages of using one prior over the other for variable selection?

Suppose I have the likelihood:
$$y\sim\mathcal{N}(Xw,\sigma^2 I)$$
where I can put either one of the priors:
$$w_i\sim \pi\delta_0+(1-\pi)\mathcal{N}(0,100)\\ \pi=0.9\,,$$
or:
$$w_i\sim \exp(-\lambda|w_i|)\\ \lambda \sim \Gamma(1,1)\,.$$

I put $\pi=0.9$ to emphasize that most of the weights are zero, and a gamma prior on $\lambda$ to pick the ‘regularizing’ parameter.
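To make the contrast concrete, here is a small sketch (assuming $\exp(-\lambda|w_i|)$ is meant up to normalization, i.e. a Laplace$(0, 1/\lambda)$ distribution) that draws weights from both priors. The spike-and-slab draws are exactly zero about 90% of the time, while the Laplace draws are often small but essentially never exactly zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Spike-and-slab: with probability pi = 0.9 the weight is exactly 0,
# otherwise it is drawn from the slab N(0, 100).
pi = 0.9
spike = rng.random(n) < pi
w_ss = np.where(spike, 0.0, rng.normal(0.0, np.sqrt(100.0), n))

# Lasso-style prior: lambda ~ Gamma(1, 1), then w | lambda ~ Laplace(0, 1/lambda)
# (interpreting exp(-lambda |w|) as the unnormalized Laplace density).
lam = rng.gamma(shape=1.0, scale=1.0, size=n)
w_lasso = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

print("fraction exactly zero, spike-and-slab:", np.mean(w_ss == 0.0))    # ~0.9
print("fraction exactly zero, Laplace prior: ", np.mean(w_lasso == 0.0))
```

The point of the sketch is the qualitative difference: the spike-and-slab prior puts genuine probability mass on exact zeros, whereas the continuous Laplace prior only concentrates mass near zero.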

However, my professor keeps insisting that the lasso version ‘shrinks’ the coefficients rather than doing proper variable selection, i.e. even the relevant parameters are over-shrunk.
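His complaint can be illustrated with the orthonormal-design special case, where the lasso MAP estimate reduces to soft-thresholding the least-squares estimate: even large, clearly relevant coefficients are pulled toward zero by the full penalty. A hard threshold (closer in spirit to what spike-and-slab selection does) leaves the survivors unshrunk. The numbers below are made up for illustration:

```python
import numpy as np

def soft_threshold(z, lam):
    # Lasso MAP under an orthonormal design: shrink everything by lam,
    # zeroing out whatever falls below the threshold.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard_threshold(z, lam):
    # Keep-or-kill rule: survivors are left at full size.
    return np.where(np.abs(z) > lam, z, 0.0)

z = np.array([0.3, 5.0, 10.0])   # hypothetical least-squares estimates
lam = 1.0

print(soft_threshold(z, lam))    # [0. 4. 9.]  -- the big coefficients lose lam too
print(hard_threshold(z, lam))    # [0. 5. 10.] -- the big coefficients are untouched
```

This is exactly the ‘over-shrinkage of even the relevant parameters’ he is talking about: the soft threshold subtracts $\lambda$ from the large coefficients as well as killing the small ones.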

I personally find the lasso version easier to implement since I use variational Bayes. In fact, the Sparse Bayesian Learning paper, which effectively puts a prior proportional to $\frac{1}{|w_i|}$ on the weights, gives even sparser solutions.
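Part of why the Laplace prior is convenient for variational inference is its standard representation as a Gaussian scale mixture: a normal with variance $\tau$, where $\tau$ has an exponential distribution with rate $\lambda^2/2$, integrates to the Laplace density $\frac{\lambda}{2}e^{-\lambda|w|}$. A quick numerical check of that identity, using made-up values of $\lambda$ and $w$:

```python
import numpy as np

lam, w = 1.5, 0.7

# Integrate N(w | 0, tau) against an Exponential(rate = lam^2 / 2) mixing
# density over tau, via a fine Riemann sum.
taus = np.linspace(1e-6, 50.0, 2_000_000)
dx = taus[1] - taus[0]
normal = np.exp(-w**2 / (2.0 * taus)) / np.sqrt(2.0 * np.pi * taus)
exp_mix = (lam**2 / 2.0) * np.exp(-(lam**2 / 2.0) * taus)
mixture = np.sum(normal * exp_mix) * dx

# Laplace density at the same point.
laplace = (lam / 2.0) * np.exp(-lam * abs(w))

print(mixture, laplace)  # the two values agree (both ~ 0.2625)
```

It is this conditionally-Gaussian structure that makes the variational (and EM) updates tractable: given $\tau$, everything is a Gaussian linear model.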

## Reflection

Since I’ve left academia, I’ve had a chance to get some more practical experience in this area. While it is true that spike-and-slab methods place non-zero prior mass on exact zeros, and hence give a genuine posterior probability that a weight is zero, the lasso-based methods are extremely fast, and it is usually enough to look at the mean of the weight distribution. When you are dealing with potentially millions of parameters, this matters.
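In practice, that ‘look at the mean’ step can be as crude as thresholding the posterior means against their posterior uncertainty. Everything below (the means, the standard deviations, the $3\sigma$ rule) is a hypothetical sketch, not output from any real model:

```python
import numpy as np

# Hypothetical variational posterior over 1000 weights: 997 near-zero
# means plus three clearly non-zero ones, all with posterior std 0.05.
rng = np.random.default_rng(1)
means = np.concatenate([rng.normal(0.0, 0.01, 997), [2.0, -3.5, 0.8]])
stds = np.full_like(means, 0.05)

# Cheap selection rule: keep weights whose posterior mean is well away
# from zero relative to its posterior uncertainty.
selected = np.flatnonzero(np.abs(means) > 3 * stds)
print("selected:", selected)  # picks out the three planted signals
```

This is a vectorized pass over the posterior summaries, so it scales trivially to millions of parameters, which is exactly the regime where fitting a spike-and-slab model becomes painful.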

I’ve also gone on to recognise that my professor is an idiot who can’t get past the 90s.