# Ridge regression – Bayesian interpretation

I have heard that ridge regression can be derived as the mean of a posterior distribution, if the prior is adequately chosen. Is the intuition that the constraints set on the regression coefficients by the prior (e.g. standard normal distributions centered at 0) are identical to, or replace, the penalty on the squared size of the coefficients? Does the prior have to be Gaussian for this equivalence to hold?

No: other priors correspond to other penalties. In general you do want more prior mass near zero effect ($\beta=0$) to reduce overfitting/over-interpretation. Ridge corresponds to a quadratic (L2) penalty, i.e. a Gaussian prior; the lasso corresponds to an $|\beta|$ (L1) penalty, i.e. a Laplace (double-exponential) prior. Many other penalties (priors) are available. The Bayesian approach has the advantage of yielding a solid interpretation (and solid credible intervals), whereas penalized maximum likelihood estimation (ridge, lasso, etc.) yields $P$-values and confidence intervals that are hard to interpret, because the frequentist approach is somewhat confused by biased (shrunk towards zero) estimators.
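To make the ridge/Gaussian-prior correspondence concrete: with likelihood $y \mid \beta \sim N(X\beta, \sigma^2 I)$ and prior $\beta \sim N(0, \tau^2 I)$, the negative log-posterior is proportional to $\|y - X\beta\|^2 + \lambda\|\beta\|^2$ with $\lambda = \sigma^2/\tau^2$, so the posterior mean (which equals the mode here, everything being Gaussian) is exactly the ridge estimator. A minimal numerical sketch on simulated toy data (the variances `sigma2`, `tau2` and the data are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma2, tau2 = 1.0, 0.5  # assumed noise and prior variances
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Penalty implied by the prior beta ~ N(0, tau2 * I)
lam = sigma2 / tau2

# Ridge: argmin ||y - X b||^2 + lam * ||b||^2, closed form (X'X + lam I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Bayesian posterior mean under the Gaussian likelihood and Gaussian prior:
# covariance (X'X / sigma2 + I / tau2)^{-1}, mean = covariance @ X'y / sigma2
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
beta_post_mean = post_cov @ (X.T @ y / sigma2)

print(np.allclose(beta_ridge, beta_post_mean))  # the two estimates coincide
```

Replacing the Gaussian prior with a Laplace prior changes the $\|\beta\|^2$ term into $\sum_j |\beta_j|$, giving the lasso penalty instead; there is no closed form in that case, which is why the check above is specific to ridge.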