Reading the paper “Forecasting at Scale” (FBProphet forecasting tool, see https://peerj.com/preprints/3190.pdf) I came across the term “sparse prior”.
The authors explain that they were using such a “sparse prior” in modelling a vector of rate deviations δ from some scalar rate k, which is a model parameter in the logistic growth model.
As they state that δj∼Laplace(0,τ), do I understand correctly that “sparse” refers to the vector having elements close to zero if the parameter τ is small? I am confused because I thought that all vector elements needed to be parameters of the regression, but defining them like that leaves only k and τ as free model parameters, doesn’t it?
Also, is the use of the Laplace distribution to generate the prior common? I do not understand why it is preferred over e.g. a normal distribution.
Sparse data is data with many zeros. Here the authors call the prior sparse because it favors values at or near zero. This is clear from the shape of the Laplace (aka double exponential) distribution, which is sharply peaked around zero.
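To see this concretely, here is a small sketch (using `scipy.stats`, not anything from the Prophet codebase) comparing a Laplace density and a normal density with matched variance: the Laplace is more sharply peaked at zero and has heavier tails.

```python
import numpy as np
from scipy.stats import laplace, norm

# Laplace(0, b) has variance 2*b^2, so b = 1/sqrt(2) matches
# the unit variance of a standard normal.
b = 1 / np.sqrt(2)
lap = laplace(loc=0, scale=b)
gauss = norm(loc=0, scale=1)

# The Laplace density is more sharply peaked at zero...
assert lap.pdf(0) > gauss.pdf(0)   # ~0.707 vs ~0.399

# ...and, with equal variance, has heavier tails than the normal.
assert lap.pdf(4) > gauss.pdf(4)
```

The peak at zero is what pushes posterior mass for each δj toward zero, which is exactly the “sparse” behavior the authors want for the rate deviations.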
(Image source: Tibshirani, 1996)
This holds for any value of τ (the distribution is always peaked at its location parameter, here zero), but the smaller τ is, the stronger the regularizing effect.
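A toy illustration of how τ controls the shrinkage (this is not the Prophet model itself, just the standard one-parameter case): under a Gaussian likelihood with known variance, the MAP estimate with a Laplace(0, τ) prior is soft-thresholding, so small observations are set exactly to zero, and the threshold grows as τ shrinks.

```python
import numpy as np

def laplace_map(y, sigma2, tau):
    """MAP estimate of a mean under a N(mu, sigma2) likelihood and a
    Laplace(0, tau) prior: soft-thresholding at sigma2 / tau."""
    thresh = sigma2 / tau
    return np.sign(y) * np.maximum(np.abs(y) - thresh, 0.0)

y = np.array([-2.0, -0.3, 0.1, 0.5, 3.0])

# A small tau (strong prior) zeroes out the small observations...
small_tau = laplace_map(y, sigma2=1.0, tau=1.0)   # -> [-1., 0., 0., 0., 2.]

# ...while a large tau (weak prior) barely shrinks them at all.
large_tau = laplace_map(y, sigma2=1.0, tau=10.0)
```

A Gaussian prior in the same setting only shrinks estimates proportionally toward zero and never produces exact zeros, which is one way to see why the Laplace prior is associated with sparsity (the same contrast as lasso vs ridge regression).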
For this reason the Laplace prior is often used as a robust, regularizing prior. That said, while the Laplace prior is a popular choice, if you want truly sparse solutions there may be better ones, as described by Van Erp et al. (2019).
Van Erp, S., Oberski, D. L., & Mulder, J. (2019). Shrinkage Priors for Bayesian Penalized Regression. Journal of Mathematical Psychology, 89, 31-50. doi:10.1016/j.jmp.2018.12.004