Variance-covariance matrix for ridge regression with stochastic \lambda

In ridge regression with design matrix X, outcomes y, fixed regularization parameter \lambda, and errors \epsilon\sim\mathcal{N}(0, \sigma^2 I), the ridge regression coefficients \hat\beta (i.e., the solution to \arg\min_b \big[(y-Xb)'(y-Xb) + \lambda b'b\big]) and their variance-covariance matrix var(\hat\beta) are computed as:


\begin{align*}
M &:= (X'X + \lambda I)^{-1}X' \\
\hat\beta &= My \\
var(\hat\beta) &= \sigma^2 MM'
\end{align*}

The computation of var(\hat\beta) relies on \lambda being treated as a constant, which makes M constant as well. We can therefore apply the identity var(My) = M\,var(y)\,M' together with the fact that var(y) = \sigma^2 I to derive var(\hat\beta).
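
For concreteness, here is a minimal NumPy sketch of these formulas; the data-generating setup below is invented purely for illustration and is not part of the original question.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data for illustration only
n, p = 100, 5
sigma2 = 1.0
lam = 0.5                      # fixed regularization parameter lambda
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(n)

# M := (X'X + lambda*I)^{-1} X'
M = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

beta_hat = M @ y               # ridge estimate: beta_hat = M y
var_beta = sigma2 * M @ M.T    # var(beta_hat) = sigma^2 M M'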

However, when \lambda is selected using X and y (in my application, via cross-validation), \lambda and M become stochastic. How should var(\hat\beta) be modified to account for a stochastic \lambda selected via cross-validation?

Answer

I’m guessing that these equations are maximum likelihood solutions. The MLE of a parameter takes as its variance-covariance matrix the inverse of the negative second derivative (the Hessian) of the log-likelihood at the estimate; that is, var(\hat\beta) \approx \big[-\partial^2 \ell(\text{data}, \hat\beta)/\partial \beta^2\big]^{-1}. If you include the \lambda parameter in the likelihood, this partial derivative does not change; you merely introduce a covariance between \beta and \lambda. The fact that you’re optimizing \lambda via cross-validation rather than maximum likelihood doesn’t change the story for \beta.
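
As a rough numerical illustration of this argument (reusing X, lam, sigma2, and p from the sketch above, and assuming the Gaussian model with the ridge penalty folded into the log-likelihood), the curvature-based covariance can be computed directly:

# Continues the sketch above: X, lam, sigma2, and p as defined there.
# Negative Hessian of the penalized Gaussian log-likelihood
#   l(beta) = -[(y - X beta)'(y - X beta) + lambda beta'beta] / (2 sigma^2) + const
# with respect to beta, treating lambda as fixed:
#   -d^2 l / d beta^2 = (X'X + lambda*I) / sigma^2
hessian = (X.T @ X + lam * np.eye(p)) / sigma2

# Inverse-Hessian (curvature-based) covariance for beta_hat
var_beta_curvature = np.linalg.inv(hessian)   # = sigma^2 (X'X + lambda*I)^{-1}

With \lambda = 0 this reduces to the familiar OLS covariance \sigma^2(X'X)^{-1}.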

Attribution
Source: Link, Question Author: josliber, Answer Author: Nick Stauner
