In ridge regression with design matrix X, outcomes y, fixed regularization parameter \lambda, and errors \epsilon\sim\mathcal{N}(0, \sigma^2I), the ridge regression coefficients \hat\beta (i.e., the solution to \arg\min_b \big[(y-Xb)'(y-Xb) + \lambda b'b\big]) and their variance-covariance matrix var(\hat\beta) are computed as:

\begin{align*}
M &:= (X'X + \lambda I)^{-1}X' \\
\hat\beta &= My \\
var(\hat\beta) &= \sigma^2MM'
\end{align*}
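As a concrete check of these formulas, here is a minimal numpy sketch (the toy data, dimensions, and the value of \lambda are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix and outcomes (illustrative assumptions)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
y = X @ beta_true + rng.normal(scale=sigma, size=n)

lam = 0.8  # fixed regularization parameter

# M := (X'X + lam*I)^{-1} X'
M = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

beta_hat = M @ y                    # ridge coefficients
var_beta_hat = sigma**2 * M @ M.T   # var(beta_hat) = sigma^2 M M'
```

Using `np.linalg.solve` rather than forming the explicit inverse is the standard numerically stabler choice; the result is the same matrix M as in the display above.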

The computation of var(\hat\beta) relies on \lambda being treated as a constant, which makes M a constant matrix. We can then apply the identity var(My) = M\,var(y)\,M' together with the fact that var(y) = \sigma^2I to derive var(\hat\beta).
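Substituting the definition of M, the same variance can be written in the familiar sandwich form:

\begin{align*}
var(\hat\beta) &= M\,var(y)\,M' \\
&= \sigma^2 (X'X + \lambda I)^{-1} X'X\, (X'X + \lambda I)^{-1}
\end{align*}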

However, when \lambda is selected using X and y (in my application, via cross-validation), \lambda and hence M become stochastic. How should var(\hat\beta) be updated to account for a stochastic \lambda selected via cross-validation?
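To make the dependence of \lambda on the data concrete, here is a minimal k-fold cross-validation sketch (the candidate \lambda grid, fold count, and helper name are assumptions, not part of the question): the selected \hat\lambda, and therefore M(\hat\lambda), is a function of (X, y).

```python
import numpy as np

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Pick the lambda minimizing k-fold CV mean squared error (illustrative sketch)."""
    n, p = X.shape
    # Assign each observation to one of k roughly equal folds
    folds = np.random.default_rng(seed).permutation(n) % k
    cv_mse = []
    for lam in lambdas:
        errs = []
        for f in range(k):
            tr, te = folds != f, folds == f
            # Ridge fit on the training folds: M = (X'X + lam*I)^{-1} X'
            M = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(p), X[tr].T)
            beta = M @ y[tr]
            errs.append(np.mean((y[te] - X[te] @ beta) ** 2))
        cv_mse.append(np.mean(errs))
    return lambdas[int(np.argmin(cv_mse))]
```

Because \hat\lambda depends on y, the matrix M evaluated at \hat\lambda is stochastic, which is exactly why the fixed-\lambda variance formula above no longer applies directly.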

**Answer**

I'm guessing that these equations are maximum likelihood solutions. The MLE of a parameter takes as its variance-covariance matrix the inverse of the negative second derivative of the log-likelihood function. What this means is that var(\hat\beta) \approx \big[-\partial^2 L(\text{data}, \hat\beta)/\partial\beta^2\big]^{-1}. If you include \lambda as a parameter, this partial derivative with respect to \beta does not change; you merely introduce a covariance between \hat\beta and \hat\lambda. The fact that you're optimizing \lambda via cross-validation rather than MLE doesn't change the story for \beta.

**Attribution**
*Source: Link, Question Author: josliber, Answer Author: Nick Stauner*