In “The Elements of Statistical Learning” (2nd ed), p63, the authors give the following two formulations of the ridge regression problem:

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$

and

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2, \quad \text{subject to } \sum_{j=1}^{p} \beta_j^2 \le t.$$

It is claimed that the two are equivalent, and that there is a one-to-one correspondence between the parameters λ and t.

It would appear that the first formulation is a Lagrangian relaxation of the second. However, I have never had an intuitive understanding of how or why Lagrangian relaxations work.

Is there a simple way to demonstrate that the two formulations are indeed equivalent? If I have to choose, I’d prefer intuition over rigour.

Thanks.

**Answer**

The correspondence can most easily be shown using the Envelope Theorem.

First, the standard Lagrangian of the constrained problem will have an additional −λ⋅t term. This term does not involve β, so it does not affect the minimization over β if we are just treating λ as given, which is why Hastie et al. drop it.
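To make this concrete, here is a sketch of the Lagrangian, writing the residual sum of squares as $\text{RSS}(\beta)$:

```latex
\mathcal{L}(\beta, \lambda)
  = \text{RSS}(\beta) + \lambda \Big( \sum_{j=1}^{p} \beta_j^2 - t \Big)
  = \underbrace{\text{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2}_{\text{penalized objective}} \; - \; \lambda t.
```

Since $\lambda t$ is a constant in $\beta$, minimizing $\mathcal{L}$ over $\beta$ for a fixed $\lambda$ yields exactly the penalized formulation's minimizer.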

Now, if you differentiate the full Lagrangian with respect to t, the Envelope Theorem says you can ignore the indirect effects of t through β, because you're at an optimum. What you'll be left with is the Lagrange multiplier λ coming from the −λ⋅t term.

But what does this mean intuitively? Since the constraint binds at the optimum, the derivative of the Lagrangian, evaluated at the optimum, is the same as the derivative of the original objective. Therefore the Lagrange multiplier λ gives the shadow price — the value in terms of the objective — of relaxing the constraint by increasing t.
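The correspondence can also be checked numerically. A minimal sketch (the data, the no-intercept setup, and the value of `lam` are all invented for illustration): solve the penalized problem in closed form, set t = ‖β̂‖², and verify that β̂ satisfies the stationarity condition of the constrained problem with multiplier λ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

lam = 5.0  # any fixed penalty strength

# Penalized form: closed-form ridge solution (no intercept, for simplicity)
beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# The corresponding constraint level for the second formulation
t = np.sum(beta**2)

# KKT stationarity for the constrained form with multiplier lam:
# the gradient of RSS at beta must equal -2*lam*beta,
# i.e. grad RSS + lam * grad(constraint) = 0
grad_rss = -2 * X.T @ (y - X @ beta)
print(np.allclose(grad_rss, -2 * lam * beta))  # True
```

So the β̂ produced by a given λ is exactly the constrained solution for t = ‖β̂‖², which is the one-to-one correspondence between λ and t (for λ > 0, where the constraint binds).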

I assume this is the correspondence Hastie et al. are referring to.

**Attribution**
*Source: Link, Question Author: NPE, Answer Author: Tristan*