Why is “relaxed lasso” different from standard lasso?

If we start with a data set (X,Y), apply Lasso to it and obtain a solution βL, we can apply Lasso again to the data set (XS,Y), where S is the set of non-zero indices of βL, to obtain a solution βRL, called the ‘relaxed LASSO’ solution (correct me if I’m wrong!). The solution βL must satisfy the Karush–Kuhn–Tucker (KKT) conditions for (X,Y), but, given the form of the KKT conditions for (XS,Y), doesn’t it also satisfy them? If so, what is the point of doing LASSO a second time?

This question is a follow up to: Advantages of doing “double lasso” or performing lasso twice?


From definition 1 of Meinshausen (2007), there are two parameters controlling the solution of the relaxed Lasso.

The first one, λ, controls the variable selection, whereas the second, ϕ, controls the shrinkage level. When ϕ=1, the Lasso and the relaxed Lasso coincide (as you said!), but for ϕ<1 you obtain a solution whose coefficients are closer to the orthogonal (least-squares) projection onto the selected variables — a kind of soft de-biasing.
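Written out, the estimator from definition 1 of Meinshausen (2007) is, up to notational details (this is my paraphrase of the paper's definition, so check the original):

$$\hat{\beta}^{\lambda,\phi} = \arg\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n}\Big(Y_i - X_i^\top \{\beta \cdot \mathbf{1}_{S_\lambda}\}\Big)^2 + \phi\lambda\,\|\beta\|_1,$$

where $S_\lambda$ is the support selected by the ordinary Lasso with penalty $\lambda$, the term $\{\beta \cdot \mathbf{1}_{S_\lambda}\}$ zeroes out every coefficient outside $S_\lambda$, and $\phi \in (0,1]$. Taking $\phi = 1$ recovers the ordinary Lasso objective restricted to its own support, which is why the two solutions coincide in that case.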

This formulation actually corresponds to solving two problems:

  1. First, the full Lasso on X with penalization parameter λ.
  2. Second, the Lasso on XS, which is X restricted to the variables selected in step 1, with penalization parameter λϕ.
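The two steps above can be sketched with scikit-learn's `Lasso` (a minimal illustration on simulated data; the values of `lam` and `phi` are arbitrary choices, not from the source):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated data for illustration: 3 true signals among 20 predictors.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam, phi = 0.1, 0.5  # assumed values; phi < 1 relaxes the shrinkage

# Step 1: full Lasso with penalty lambda -> gives the selected support S.
lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
S = np.flatnonzero(lasso.coef_)

# Step 2: Lasso on X restricted to S, with the smaller penalty lambda*phi.
relaxed = Lasso(alpha=lam * phi, fit_intercept=False).fit(X[:, S], y)

# Embed the relaxed coefficients back into a length-p vector.
beta_rl = np.zeros(p)
beta_rl[S] = relaxed.coef_
print("selected:", S)
print("relaxed coefficients on S:", np.round(beta_rl[S], 2))
```

Because the step-2 penalty λϕ is smaller than λ, the coefficients on the selected support are shrunk less, moving them toward the least-squares fit on XS (and exactly reaching it as ϕ→0).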

Source: Link, Question Author: Coca, Answer Author: amoeba
