One of the motivations for the elastic net was the following limitation of the lasso:

> In the p > n case, the lasso selects at most n variables before it saturates, because of the nature of the convex optimization problem. This seems to be a limiting feature for a variable selection method. Moreover, the lasso is not well defined unless the bound on the L1-norm of the coefficients is smaller than a certain value.

(http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2005.00503.x/full)

I understand that the lasso is a quadratic programming problem, but that it can also be solved via LARS or coordinate-wise gradient descent. What I do not understand is where these algorithms run into trouble when p > n, where p is the number of predictors and n is the sample size. And why is the problem solved by the elastic net, which augments the data to n + p observations, clearly exceeding p?
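A quick numeric sketch of the saturation effect (my own illustration, not part of the original question; it uses scikit-learn's `Lasso` and `ElasticNet`, and all variable names and parameter values are my own choices): with p > n the lasso keeps at most n non-zero coefficients even when more predictors are truly relevant, while the elastic net is free to keep more.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n, p = 20, 100                       # more predictors than samples
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:40] = 1.0                 # 40 truly relevant predictors (> n)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1, tol=1e-8, max_iter=100_000).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.2, tol=1e-8, max_iter=100_000).fit(X, y)

n_lasso = int(np.sum(lasso.coef_ != 0))  # bounded by n
n_enet = int(np.sum(enet.coef_ != 0))    # not bounded by n
print(f"lasso selects {n_lasso} variables (at most n = {n})")
print(f"elastic net selects {n_enet} variables")
```

The exact counts depend on the random seed and penalty levels, but the lasso count should never exceed n = 20 here.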

**Answer**

As said, this is not a property of an algorithm but of the optimization problem. The KKT conditions imply that for a coefficient $\beta_j$ to be non-zero, its predictor must sit at a fixed correlation with the residual: $|x_j^T(y - X\beta)| = \lambda$, where $\lambda$ is the regularization parameter.
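To make the condition concrete, here is a small numeric check of my own (not from the answer), using scikit-learn's `Lasso`. Note that scikit-learn minimizes $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \alpha\lVert\beta\rVert_1$, so in its scaling the condition reads $|x_j^T(y - X\beta)|/n = \alpha$ on the active set, and the correlation stays below $\alpha$ for the zeroed coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

alpha = 0.1
fit = Lasso(alpha=alpha, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X, y)

resid = y - X @ fit.coef_
corr = np.abs(X.T @ resid) / n       # |x_j^T (y - X beta)| / n for each column
active = fit.coef_ != 0

print(corr[active])                  # pinned at alpha for non-zero coefficients
print(corr[~active])                 # strictly below alpha for the rest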

After resolving the various complications with absolute value etc, you are left with a linear equation for each non-zero coefficient. Since the rank of the matrix X is at most n when p>n, this is the number of equations that can be solved, and therefore there are at most n non-zeros (unless there are redundancies).

By the way, this is true for any loss function, not only the standard lasso with L2 loss. So it is in fact a property of the lasso penalty. There are many papers that show this KKT view and the resulting conclusions, I can point to our paper:

Rosset and Zhu, Piecewise Linear Regularized Solutions Paths, Annals of Stats 2007

and refs therein.

**Attribution***Source : Link , Question Author : user1137731 , Answer Author : user1137731*