Are there any analytical results or experimental papers on the optimal choice of the coefficient of the ℓ1 penalty term? By optimal, I mean a parameter that maximizes the probability of selecting the best model, or that minimizes the expected loss. I ask because it is often impractical to choose the parameter by cross-validation or bootstrap, either because of a large number of instances of the problem or because of the size of the problem at hand. The only positive result I am aware of is Candès and Plan, "Near-ideal model selection by ℓ1 minimization."
Answer
Check out Theorem 5.1 of Bickel et al. A statistically optimal choice, in terms of the estimation error $\|\hat{\beta} - \beta^*\|$, is $\lambda = A \sigma_{\text{noise}} \sqrt{\dfrac{\log p}{n}}$ (with high probability), for a constant $A > 2\sqrt{2}$.
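As a rough illustration (not part of the theorem itself), here is a minimal sketch of using this penalty level in practice, assuming a known noise level $\sigma$ and Gaussian design; the Lasso solver is a plain coordinate descent written for this example, and all problem sizes and the constant $A = 3 > 2\sqrt{2}$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sparse regression setup: n samples, p features, 5 active coefficients.
n, p, sigma = 200, 500, 0.5
beta_true = np.zeros(p)
beta_true[:5] = 3.0
X = rng.standard_normal((n, p))
y = X @ beta_true + sigma * rng.standard_normal(n)

# Theoretical penalty level: lambda = A * sigma * sqrt(log(p) / n), with A > 2*sqrt(2) ~ 2.83.
A = 3.0
lam = A * sigma * np.sqrt(np.log(p) / n)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # residual y - X @ b, updated in place
    col_sq = (X ** 2).sum(axis=0) / n     # per-coordinate curvature ||X_j||^2 / n
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back in).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b

beta_hat = lasso_cd(X, y, lam)
support = np.flatnonzero(np.abs(beta_hat) != 0.0)
print(f"lambda = {lam:.3f}, estimated support = {support}")
```

With the signal well above the noise level, this choice of $\lambda$ typically keeps the true support while killing most spurious coordinates; in practice $\sigma$ is unknown and must itself be estimated (e.g. via the scaled Lasso), which is part of why the question is hard.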
Attribution
Source: Link, Question Author: gappy, Answer Author: dohmatob