# Lasso-ing the order of a lag?

Suppose I have longitudinal data of the form $\mathbf Y = (Y_1, \ldots, Y_J) \sim \mathcal N(\mu, \Sigma)$ (I have multiple observations; this is just the form of a single one). I’m interested in restrictions on $\Sigma$. An unrestricted $\Sigma$ is equivalent to taking

$$Y_j = \mu_j + \sum_{\ell = 1}^{j - 1} \phi_{\ell j} (Y_{j - \ell} - \mu_{j - \ell}) + \varepsilon_j$$

with $\varepsilon_j \sim N(0, \sigma_j)$.

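This is typically not done since it requires estimating $O(J^2)$ covariance parameters. A model is “lag-$k$” if we take

$$Y_j = \mu_j + \sum_{\ell = 1}^{\min(k,\, j - 1)} \phi_{\ell j} (Y_{j - \ell} - \mu_{j - \ell}) + \varepsilon_j$$

i.e. we only use the preceding $k$ terms to predict $Y_j$ from the history.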
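The link between a covariance matrix and the sequence of regressions on the history can be checked numerically. The sketch below is only an illustration: the function name `sequential_regressions` and the AR(1)-type example covariance are assumptions, not part of the question. It computes the population regression of each component on all of its predecessors and shows that, for a covariance that really is lag-1, every coefficient beyond the first lag comes out as zero.

```python
import numpy as np

def sequential_regressions(Sigma):
    """Population regression of each component on all of its predecessors.

    Returns a unit lower-triangular T and innovation variances d such that
    T @ Sigma @ T.T equals diag(d). Row j of T holds minus the coefficients,
    so (0-based) the lag-l coefficient for component j is -T[j, j - l].
    """
    J = Sigma.shape[0]
    T = np.eye(J)
    d = np.empty(J)
    d[0] = Sigma[0, 0]
    for j in range(1, J):
        # Coefficients of Y_j on (Y_0, ..., Y_{j-1}).
        beta = np.linalg.solve(Sigma[:j, :j], Sigma[:j, j])
        T[j, :j] = -beta
        # Residual (innovation) variance of that regression.
        d[j] = Sigma[j, j] - Sigma[:j, j] @ beta
    return T, d

# Hypothetical example: an AR(1)-type covariance, Sigma[i, j] = rho ** |i - j|.
rho, J = 0.6, 5
idx = np.arange(J)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

T, d = sequential_regressions(Sigma)
phi_lag1 = -np.diag(T, k=-1)        # lag-1 coefficients, one per j >= 1
higher_lags = np.tril(T, k=-2)      # everything at lag 2 or more
print(phi_lag1)
print(np.abs(higher_lags).max())
```

An unrestricted $\Sigma$ fills the whole lower triangle of `T`, which is where the $O(J^2)$ parameter count comes from; a lag-$k$ model keeps only the first $k$ sub-diagonals.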

What I’d really like to do is use some kind of shrinkage idea to zero out some of the $\phi_{\ell j}$, like the LASSO. But the thing is, I also would like the method I use to prefer models which are lag-$k$ for some $k$; I’d like to penalize higher order lags more than lower order lags. I think this is something we would particularly like to do given that the predictors are highly correlated.

An additional issue is that if (say) $\phi_{35}$ is shrunk to $0$ I would also like it if $\phi_{36}$ is shrunk to $0$, i.e. the same lag is used in all of the conditional distributions.
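One way to write down the two wishes above (heavier penalties on higher lags, and a given lag being dropped from all conditional distributions together) is a weighted group-lasso-style penalty. This is only a sketch: the weights $w_\ell$, the tuning parameter $\lambda$ and the grouping by lag are assumptions for illustration, with $\phi_{\ell j}$ read as the coefficient on the lag-$\ell$ predictor of $Y_j$.

$$\lambda \sum_{\ell = 1}^{J - 1} w_\ell \, \lVert \phi_{\ell \cdot} \rVert_2, \qquad \phi_{\ell \cdot} = (\phi_{\ell, \ell + 1}, \ldots, \phi_{\ell J}), \qquad 0 < w_1 \le w_2 \le \cdots \le w_{J - 1}.$$

Because the unsquared group norm can set a whole group to zero at once, lag $\ell$ would be removed from all of the conditional distributions together, and increasing weights make high lags more expensive. Weights alone do not guarantee that the selected lags form a contiguous set $\{1, \ldots, k\}$, so this encourages rather than enforces a lag-$k$ structure.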

I could speculate on this, but I don’t want to reinvent the wheel. Are there any LASSO techniques designed to get at this sort of problem? Am I better off just doing something else entirely, like stepwise inclusion of lag orders? Since my model space is small, I guess I could even use an $L_0$ penalty on this problem?

## Answer

You can run cross-validation for each $k$ from $0$ up to whatever the maximum is, and plot the held-out performance against $k$. Since the model is being tested on data it hasn’t seen before, there is no guarantee that the more complex models will perform better; indeed, you should see performance degrade once the model becomes complex enough to overfit. Personally I think this is safer and easier to justify than having an arbitrary penalty factor, but your mileage may vary.
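A minimal sketch of that procedure, assuming the data sit in an `n x J` array `Y` with one row per subject and that each conditional mean is fit by ordinary least squares on the `k` preceding components. The helper names (`design`, `fit_lag_k`, `heldout_mse`, `cv_over_lags`) and the simulated data are made up for illustration.

```python
import numpy as np

def design(Y, j, k):
    """Intercept plus the min(k, j) components preceding column j."""
    n = Y.shape[0]
    m = min(k, j)
    cols = [np.ones(n)] + [Y[:, j - l] for l in range(1, m + 1)]
    return np.column_stack(cols)

def fit_lag_k(Y, k):
    """Least-squares coefficients for each component under a lag-k model."""
    return [np.linalg.lstsq(design(Y, j, k), Y[:, j], rcond=None)[0]
            for j in range(Y.shape[1])]

def heldout_mse(Y, coefs, k):
    """One-step prediction error, averaged over subjects and components."""
    errs = [np.mean((Y[:, j] - design(Y, j, k) @ coefs[j]) ** 2)
            for j in range(Y.shape[1])]
    return float(np.mean(errs))

def cv_over_lags(Y, max_k, n_folds=5, seed=0):
    """Cross-validated prediction error for every lag order 0..max_k."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = {}
    for k in range(max_k + 1):
        fold_errs = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            coefs = fit_lag_k(Y[train_idx], k)
            fold_errs.append(heldout_mse(Y[test_idx], coefs, k))
        scores[k] = float(np.mean(fold_errs))
    return scores

# Simulated data for illustration only: a lag-1 process with coefficient 0.8.
rng = np.random.default_rng(1)
n, J = 300, 6
Y = np.zeros((n, J))
Y[:, 0] = rng.normal(size=n)
for j in range(1, J):
    Y[:, j] = 0.8 * Y[:, j - 1] + rng.normal(size=n)

scores = cv_over_lags(Y, max_k=4)
best_k = min(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
```

The folds are split by subject (rows), so each held-out trajectory is predicted by a model that never saw it. Squared one-step prediction error is one possible performance measure; a held-out Gaussian log-likelihood would also account for the innovation variances.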

I also don’t really follow how ordered Lasso answers the question. It seems too restrictive: it completely forces the ordering of the coefficients, whereas for some data the original problem may have a solution in which $\phi_{\ell j}$ is not strictly decreasing in $\ell$.

Attribution
Source: Link, Question Author: guy, Answer Author: Nir Friedman