# Lasso-ing the order of a lag?

Suppose I have longitudinal data of the form $\mathbf Y = (Y_1, \ldots, Y_J) \sim \mathcal N(\mu, \Sigma)$ (I have multiple observations; this is just the form of a single one). I'm interested in restrictions on $\Sigma$. An unrestricted $\Sigma$ is equivalent to taking

$$Y_j = \mu_j + \sum_{\ell = 1}^{j-1} \phi_{\ell j}\,(Y_{j-\ell} - \mu_{j-\ell}) + \varepsilon_j$$

with independent $\varepsilon_j \sim N(0, \sigma_j^2)$.

This is typically not done, since it requires estimating $O(J^2)$ covariance parameters. A model is "lag-$k$" if instead we take

$$Y_j = \mu_j + \sum_{\ell = 1}^{\min(k,\, j-1)} \phi_{\ell j}\,(Y_{j-\ell} - \mu_{j-\ell}) + \varepsilon_j,$$

i.e. we use only the preceding $k$ terms to predict $Y_j$ from its history.

What I'd really like to do is use some kind of shrinkage idea, like the LASSO, to zero out some of the $\phi_{\ell j}$. The catch is that I'd also like the method to prefer models that are lag-$k$ for some $k$: higher-order lags should be penalized more heavily than lower-order ones. I think this is particularly desirable here given that the predictors are highly correlated.
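To make the "penalize higher lags more" idea concrete, here is a minimal sketch of a *weighted* lasso, where the penalty weight on each coefficient grows with its lag. The solver is plain ISTA (proximal gradient with per-coordinate soft-thresholding); the design matrix, the true coefficients, and the quadratic weight schedule are all toy assumptions of mine, not anything from the question.

```python
import numpy as np

def weighted_lasso(X, y, lam, weights, n_iter=5000):
    """Minimize 0.5*||y - X b||^2 + lam * sum_l weights[l] * |b[l]| via ISTA.

    Larger weights on higher-lag coefficients shrink them harder, so the
    fit prefers low-order (lag-k-like) models.
    """
    p = X.shape[1]
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        z = b - grad / L
        thresh = lam * weights / L          # per-coordinate threshold
        b = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return b

# toy example: predict one Y_j from p available lags; truth is lag-2
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

weights = (np.arange(1, p + 1) ** 2).astype(float)  # heavier penalty on higher lags
b_hat = weighted_lasso(X, y, lam=5.0, weights=weights)
```

The same effect can be had from any off-the-shelf lasso solver by rescaling column $\ell$ by $1/w_\ell$ and rescaling the fitted coefficient back, so nothing here depends on a custom optimizer.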

An additional issue: if (say) $\phi_{35}$ is shrunk to $0$, I would also like $\phi_{36}$ to be shrunk to $0$, i.e. the same set of lags should be used in all of the conditional distributions.
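That "same lag zeroed in every conditional" structure is exactly what a group lasso encodes if each group collects one lag $\ell$ across all positions $j$: its proximal operator shrinks a whole group to zero at once. A small sketch of that block soft-thresholding step, on made-up coefficient values (the grouping and numbers are my illustration, not the question's data):

```python
import numpy as np

def group_soft_threshold(b, groups, thresh):
    """Proximal operator of the group-lasso penalty.

    Each group (here: one lag l, across all conditionals j) is shrunk by its
    Euclidean norm, so an entire lag drops out of every conditional at once.
    """
    out = b.copy()
    for idx in groups:
        nrm = np.linalg.norm(b[idx])
        scale = max(0.0, 1.0 - thresh / nrm) if nrm > 0 else 0.0
        out[idx] = scale * b[idx]
    return out

# flattened phi_{l j}; each group is one lag l over the positions j
b = np.array([0.9, 1.1,    # lag 1 (large -> kept)
              0.05, -0.04,  # lag 2 (small -> zeroed as a block)
              0.4, 0.3])    # lag 3 (moderate -> shrunk, kept)
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
shrunk = group_soft_threshold(b, groups, thresh=0.2)
```

Making the per-group threshold grow with $\ell$ would combine this with the earlier preference for low-order lags.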

I could speculate on this, but I don't want to reinvent the wheel. Are there any LASSO-type techniques designed for this sort of problem? Am I better off doing something else entirely, like stepwise inclusion of lag orders? Since my model space is small, I suppose I could even use an $L_0$ penalty.

I also don't really follow how the ordered lasso answers the question. It seems too restrictive: it forces a complete ordering of the coefficients, whereas the original problem may, for some data, have a solution in which $\phi_{\ell j}$ is not monotonically decreasing in $\ell$.