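Suppose I have longitudinal data of the form $Y = (Y_1, \ldots, Y_J) \sim N(\mu, \Sigma)$ (I have multiple observations; this is just the form of a single one). I’m interested in restrictions on $\Sigma$. An unrestricted $\Sigma$ is equivalent to taking
$$Y_j = \alpha_j + \sum_{\ell=1}^{j-1} \phi_{\ell j} Y_{j-\ell} + \varepsilon_j, \qquad \varepsilon_j \sim N(0, \sigma_j^2).$$
This is typically not done since it requires estimating $O(J^2)$ covariance parameters. A model is “lag-$k$” if we take
$$Y_j = \alpha_j + \sum_{\ell=1}^{\min(k,\, j-1)} \phi_{\ell j} Y_{j-\ell} + \varepsilon_j, \qquad \varepsilon_j \sim N(0, \sigma_j^2),$$
i.e. we only use the preceding $k$ terms to predict $Y_j$ from the history.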
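To make the lag-$k$ restriction concrete, here is a minimal sketch of how the coefficients of each conditional regression follow from $\Sigma$. The helper name `history_coefficients` and the AR(1)-style covariance $\Sigma_{ij} = \rho^{|i-j|}$ are assumptions chosen purely for illustration; for that covariance the regression of the last component on its full history should put all its weight on lag 1, which is what a lag-1 model looks like.

```python
import numpy as np

def history_coefficients(Sigma, j):
    """Coefficients of the regression of Y_j on (Y_0, ..., Y_{j-1}) implied by Sigma.

    j is 0-based. The result is ordered by lag: entry l-1 multiplies Y_{j-l}.
    (Hypothetical helper, written for this illustration only.)
    """
    hist = np.arange(j)
    # Population least-squares coefficients: Sigma_hist^{-1} Cov(hist, Y_j)
    beta = np.linalg.solve(Sigma[np.ix_(hist, hist)], Sigma[hist, j])
    return beta[::-1]  # reverse so that index 0 corresponds to lag 1

# Illustrative AR(1)-type covariance (an assumption, not data from the question)
J, rho = 6, 0.7
idx = np.arange(J)
Sigma_ar1 = rho ** np.abs(idx[:, None] - idx[None, :])

# Regression of the last component on its whole history, ordered by lag
phi_last = history_coefficients(Sigma_ar1, J - 1)
print(np.round(phi_last, 6))
```

Up to numerical error, only the lag-1 entry is nonzero here, so restricting this $\Sigma$ to lag 1 loses nothing; a general $\Sigma$ would give nonzero entries at every lag.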
What I’d really like to do is use some kind of shrinkage idea, like the LASSO, to zero out some of the $\phi_{\ell j}$. But I would also like the method to prefer models which are lag-$k$ for some $k$; that is, I’d like to penalize higher order lags more than lower order lags. I think this is particularly desirable given that the predictors are highly correlated.
An additional issue is that if (say) $\phi_{35}$ is shrunk to 0 I would also like it if $\phi_{36}$ is shrunk to 0, i.e. the same lag is used in all of the conditional distributions.
I could speculate on this, but I don’t want to reinvent the wheel. Are there any LASSO techniques designed to get at this sort of problem? Am I better off just doing something else entirely, like stepwise inclusion of lag orders? Since my model space is small, I guess I could even use an $L_0$ penalty on this problem?
You can run cross-validation for each $k$ from 0 up to whatever the maximum is, and plot the out-of-sample performance against $k$. Since the model is being tested on data it hasn’t seen before, there is no guarantee the more complex models will perform better, and indeed you should see performance degrade once the model becomes complex enough to overfit. Personally I think this is safer and easier to justify than having an arbitrary penalty factor, but your mileage may vary.
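A rough sketch of that procedure, under several assumptions of mine: the data are simulated (200 subjects, 8 time points, a true lag of 2), each conditional regression is fit by ordinary least squares, and performance is measured by mean squared prediction error rather than the Gaussian log-likelihood. The function names `lag_design` and `cv_error` are hypothetical.

```python
import numpy as np

def lag_design(Y, j, k):
    """Design matrix for Y[:, j]: intercept plus the min(k, j) preceding columns."""
    m = min(k, j)
    cols = [np.ones(len(Y))] + [Y[:, j - l] for l in range(1, m + 1)]
    return np.column_stack(cols)

def cv_error(Y, k, n_folds=5, seed=0):
    """Cross-validated mean squared prediction error of the lag-k model."""
    n, J = Y.shape
    # Fixed seed so that every k is scored on the same folds
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    total = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        for j in range(J):
            # Fit the j-th conditional regression on the training subjects
            coef, *_ = np.linalg.lstsq(
                lag_design(Y[train], j, k), Y[train, j], rcond=None
            )
            # Score it on the held-out subjects
            resid = Y[test, j] - lag_design(Y[test], j, k) @ coef
            total += np.sum(resid ** 2)
    return total / (n * J)

# Simulated longitudinal data with a true lag of 2 (illustrative only)
rng = np.random.default_rng(1)
n, J = 200, 8
Y = np.zeros((n, J))
for j in range(J):
    Y[:, j] = rng.normal(size=n)
    if j >= 1:
        Y[:, j] += 0.5 * Y[:, j - 1]
    if j >= 2:
        Y[:, j] += 0.3 * Y[:, j - 2]

errors = {k: cv_error(Y, k) for k in range(J)}
best_k = min(errors, key=errors.get)
for k, err in errors.items():
    print(k, round(err, 4))
print("selected lag:", best_k)
```

Plotting `errors` against `k` gives the curve described above. Because the selected lag depends on the sample and the folds, it is worth checking how flat the curve is near its minimum before committing to a single $k$.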
I also don’t really follow how the ordered Lasso answers the question. It seems too restrictive, since it completely forces the ordering of the coefficients, whereas for some data the original problem may have a solution in which $\phi_{\ell j}$ is not strictly decreasing in $\ell$.