Can regularization be helpful if we are interested only in modeling, not in forecasting?

Can regularization be helpful if we are interested only in estimating (and interpreting) the model parameters, not in forecasting or prediction?

I see how regularization/cross-validation is extremely useful if your goal is to make good forecasts on new data. But what if you're doing traditional economics and all you care about is estimating $\beta$? Can cross-validation also be useful in that context? The conceptual difficulty I struggle with is that we can actually compute $L(Y, \hat{Y})$ on test data, but we can never compute $L(\beta, \hat{\beta})$ because the true $\beta$ is by definition never observed. (Take as given the assumption that there even is a true $\beta$, i.e. that we know the family of models from which the data were generated.)

Suppose your loss is $L(\beta, \hat{\beta}) = \lVert \beta - \hat{\beta} \rVert^2$. You face a bias-variance tradeoff, right? So, in theory, you might be better off doing some regularization. But how can you possibly select your regularization parameter?

I'd be happy to see a simple numerical example of a linear regression model, with coefficients $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_k)$, where the researcher's loss function is e.g. $\lVert \beta - \hat{\beta} \rVert$, or even just $(\beta_1 - \hat{\beta}_1)^2$. How, in practice, could one use cross-validation to improve expected loss in those examples?
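To make the setup concrete, here is the kind of simulation I have in mind (a minimal sketch, assuming numpy and scikit-learn; the correlated design, sample size, noise level, and the use of RidgeCV are my own illustrative choices). Because the data are simulated, the true $\beta$ is known, so we can check whether a penalty tuned by cross-validation on $Y$ also helps with $\lVert \beta - \hat{\beta} \rVert^2$:

```python
# Illustrative sketch: compare OLS and cross-validated ridge on the
# parameter-estimation loss ||beta - beta_hat||^2 in a simulation where
# the true beta is known because we generate the data ourselves.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(0)
n, k = 50, 10
beta_true = rng.normal(size=k)

# Correlated design: makes the OLS estimates high-variance.
cov = 0.9 * np.ones((k, k)) + 0.1 * np.eye(k)

losses_ols, losses_ridge = [], []
for _ in range(200):  # repeat over many simulated datasets
    X = rng.multivariate_normal(np.zeros(k), cov, size=n)
    y = X @ beta_true + rng.normal(scale=3.0, size=n)

    ols = LinearRegression(fit_intercept=False).fit(X, y)
    # RidgeCV picks the penalty by cross-validated prediction error on y,
    # even though the loss we actually care about is on beta.
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 30), fit_intercept=False).fit(X, y)

    losses_ols.append(np.sum((ols.coef_ - beta_true) ** 2))
    losses_ridge.append(np.sum((ridge.coef_ - beta_true) ** 2))

print("mean ||beta - beta_hat||^2, OLS  :", np.mean(losses_ols))
print("mean ||beta - beta_hat||^2, ridge:", np.mean(losses_ridge))
```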


Edit: DJohnson pointed me to https://www.cs.cornell.edu/home/kleinber/aer15-prediction.pdf, which is relevant to this question. The authors write that

Machine learning techniques … provide a disciplined way to predict $\hat{Y}$ which (i) uses the data itself to decide how to make the bias-variance trade-off and (ii) allows for search over a very rich set of variables and functional forms. But everything comes at a cost: one must always keep in mind that because they are tuned for $\hat{Y}$ they do not (without many other assumptions) give very useful guarantees for $\hat{\beta}$.

Another relevant paper, again thanks to DJohnson: http://arxiv.org/pdf/1504.01132v3.pdf. This paper addresses the question I was struggling with above:

A … fundamental challenge to applying machine learning methods such as regression trees off-the-shelf to the problem of causal inference is that regularization approaches based on cross-validation typically rely on observing the "ground truth," that is, actual outcomes in a cross-validation sample. However, if our goal is to minimize the mean squared error of treatment effects, we encounter what [11] calls the "fundamental problem of causal inference": the causal effect is not observed for any individual unit, and so we don't directly have a ground truth. We address this by proposing approaches for constructing unbiased estimates of the mean-squared error of the causal effect of the treatment.

Answer

Yes, regularization is useful when we want biased but low-variance estimates. I particularly like gung's post here: What problem do shrinkage methods solve? Please allow me to reproduce gung's figure here…

[gung's figure from "What problem do shrinkage methods solve?"]
If you look at the plot gung made, it becomes clear why we need regularization/shrinkage. At first it seemed strange to me: why would we want biased estimates? But looking at that figure, I realized that a low-variance model has a lot of advantages: for example, it is more "stable" in production use.
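To put numbers on that intuition, here is a minimal sketch (assuming numpy and scikit-learn; the true coefficients, the correlated design, and the fixed ridge penalty alpha=10 are made-up illustration values). It decomposes the error of $\hat{\beta}_1$ into bias and variance for OLS versus ridge: ridge is biased, but its variance can drop enough that the mean squared error $(\beta_1 - \hat{\beta}_1)^2$ is smaller on average.

```python
# Illustrative sketch: bias-variance decomposition of beta_hat_1
# for OLS versus a (fixed-penalty) ridge estimator, over many
# simulated datasets with a known true beta.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n, k = 30, 5
beta_true = np.array([1.0, 0.5, -0.5, 0.25, 0.0])
cov = 0.8 * np.ones((k, k)) + 0.2 * np.eye(k)  # correlated design

est_ols, est_ridge = [], []
for _ in range(2000):
    X = rng.multivariate_normal(np.zeros(k), cov, size=n)
    y = X @ beta_true + rng.normal(scale=2.0, size=n)
    est_ols.append(LinearRegression(fit_intercept=False).fit(X, y).coef_[0])
    est_ridge.append(Ridge(alpha=10.0, fit_intercept=False).fit(X, y).coef_[0])

for name, est in [("OLS", est_ols), ("ridge", est_ridge)]:
    est = np.asarray(est)
    bias = est.mean() - beta_true[0]   # systematic error of beta_hat_1
    var = est.var()                    # sampling variance of beta_hat_1
    print(f"{name:5s}  bias={bias:+.3f}  variance={var:.3f}  MSE={bias**2 + var:.3f}")
```

In the same spirit, the penalty could be chosen by cross-validation on $Y$ (as in the sketch in the question) rather than fixed in advance.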

Attribution
Source: Link, Question Author: Adrian, Answer Author: Haitao Du
