Can regularization be helpful if we are interested only in estimating (and interpreting) the model parameters, not in forecasting or prediction?

I see how regularization/cross-validation is extremely useful if your goal is to make good forecasts on new data. But what if you’re doing traditional economics and all you care about is estimating $\beta$? Can cross-validation also be useful in that context? The conceptual difficulty I struggle with is that we can actually compute $L(Y,\hat{Y})$ on test data, but we can never compute $L(\beta,\hat{\beta})$ because the true $\beta$ is by definition never observed. (Take as given the assumption that there even is a true $\beta$, i.e. that we know the family of models from which the data were generated.)

Suppose your loss is $L(\beta,\hat{\beta}) = \lVert \beta - \hat{\beta} \rVert$. You face a bias-variance tradeoff, right? So, in theory, you might be better off doing some regularization. But how can you possibly select your regularization parameter?
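For the squared version of this loss, the tradeoff can be written out explicitly. This is the standard decomposition of an estimator's mean squared error, given here as a sketch; the $\operatorname{tr}$ and $\operatorname{Var}$ notation is not from the original question.

$$
\mathbb{E}\,\lVert \hat{\beta} - \beta \rVert^2
= \lVert \mathbb{E}[\hat{\beta}] - \beta \rVert^2
+ \operatorname{tr}\,\operatorname{Var}(\hat{\beta})
$$

The first term is the squared bias and the second is the total variance. Under the assumed linear model OLS has zero bias, so regularization can only help if it removes more variance than it adds in squared bias.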

I’d be happy to see a simple numerical example of a linear regression model, with coefficients $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_k)$, where the researcher’s loss function is e.g. $\lVert \beta - \hat{\beta} \rVert$, or even just $(\beta_1 - \hat{\beta}_1)^2$. How, in practice, could one use cross-validation to improve expected loss in those examples?

Edit: DJohnson pointed me to https://www.cs.cornell.edu/home/kleinber/aer15-prediction.pdf, which is relevant to this question. The authors write that

> Machine learning techniques … provide a disciplined way to predict $\hat{Y}$ which (i) uses the data itself to decide how to make the bias-variance trade-off and (ii) allows for search over a very rich set of variables and functional forms. But everything comes at a cost: one must always keep in mind that because they are tuned for $\hat{Y}$ they do not (without many other assumptions) give very useful guarantees for $\hat{\beta}$.

Another relevant paper, again thanks to DJohnson: http://arxiv.org/pdf/1504.01132v3.pdf. This paper addresses the question I was struggling with above:

> A … fundamental challenge to applying machine learning methods such as regression trees off-the-shelf to the problem of causal inference is that regularization approaches based on cross-validation typically rely on observing the “ground truth,” that is, actual outcomes in a cross-validation sample. However, if our goal is to minimize the mean squared error of treatment effects, we encounter what [11] calls the “fundamental problem of causal inference”: the causal effect is not observed for any individual unit, and so we don’t directly have a ground truth. We address this by proposing approaches for constructing unbiased estimates of the mean-squared error of the causal effect of the treatment.

**Answer**

**Yes, when we want biased, low-variance estimates.** I particularly like gung’s answer to the question “What problem do shrinkage methods solve?” Please allow me to paste gung’s figure here…

If you check the plot gung made, you will see why we need regularization / shrinkage. At first, it seemed strange to me that we would ever want biased estimates. But looking at that figure, I realized that a low-variance model has a lot of advantages: for example, it is more “stable” in production use.
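To make this concrete, here is a minimal simulation sketch of the kind of numerical example the question asks for. It is not from gung’s post. All specifics are assumptions chosen for illustration: the sample size, the correlation `rho` between regressors, the noise level `sigma`, the true `beta`, the grid of penalties, and the helper names `ridge_fit`, `cv_choose_lambda` and `simulate`. The penalty is chosen by K-fold cross-validation on prediction error, which is the only loss that can be computed on real data. Because the data are simulated, the true `beta` is known, so the loss $\lVert \beta - \hat{\beta} \rVert^2$ can be compared afterwards for OLS and ridge.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y; lam = 0 gives OLS."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


def cv_choose_lambda(X, y, lambdas, n_folds=5):
    """Choose lambda by K-fold CV on squared *prediction* error."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), n_folds)
    cv_err = np.zeros(len(lambdas))
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        for j, lam in enumerate(lambdas):
            b = ridge_fit(X[train], y[train], lam)
            cv_err[j] += np.sum((y[test_idx] - X[test_idx] @ b) ** 2)
    return lambdas[int(np.argmin(cv_err))]


def simulate(n_reps=200, n=40, rho=0.9, sigma=2.0, seed=0):
    """Average squared coefficient loss of OLS and CV-tuned ridge over many datasets."""
    rng = np.random.default_rng(seed)
    # Hypothetical true coefficients; known only because the data are simulated.
    beta = np.array([1.0, 0.5, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    k = beta.size
    lambdas = np.logspace(-2, 2, 20)
    loss_ols = np.zeros(n_reps)
    loss_ridge = np.zeros(n_reps)
    for r in range(n_reps):
        # Regressors with unit variance and pairwise correlation rho.
        common = rng.standard_normal((n, 1))
        X = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.standard_normal((n, k))
        y = X @ beta + sigma * rng.standard_normal(n)

        b_ols = ridge_fit(X, y, 0.0)
        lam = cv_choose_lambda(X, y, lambdas)
        b_ridge = ridge_fit(X, y, lam)

        loss_ols[r] = np.sum((b_ols - beta) ** 2)
        loss_ridge[r] = np.sum((b_ridge - beta) ** 2)
    return loss_ols.mean(), loss_ridge.mean()


if __name__ == "__main__":
    ols_mse, ridge_mse = simulate()
    print(f"mean ||beta_hat - beta||^2  OLS: {ols_mse:.2f}  ridge (CV): {ridge_mse:.2f}")
```

In this design (few observations, strongly correlated regressors, noisy outcome) the OLS coefficients are very unstable, so the biased ridge estimates should land closer to the true `beta` on average. The result depends on the assumed setup, and as the first quoted paper warns, tuning for $\hat{Y}$ gives no general guarantee for $\hat{\beta}$.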

**Attribution**

*Source: Link, Question Author: Adrian, Answer Author: Haitao Du*