In regression, why not use regularization by default?

I remember reading another post about the differing viewpoints of people from statistics and people from machine learning or neural networks, where one user mentioned this idea as an example of bad practice.

Even so, I cannot find anyone asking this question, so I suspect I am missing something obvious. I can only think of two hypothetical scenarios where regularization would not be preferred:

  1. The researcher is interested in unbiasedness of the estimates.
  2. Due to a large volume of real-time data, one looks to minimize computation time.

In the former case, I am not convinced there is any practical reason for a researcher to prefer unbiasedness over a lower error, especially when considering a single study. In the latter, I am not even convinced there is a relevant gain in computation time.

What am I missing?


In short, regularization changes the sampling distribution of the test statistic, so the usual hypothesis tests (and the p-values and confidence intervals built on them) no longer hold. When we use regression to make inferences about interventions, we want unbiased estimates of the effect.
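To make the bias concrete, here is a minimal simulation sketch. It is not from the original answer: the one-predictor, no-intercept model, the true effect `beta = 2.0`, the penalty `lam = 25.0`, and the function name `ols_and_ridge_estimates` are all assumptions chosen for illustration. It compares the average OLS estimate with the average ridge estimate across many repeated samples.

```python
import numpy as np

def ols_and_ridge_estimates(n_sims=2000, n=50, beta=2.0, lam=25.0, seed=0):
    """Simulate y = beta * x + noise many times; return OLS and ridge slopes.

    All parameter values are illustrative assumptions, not from the post.
    """
    rng = np.random.default_rng(seed)
    ols = np.empty(n_sims)
    ridge = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        sxx = x @ x  # sum of squares of the predictor
        sxy = x @ y  # cross product of predictor and response
        ols[i] = sxy / sxx            # closed-form OLS slope (no intercept)
        ridge[i] = sxy / (sxx + lam)  # closed-form ridge slope, penalty lam
    return ols, ridge

ols, ridge = ols_and_ridge_estimates()
print("mean OLS estimate:  ", ols.mean())    # close to the true beta
print("mean ridge estimate:", ridge.mean())  # shrunk toward zero
```

Under these assumptions the OLS average sits near the true effect, while the ridge average is pulled toward zero. A test statistic built from the ridge estimate is therefore not centered where the usual t-test theory assumes, which is the point made above.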

Not everything to do with data is a prediction problem.

Source: Link, Question Author: Kuku, Answer Author: Demetri Pananos
