Question about bias-variance tradeoff

I’m trying to understand the bias-variance tradeoff, the relationship between the bias of the estimator and the bias of the model, and the relationship between the variance of the estimator and the variance of the model.

I came to these conclusions:

  • We tend to overfit the data when we neglect the bias of the estimator, that is, when we only aim to minimize the bias of the model while neglecting the variance of the model (in other words, we only aim to minimize the variance of the estimator without also considering the bias of the estimator).
  • Vice versa, we tend to underfit the data when we neglect the variance of the estimator, that is, when we only aim to minimize the variance of the model while neglecting the bias of the model (in other words, we only aim to minimize the bias of the estimator without also considering the variance of the estimator).

Are my conclusions correct?

Answer

Well, sort of. As stated, you ascribe intent to the scientist to minimize either bias or variance. In practice, you cannot explicitly observe the bias or the variance of your model (if you could, you would already know the true signal, in which case you wouldn’t need a model). In general, you can only observe the error rate of your model on a specific data set, and you seek to estimate the out-of-sample error rate using various creative techniques.

Now you do know that, theoretically at least, this error rate can be decomposed into bias and variance terms (plus an irreducible noise term), but you cannot directly observe this balance in any specific concrete situation. So I’d restate your observations slightly as:

  • A model is underfit to the data when the bias term contributes the majority of out-of-sample error.
  • A model is overfit to the data when the variance term contributes the majority of out-of-sample error.
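
For reference, the decomposition mentioned above can be written out under some common assumptions that are mine, not part of the question: squared error loss, data generated as y = f(x) + ε with noise of mean zero and variance σ², and expectations taken over both the training sample and the noise. Here \hat{f} denotes the fitted model.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Both the bias and the variance terms depend on the unknown f or on the distribution of training samples, which is why neither can be read off a single fitted model.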

In general, there is no real way to know for sure, as you can never truly observe the model bias. Nonetheless, there are various patterns of behavior that are indicative of being in one situation or another:

  • Overfit models tend to have much worse goodness-of-fit performance on a testing data set than on a training data set.
  • Underfit models tend to have similar (and similarly poor) goodness-of-fit performance on a testing data set and a training data set.
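
These two patterns can be reproduced with a small simulation. The following is only a minimal sketch under assumptions of my own: the true signal is sin(2πx), the noise is Gaussian, and model complexity is the degree of a least-squares polynomial. The names `true_signal` and `train_test_mse` are hypothetical helpers introduced for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_signal(x):
    # Assumed ground truth; known here only because the data are simulated.
    return np.sin(2 * np.pi * x)


def train_test_mse(degrees, n_train=20, n_test=2000, noise_sd=0.3, seed=0):
    """Return {degree: (train_mse, test_mse)} for polynomial least-squares fits."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, n_train)
    x_test = rng.uniform(0.0, 1.0, n_test)
    y_train = true_signal(x_train) + rng.normal(0.0, noise_sd, n_train)
    y_test = true_signal(x_test) + rng.normal(0.0, noise_sd, n_test)

    results = {}
    for degree in degrees:
        # Least-squares fit; the fixed domain keeps the basis well scaled.
        model = Polynomial.fit(x_train, y_train, degree, domain=[0.0, 1.0])
        train_mse = float(np.mean((y_train - model(x_train)) ** 2))
        test_mse = float(np.mean((y_test - model(x_test)) ** 2))
        results[degree] = (train_mse, test_mse)
    return results


# Degree 1 is likely underfit and degree 12 likely overfit for this signal.
for degree, (train_mse, test_mse) in train_test_mse([1, 3, 12]).items():
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

In a run like this I would expect the low-degree fit to show comparable training and testing errors, and the high-degree fit to show a small training error next to a noticeably larger testing error, though the exact numbers depend on the seed and the assumed signal.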

These are the patterns that manifest in the famous plots of error rates by model complexity. This one is from The Elements of Statistical Learning:

[Figure: error rate as a function of model complexity]

Oftentimes these plots are overlaid with a bias and variance curve. I took this one from this nice exposition:

[Figure: error as a function of model complexity, overlaid with bias and variance curves]

But it is very important to realize that you never actually get to see these additional curves in any realistic situation.
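
The only setting where you can draw those curves is a simulation, where you choose the true signal yourself and can redraw the training data as often as you like. Here is a minimal sketch of that idea, again under my own assumptions: a sin(2πx) signal, Gaussian noise, a fixed equispaced training design, and polynomial fits. The function `bias_variance_by_degree` is a hypothetical helper, and the error it reports is measured against the true signal, so it excludes the irreducible noise term.

```python
import numpy as np
from numpy.polynomial import Polynomial


def bias_variance_by_degree(degrees, n_train=30, n_reps=200, noise_sd=0.3, seed=0):
    """Return {degree: (bias_squared, variance, mse_against_true_signal)}."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)  # fixed design (an assumption)
    x_eval = np.linspace(0.0, 1.0, 200)       # points where the fits are compared
    f_train = np.sin(2 * np.pi * x_train)     # assumed true signal
    f_eval = np.sin(2 * np.pi * x_eval)

    results = {}
    for degree in degrees:
        preds = np.empty((n_reps, x_eval.size))
        for r in range(n_reps):
            # Redraw only the noise: something you cannot do with real data.
            y_train = f_train + rng.normal(0.0, noise_sd, n_train)
            fit = Polynomial.fit(x_train, y_train, degree, domain=[0.0, 1.0])
            preds[r] = fit(x_eval)
        mean_pred = preds.mean(axis=0)
        bias_sq = float(np.mean((mean_pred - f_eval) ** 2))
        variance = float(np.mean(preds.var(axis=0)))
        mse = float(np.mean((preds - f_eval) ** 2))
        results[degree] = (bias_sq, variance, mse)
    return results


for degree, (b, v, m) in bias_variance_by_degree([1, 3, 9]).items():
    print(f"degree={degree}  bias^2={b:.4f}  variance={v:.4f}  bias^2+variance={m:.4f}")
```

Every quantity computed here relies on `f_eval` and on repeated training samples, neither of which is available with a real data set, which is exactly why the bias and variance curves stay hidden in practice.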

Attribution
Source: Link, Question Author: John M, Answer Author: Matthew Drury
