Mathematical/Algorithmic definition for overfitting

Is there a mathematical or algorithmic definition of overfitting?

The definitions usually offered are the classic 2-D plot of a curve passing through every single data point, or a validation-loss curve that suddenly turns upward.

But is there a mathematically rigorous definition?


Yes, there is a (slightly more) rigorous definition:

Given a model with a set of parameters, the model can be said to be overfitting the data if, after a certain number of training steps, the training error continues to decrease while the out-of-sample (test) error starts increasing.

[Figure: training and validation error curves over training epochs]
In this example the out-of-sample (test/validation) error first decreases in sync with the training error, then starts increasing around the 90th epoch; that is when overfitting starts.
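That turning point can be located mechanically from the two loss curves. A minimal sketch in Python (the helper name and the loss values are hypothetical, for illustration only):

```python
def overfitting_onset(train_loss, val_loss):
    """Return the epoch at which overfitting starts: the training loss
    keeps decreasing, but the validation loss reaches its minimum and
    begins to rise afterwards."""
    # The epoch with minimal validation loss; training beyond it
    # reduces training error but increases out-of-sample error.
    return min(range(len(val_loss)), key=lambda i: val_loss[i])

# Hypothetical loss curves: validation loss turns up after epoch 3.
train = [1.00, 0.60, 0.40, 0.30, 0.25, 0.20]
val   = [1.10, 0.70, 0.50, 0.45, 0.50, 0.60]
print(overfitting_onset(train, val))  # -> 3
```

In practice this is exactly the criterion behind early stopping: halt training near the validation-loss minimum rather than at the training-loss minimum.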

Another way to look at it is in terms of bias and variance. The out of sample error for a model can be decomposed into two components:

  • Bias: Error due to the expected value from the estimated model being different from the expected value of the true model.
  • Variance: Error due to the model being sensitive to small fluctuations in the data set.

Overfitting occurs when the bias is low, but the variance is high.
For a data set $X$ where the true (unknown) model is:

$Y = f(X) + \epsilon$

with $\epsilon$ being the irreducible noise in the data set, where $E(\epsilon)=0$ and $Var(\epsilon) = \sigma_{\epsilon}^2$,

and the estimated model is:

$\hat{Y} = \hat{f}(X)$,

then the test error (for a test data point $x_t$) can be written as:

$Err(x_t) = \sigma_{\epsilon}^2 + Bias^2 + Variance$

$Bias^2 = \left(E[\hat{f}(x_t)] - f(x_t)\right)^2$
$Variance = E\left[\left(\hat{f}(x_t) - E[\hat{f}(x_t)]\right)^2\right]$
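The decomposition can be checked numerically by Monte Carlo simulation. In the following sketch the true function, noise level, and model are assumptions chosen for illustration: many training sets are resampled, a deliberately too-simple model is refit each time, and the average squared test error at $x_t$ is compared with $\sigma_{\epsilon}^2 + Bias^2 + Variance$:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)   # true (normally unknown) model, assumed here
sigma = 0.3               # noise std; irreducible error is sigma**2
x_train = np.linspace(0, 3, 20)
x_t = 1.5                 # test point
n_sim = 2000

# Refit a straight-line model on many independently resampled
# training sets and record its prediction at x_t.
preds = np.array([
    np.polyval(np.polyfit(x_train,
                          f(x_train) + rng.normal(0, sigma, x_train.size),
                          deg=1),
               x_t)
    for _ in range(n_sim)
])

bias_sq = (preds.mean() - f(x_t)) ** 2   # squared bias at x_t
variance = preds.var()                   # variance of the fit at x_t

# Direct Monte Carlo estimate of the expected squared test error.
y_t = f(x_t) + rng.normal(0, sigma, n_sim)   # noisy test observations
errors = (y_t - preds) ** 2

print(bias_sq, variance, errors.mean(), sigma**2 + bias_sq + variance)
```

With this underfitting linear model the bias term dominates the variance term, and `errors.mean()` should agree with `sigma**2 + bias_sq + variance` up to Monte Carlo noise.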

(Strictly speaking, this decomposition applies to the regression case with squared-error loss, but analogous decompositions exist for other loss functions, e.g. in the classification case as well.)

Both of the above definitions are tied to model complexity (measured, for example, by the number of parameters in the model): the higher the complexity of the model, the more likely overfitting is to occur.

See Chapter 7 of The Elements of Statistical Learning for a rigorous mathematical treatment of the topic.

[Figure: Bias-variance tradeoff, with variance (i.e. overfitting) increasing with model complexity. Taken from ESL, Chapter 7]

Source: Link, Question Author: Brian Ko, Answer Author: Skander H.
