Is there a graphical representation of the bias-variance tradeoff in linear regression?

I am having a mental block. I was presented with the following picture to showcase the bias-variance tradeoff in the context of linear regression:

[Figure: polynomial models fitted to the same data, a simple and a complex case]

I can see that neither of the two models is a good fit: the “simple” one does not capture the complexity of the X-Y relation, and the “complex” one is overfitting, essentially learning the training data by heart. However, I completely fail to see the bias and the variance in these two pictures. Could someone show this to me?

PS: The answer to Intuitive explanation of the bias-variance tradeoff? did not really help me; I would be glad if someone could provide a different approach based on the above picture.

Answer

The bias-variance trade-off is based on the breakdown of the mean squared error:

$$\text{MSE}(\hat{y}) = E\big[(y - \hat{y})^2\big] = \big(y - E[\hat{y}]\big)^2 + E\big[(\hat{y} - E[\hat{y}])^2\big]$$

where the expectation is taken over training sets, the first term is the squared bias, and the second is the variance of the prediction.
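A small Monte Carlo sketch can confirm this decomposition numerically. Everything here is made up for illustration: a sinusoidal ground truth, Gaussian noise, and a straight-line fit evaluated at one fixed query point.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)  # hypothetical ground truth, chosen for illustration

x0 = 0.3                          # fixed query point
n, n_trials, noise_sd = 20, 5000, 0.3

preds = np.empty(n_trials)
for t in range(n_trials):
    # Draw a fresh training set each trial.
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(0, noise_sd, n)
    # Fit a straight line (degree-1 polynomial) by least squares.
    b1, b0 = np.polyfit(x, y, 1)
    preds[t] = b0 + b1 * x0

mse = np.mean((true_f(x0) - preds) ** 2)
bias_sq = (true_f(x0) - preds.mean()) ** 2
variance = preds.var()
print(f"MSE = {mse:.4f}, bias^2 + var = {bias_sq + variance:.4f}")  # the two agree
```

The equality holds exactly in the sample too, since the cross term in the expansion vanishes by construction.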

One way to see the bias-variance trade-off is to ask which properties of the data set are used in the model fit. For the simple model, if we assume that OLS regression was used to fit the straight line, then only 4 numbers are used to fit the line (see the sketch after the list):

  1. The sample covariance between x and y
  2. The sample variance of x
  3. The sample mean of x
  4. The sample mean of y
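Here is a minimal sketch (data simulated purely for illustration) showing that the OLS line can be computed from those four summary statistics alone, and matches a fit that sees every individual point:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 50)

# The four summary statistics the simple model actually uses.
cov_xy = np.cov(x, y)[0, 1]
var_x = np.var(x, ddof=1)
mean_x, mean_y = x.mean(), y.mean()

slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

# np.polyfit sees every individual point, yet reproduces the same line.
slope_ref, intercept_ref = np.polyfit(x, y, 1)
print(np.allclose([slope, intercept], [slope_ref, intercept_ref]))  # True
```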

So, any data set which leads to the same 4 numbers above will lead to exactly the same fitted line (whether it has 10 points, 100 points, or 100,000,000 points). So in a sense the fit is insensitive to the particular sample observed. This makes it “biased”, because it effectively ignores part of the data; if that ignored part happened to be important, the predictions will be consistently in error. You can check this by comparing the line fitted to all the data with the lines obtained by removing one data point at a time: they will tend to be quite stable.
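A quick leave-one-out sketch (again on simulated data) makes this stability visible: the fitted slope barely moves no matter which point is dropped.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

# Refit the straight line with each point removed in turn.
slopes = []
for i in range(len(x)):
    mask = np.arange(len(x)) != i
    slope, _ = np.polyfit(x[mask], y[mask], 1)
    slopes.append(slope)

print(f"slope spread across leave-one-out fits: {np.ptp(slopes):.4f}")  # small
```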

Now the second model uses every scrap of data it can get, and fits the data as closely as possible. The exact position of every data point matters, so you cannot shift the training data around without changing the fitted model, as you can with OLS. The model is therefore very sensitive to the particular training set you have: the fitted curve will look very different each time you repeat the same drop-one-data-point experiment.
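Extending the leave-one-out sketch above to a high-degree polynomial (degree 10 here, an arbitrary stand-in for the “complex” model) shows the contrast directly: the spread of predictions across refits is far larger for the complex fit.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

x_grid = np.linspace(0.05, 0.95, 50)

def loo_prediction_spread(degree):
    """Mean range of predictions on a grid across leave-one-out refits."""
    preds = []
    for i in range(n):
        mask = np.arange(n) != i
        coefs = np.polyfit(x[mask], y[mask], degree)
        preds.append(np.polyval(coefs, x_grid))
    return np.ptp(np.array(preds), axis=0).mean()

print(f"degree  1: mean spread {loo_prediction_spread(1):.3f}")   # stable
print(f"degree 10: mean spread {loo_prediction_spread(10):.3f}")  # jumpy
```

That spread is exactly the variance side of the trade-off: the simple line buys stability at the cost of bias, while the complex polynomial buys flexibility at the cost of variance.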

Attribution
Source: Link, Question Author: blubb, Answer Author: probabilityislogic
