What does this plot tell me about my linear model?

I have fit a linear model. I checked the response with a Q-Q plot and it is almost perfectly linear. When I fit the model, though, and study the predicted vs observed plot, it looks like this:

[Figure: predicted vs observed plot for the test set]

What does this tell me exactly? It seems to me that a better-fitting line would be obtained if I pivoted the model slightly to match the slope of the points. I’m not sure what I can do to get a better predictive model.

Edit

I trained the linear model on my training set. The ‘predicted’ values in the plot come from applying that model to an independent ‘test’ set; they are compared with the observed values for that test set. The line is drawn with abline(0,1).

Answer

This sort of “problem” occurs quite naturally and can look this way without actually indicating a problem. (There might be some problem, but a pattern similar to this doesn’t necessarily indicate one.)

It’s a consequence of regression to the mean and arises directly out of fitting the conditional mean (i.e. it’s exactly what you expect to see with regression).
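To make the regression-to-the-mean point concrete, here is a short derivation. It assumes an ordinary least squares fit with an intercept, evaluated on the data the model was fitted to; on an independent test set the equalities should hold only approximately. The residuals $e = y - \hat{y}$ are then uncorrelated with the fitted values, so

$$\operatorname{Cov}(\hat{y}, y) = \operatorname{Cov}(\hat{y}, \hat{y} + e) = \operatorname{Var}(\hat{y}).$$

The least squares slope of $y$ on $\hat{y}$ is therefore

$$\frac{\operatorname{Cov}(\hat{y}, y)}{\operatorname{Var}(\hat{y})} = 1,$$

while the slope of $\hat{y}$ on $y$ is

$$\frac{\operatorname{Cov}(\hat{y}, y)}{\operatorname{Var}(y)} = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)} = R^2 \le 1.$$

So with the observed values on the x axis, the point cloud trends along a slope of about $R^2$, which is shallower than the line with slope 1 and intercept 0, even when the model is doing exactly what it should.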

One thing that might throw off some answerers is that you have your plot “backward” to what most of us are used to — with the random variable on the x axis rather than the y-axis.

Here I have generated some data according to a regression model (with a normally distributed predictor and conditionally normal response) and fitted a model of the same form as the one that generated the data. Here’s the corresponding plot to yours drawn the other way around:

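[Figure: observed $y$ against fitted $\hat{y}$ for the simulated data, with the line of slope 1 and intercept 0 in red and a vertical slice marked by two blue lines]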

Looking at the slice between the blue lines, the red line (which is just the line with slope 1 and intercept 0) passes very near the mean of the $y$ in that slice. That is, $\hat{y}\approx E(Y|x)$.
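The slice argument can be checked numerically. The following is a minimal sketch, not the code behind the plot above: the data-generating model (intercept 2, slope 3, noise standard deviation 4), the sample size, and the slice boundaries are all illustrative assumptions, and the fit is evaluated in-sample rather than on a separate test set. It compares the mean observed value with the mean fitted value in a slice taken on the fitted values, and again in a slice taken on the observed values.

```python
import random

random.seed(1)
n = 20000

# Assumed data-generating model (illustrative values): y = 2 + 3x + noise
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, 4) for xi in x]


def mean(values):
    return sum(values) / len(values)


def ols(u, v):
    """Least-squares (intercept, slope) for regressing v on u."""
    mu, mv = mean(u), mean(v)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sxy / sxx
    return mv - slope * mu, slope


b0, b1 = ols(x, y)
fitted = [b0 + b1 * xi for xi in x]

# Slice on the fitted values (a vertical slice when fitted is on the x axis)
by_fit = [(f, o) for f, o in zip(fitted, y) if 4.0 <= f < 5.0]
fit_slice_fitted_mean = mean([f for f, _ in by_fit])
fit_slice_observed_mean = mean([o for _, o in by_fit])

# Slice on the observed values (a vertical slice when observed is on the x axis)
by_obs = [(f, o) for f, o in zip(fitted, y) if 8.0 <= o < 9.0]
obs_slice_fitted_mean = mean([f for f, _ in by_obs])
obs_slice_observed_mean = mean([o for _, o in by_obs])

print("slice on fitted:  ", fit_slice_fitted_mean, fit_slice_observed_mean)
print("slice on observed:", obs_slice_fitted_mean, obs_slice_observed_mean)
```

In the first slice the two means should nearly agree, which is the $\hat{y}\approx E(Y|x)$ behaviour described above. In the second slice the mean fitted value should sit well below the mean observed value, which is the pattern that makes a plot with the observed values on the x axis look as if the line needs pivoting.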

You are asking if you should “tweak” your line to lie closer to the major axis of the roughly elliptical point cloud, but that is not going to be the “best fit” line: it will tend to overpredict the mean where the fitted values are large and underpredict it where they are small.
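A rough way to see the cost of such a tweak is to compare the two lines as predictors. The sketch below rests on assumptions of mine: the simulated model and its parameter values are illustrative, the major axis is taken to be the leading principal axis of the (fitted, observed) cloud, and the “pivoted” predictor is the line along that axis through the centroid.

```python
import math
import random

random.seed(2)
n = 20000

# Assumed data-generating model (illustrative values): y = 2 + 3x + noise
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, 4) for xi in x]


def mean(values):
    return sum(values) / len(values)


def ols(u, v):
    """Least-squares (intercept, slope) for regressing v on u."""
    mu, mv = mean(u), mean(v)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sxy / sxx
    return mv - slope * mu, slope


b0, b1 = ols(x, y)
fitted = [b0 + b1 * xi for xi in x]

# Covariance matrix entries of the (fitted, observed) point cloud
mf, my = mean(fitted), mean(y)
var_f = sum((f - mf) ** 2 for f in fitted) / n
var_y = sum((o - my) ** 2 for o in y) / n
cov_fy = sum((f - mf) * (o - my) for f, o in zip(fitted, y)) / n

# Slope of the major axis (leading principal axis) of that cloud
theta = 0.5 * math.atan2(2 * cov_fy, var_f - var_y)
major_slope = math.tan(theta)

# "Pivoted" predictor: the line along the major axis through the centroid
pivoted = [my + major_slope * (f - mf) for f in fitted]

mse_fitted = mean([(o - f) ** 2 for f, o in zip(fitted, y)])
mse_pivoted = mean([(o - p) ** 2 for p, o in zip(pivoted, y)])

# Average overprediction of the pivoted line where the fitted values are large
upper = [(p, o) for f, p, o in zip(fitted, pivoted, y) if f > 5.0]
upper_overprediction = mean([p for p, _ in upper]) - mean([o for _, o in upper])

print("major-axis slope:", major_slope)
print("MSE of fitted values:", mse_fitted, " MSE of pivoted line:", mse_pivoted)
print("overprediction for large fitted values:", upper_overprediction)
```

Under these assumptions the major-axis slope should come out well above 1, the pivoted line should have a larger mean squared error than the original fitted values, and it should overpredict on average where the fitted values are large.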


If the regression assumptions are reasonable, and assuming you actually want to predict $E(Y|x)$, then there’s nothing wrong here — you see exactly what you should.


A case where you might see something like this, and where it might be an issue:

However, if toward the edges of the cloud your line doesn’t pass near the middle of narrow slices (vertical ones in my plot), that might indicate some underprediction (such as might occur if you’re shrinking coefficients).

That may or may not be a problem: shrinking coefficients toward zero is often quite useful; that will lead to bias but bias isn’t the whole story of fitting.

A small amount of bias toward zero (shrinkage) in the coefficients will produce a slightly “shallower” fit than the least squares line (on my plot; steeper on yours). That’s not necessarily a problem at all.

It’s only if the bias is larger than you want it to be that there would be any need to act at all. Otherwise it could still be doing exactly what it should.
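The effect of shrinkage on the plot can be sketched in the same way. The shrink factor of 0.8 below is an arbitrary illustrative choice rather than the result of any particular penalised fit, and the simulated model is again an assumption. The slope is multiplied by that factor, the intercept is recomputed so that the line still passes through the centroid, and the observed values are then regressed on each set of fitted values.

```python
import random

random.seed(3)
n = 5000

# Assumed data-generating model (illustrative values): y = 2 + 3x + noise
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, 4) for xi in x]


def mean(values):
    return sum(values) / len(values)


def ols(u, v):
    """Least-squares (intercept, slope) for regressing v on u."""
    mu, mv = mean(u), mean(v)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sxy / sxx
    return mv - slope * mu, slope


b0, b1 = ols(x, y)

shrink = 0.8  # hypothetical shrink factor, chosen only for illustration
b1_shrunk = shrink * b1
b0_shrunk = mean(y) - b1_shrunk * mean(x)  # keep the line through the centroid

fitted_ols = [b0 + b1 * xi for xi in x]
fitted_shrunk = [b0_shrunk + b1_shrunk * xi for xi in x]

# Slope of observed on fitted: 1 for least squares, 1 / shrink after shrinkage
_, slope_ols = ols(fitted_ols, y)
_, slope_shrunk = ols(fitted_shrunk, y)

print(round(slope_ols, 3), round(slope_shrunk, 3))
```

With one predictor and in-sample evaluation, the slope of observed on fitted is 1 for the least squares fit and 1/0.8 = 1.25 for the shrunken fit, so the script should print 1.0 1.25. The points then trend more steeply than the line with slope 1 and intercept 0 when the fitted values are on the x axis, and the line looks slightly too shallow.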

So I don’t see a problem here — it looks to me like your model is doing what it should.

For reference, here’s the plot from the question flipped around:

[Figure: the plot from the question flipped around, showing $y$ versus predicted $y$]

There’s some hint that it’s slightly biased toward 0 (which as mentioned, may not be a problem), and also perhaps a slight suggestion of a nonlinear relationship (which might potentially be a problem).

Attribution
Source: Link, Question Author: WeakLearner, Answer Author: Glen_b
