Regression – How do I know if my residuals are normally distributed?

I'm performing a regression and need to find out whether my residuals are normally distributed.


In practice you simply don’t know (but they probably aren’t). Not that non-normal residuals are necessarily a problem; it depends on how non-normal they are, how big your sample is, and how much you care about the impact on your inference.

You can see if the residuals are reasonably close to normal via a Q-Q plot.

A Q-Q plot isn’t hard to generate in Excel.

If you take $r$ to be the ranks of the residuals (1 for smallest, 2 for second smallest, etc.) and $n$ to be the number of residuals, then

$\Phi^{-1}\left(\frac{r - 3/8}{n + 1/4}\right)$

is a good approximation for the expected normal order statistics (where $\Phi^{-1}$ is the inverse cdf of a standard normal). Plot the residuals against that transformation of their ranks, and it should look roughly like a straight line.
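For anyone not working in Excel, the same recipe can be sketched in Python using only the standard library. The function name `normal_qq_points` and the example residuals are made up for illustration; `statistics.NormalDist().inv_cdf` plays the role of the inverse normal cdf.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    n = len(residuals)
    # Rank the residuals: rank 1 is the smallest (ties keep input order).
    order = sorted(range(n), key=lambda i: residuals[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, i in enumerate(order, start=1):
        p = (rank - 3/8) / (n + 1/4)  # plotting position from the formula above
        points.append((std_normal.inv_cdf(p), residuals[i]))
    return points

example = [0.3, -1.2, 0.8, -0.1, 1.9, -0.6, 0.2]  # made-up residuals
for theoretical, observed in normal_qq_points(example):
    print(f"{theoretical:7.3f}  {observed:5.1f}")
```

Plotting the second column against the first in any charting tool gives the Q-Q plot.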

If you haven’t used Q-Q plots before, I’d suggest generating a bunch of sets of random normal data (at several sample sizes) and seeing what the plots look like. (Roughly like points close to a straight line, with some tendency to be a bit more noisy – to wiggle a bit – at the ends.)

Then generate skewed data, heavy tailed data, uniform data, bimodal data etc and see what the plots look like when data isn’t normal. (Various kinds of curves and kinks, basically)
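A rough way to set up that experiment in Python is sketched below. The particular distributions, the seed and the sample size are arbitrary choices, and the `straightness` correlation is only a crude numeric stand-in I've added for eyeballing the plot; it is not a substitute for actually looking at the points.

```python
import random
from statistics import NormalDist

def qq_points(data):
    """(theoretical, observed) pairs using the (r - 3/8)/(n + 1/4) positions."""
    n = len(data)
    z = NormalDist()
    return [(z.inv_cdf((r - 3/8) / (n + 1/4)), x)
            for r, x in enumerate(sorted(data), start=1)]

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean nearly straight."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so reruns give the same samples
n = 200                 # try several sizes, e.g. 20, 50, 200, 1000

def t3():
    # Student t with 3 df: normal divided by sqrt(chi-square(3) / 3)
    return rng.gauss(0, 1) / (rng.gammavariate(1.5, 2.0) / 3) ** 0.5

samples = {
    "normal":       [rng.gauss(0, 1) for _ in range(n)],
    "right-skewed": [rng.expovariate(1.0) for _ in range(n)],
    "heavy-tailed": [t3() for _ in range(n)],
    "uniform":      [rng.uniform(-1, 1) for _ in range(n)],
    "bimodal":      [rng.gauss(rng.choice([-2, 2]), 0.5) for _ in range(n)],
}

for name, data in samples.items():
    print(f"{name:13s} {straightness(qq_points(data)):.4f}")
```

Feeding each set of `qq_points` to a scatter plot shows the shapes described: close to a line for the normal sample, curved or kinked for the others.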

These plots are standard in most stats packages.

Here’s one done in R:

[Image: normal Q-Q plot produced in R]

Here’s one I just generated in Excel via the above method:

[Image: normal Q-Q plot produced in Excel]

(not the same set of data both times)

You can see the points form a straightish line … that’s because the data was actually normal.

Here’s one that’s not normal (it’s quite right skew):

[Image: normal Q-Q plot of right-skewed data]

If you ever happen to be using something that has neither Q-Q plots nor inverse normal cdf functions, proceed as above up to the ranking stage, then find $p = \frac{r - 3/8}{n + 1/4}$ but use the Tukey lambda approximation to the inverse normal cdf.

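Actually, there are two such that have been in popular use:

$\Phi^{-1}(p) \approx 5.05\,\left(p^{0.135} - (1-p)^{0.135}\right)$

$\Phi^{-1}(p) \approx 4.91\,\left(p^{0.14} - (1-p)^{0.14}\right)$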

(Either is quite adequate, but my recollection is that the second seemed to work slightly better. I believe Tukey used 1/0.1975 = 5.063 in the first one instead of 5.05)
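To get a feel for how close these approximations are, here is a small Python sketch. It assumes the two commonly quoted Tukey lambda forms, $5.05\,(p^{0.135} - (1-p)^{0.135})$ and $4.91\,(p^{0.14} - (1-p)^{0.14})$; the function names are made up, and the exact values come from the standard library's `NormalDist().inv_cdf`.

```python
from statistics import NormalDist

def tukey_lambda_a(p):
    # Assumed first form: constant 5.05, exponent 0.135
    return 5.05 * (p ** 0.135 - (1 - p) ** 0.135)

def tukey_lambda_b(p):
    # Assumed second form: constant 4.91, exponent 0.14
    return 4.91 * (p ** 0.14 - (1 - p) ** 0.14)

exact = NormalDist().inv_cdf  # reference inverse normal cdf

print("    p     exact   form a   form b")
for p in (0.025, 0.1, 0.5, 0.9, 0.975):
    print(f"{p:6.3f} {exact(p):8.4f} {tukey_lambda_a(p):8.4f} {tukey_lambda_b(p):8.4f}")
```

Both only need powers and subtraction, so they can be typed into practically any calculator or spreadsheet. They drift further from the exact values in the extreme tails, which matters little for a Q-Q plot of modest size.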

Source: Link, Question Author: user44784, Answer Author: Glen_b
