Why doesn’t correlation of residuals matter when testing for normality?

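When $Y = AX + \varepsilon$ (i.e., $Y$ comes from linear regression model),
$$\varepsilon \sim \mathcal{N}(0, \sigma^2 I) \quad \Rightarrow \quad \hat{e} = (I - H)Y \sim \mathcal{N}(0, (I - H)\sigma^2)$$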
and in that case the residuals $\hat{e}_1, \dots, \hat{e}_n$ are correlated and not independent. But when we do regression diagnostics and want to test the assumption
$\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, every textbook suggests using
Q–Q plots and statistical tests on the residuals $\hat{e}$ that were designed to
test whether $\hat{e} \sim \mathcal{N}(0, \sigma^2 I)$ for some $\sigma^2 \in \mathbb{R}$.

How come it doesn’t matter for these tests that residuals are correlated, and
not independent? It is often suggested to use standardised residuals:
$$\hat{e}_i' = \frac{\hat{e}_i}{\sqrt{1 - h_{ii}}},$$
but that only makes them homoscedastic, not independent.
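
A small numerical sketch may help here. It builds a hypothetical design matrix (the sizes `n`, `k` and the random regressors are assumptions for illustration, not from the question), forms $H$ and $I - H$, and computes the exact correlation matrix of the rescaled residuals. One fact that holds for any design with $k$ linearly independent columns is that the squared entries of $H$ sum to $k$, so the off-diagonal entries must be small on average when $n$ is large relative to $k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3  # hypothetical sample size and number of regressors

# Hypothetical design matrix: intercept plus two random regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hat matrix H and the matrix I - H that maps Y to the residuals.
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

# With sigma^2 = 1, Cov(e_hat) = M. Rescaling each residual by
# 1 / sqrt(1 - h_ii) gives unit variances; C is then the exact
# correlation matrix of the rescaled residuals.
d = 1.0 / np.sqrt(np.diag(M))
C = M * np.outer(d, d)

off_diag = C[~np.eye(n, dtype=bool)]
max_abs_corr = np.abs(off_diag).max()
sum_sq_H = (H ** 2).sum()  # equals trace(H) = k because H is a projection

print("unit variances:", np.allclose(np.diag(C), 1.0))
print("largest |correlation|:", max_abs_corr)
print("sum of squared entries of H:", sum_sq_H)
```

The diagonal of `C` is all ones (homoscedastic), while the off-diagonal entries are non-zero, which is exactly the situation described above.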

To rephrase the question: Residuals from OLS regression are correlated. I understand that in practice these correlations are so small (most of the time? always?) that they can be ignored when testing whether the residuals came from a normal distribution. My question is, why?

Answer

In your notation, $H$ is the projection onto the column space of $X$, i.e. the subspace spanned by all regressors. Therefore $M := I_n - H$ is the projection onto everything orthogonal to the subspace spanned by all regressors.

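If $X \in \mathbb{R}^{n \times k}$, then $\hat{e} \in \mathbb{R}^n$ has a singular normal distribution (its covariance matrix $\sigma^2 M$ has rank $n - k$ when $X$ has full column rank) and its elements are correlated, as you state.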
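
The singularity can be checked numerically. The following is a minimal sketch with a hypothetical random design (sizes and seed are assumptions); it verifies that $M$ is a projection that annihilates $X$ and counts its non-zero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4  # hypothetical dimensions
X = rng.normal(size=(n, k))  # hypothetical design with full column rank

# Orthogonal projection onto the column space of X, and its complement.
H = X @ np.linalg.pinv(X)
M = np.eye(n) - H

# M is symmetric and idempotent, so its eigenvalues are 0 or 1.
# Counting the eigenvalues near 1 gives the rank of Cov(e_hat) / sigma^2.
eigvals = np.linalg.eigvalsh(M)
rank_M = int((eigvals > 0.5).sum())

print("M @ M == M:", np.allclose(M @ M, M))
print("M @ X == 0:", np.allclose(M @ X, 0.0))
print("rank of M:", rank_M, "expected n - k =", n - k)
```

Because the rank is $n - k < n$, the $n$ residuals satisfy $k$ exact linear constraints, which is what makes their joint distribution singular.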

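The errors $\varepsilon$ are unobservable and are in general not orthogonal to the subspace spanned by $X$.
For the sake of argument, assume that the error satisfies $\varepsilon \perp \operatorname{span}(X)$.
If this were true, we would have $y = X\beta + \varepsilon = \tilde{y} + \varepsilon$ with $\tilde{y} \perp \varepsilon$. Since $\tilde{y} = X\beta \in \operatorname{span}(X)$, we could decompose $y$ and recover the true $\varepsilon$.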

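Assume we have a basis $b_1, \dots, b_n$ of $\mathbb{R}^n$, where the first $k$ basis vectors $b_1, \dots, b_k$ span the subspace $\operatorname{span}(X)$ and the remaining $b_{k+1}, \dots, b_n$ span $\operatorname{span}(X)^\perp$.
In general, the error $\varepsilon = \alpha_1 b_1 + \dots + \alpha_n b_n$ will have non-zero components $\alpha_i$ for $i \in \{1, \dots, k\}$. These non-zero components get mixed up with $X\beta$ and therefore cannot be recovered by projection on $\operatorname{span}(X)$.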
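
To make the "mixing" concrete, here is a small simulation sketch. The design, the coefficient vector `beta` and the seed are all hypothetical choices for illustration. It splits a simulated error vector into its part inside the span of the regressors and its orthogonal part, then checks that the OLS residuals equal only the orthogonal part, while the other part is absorbed into the fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3  # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, -2.0, 0.5])  # hypothetical true coefficients
eps = rng.normal(size=n)           # simulated errors (unobservable in practice)
y = X @ beta + eps

H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

# Ordinary least squares fit and residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat

# Decompose the true errors into the two orthogonal parts.
eps_in_span = H @ eps  # part lying in the span of the regressors
eps_orth = M @ eps     # part orthogonal to the regressors

print("residuals equal orthogonal part:", np.allclose(e_hat, eps_orth))
print("fitted values absorb the rest:",
      np.allclose(X @ beta_hat, X @ beta + eps_in_span))
print("norm of the lost part:", np.linalg.norm(eps_in_span))
```

Only in a simulation, where `eps` is known, can the lost part be computed at all; with real data it cannot be separated from the signal.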

Since we can never hope to recover the true errors $\varepsilon$, and the residuals $\hat{e}$ are correlated with a singular $n$-dimensional normal distribution, we could transform $\hat{e} \in \mathbb{R}^n \mapsto e \in \mathbb{R}^{n-k}$. There we can have
$$e \sim N_{n-k}(0, \sigma^2 I_{n-k}),$$
i.e. $e$ has a non-singular normal distribution with uncorrelated, homoscedastic components. The residuals $e$ are called Theil's BLUS residuals.
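
As a rough illustration of such a transformation, the sketch below uses any orthonormal basis of the orthogonal complement of the regressors, obtained from a complete QR decomposition. This is not Theil's specific BLUS construction, which additionally selects one particular transformation by an optimality criterion; it only shows that some linear map from $\mathbb{R}^n$ to $\mathbb{R}^{n-k}$ yields uncorrelated, homoscedastic residuals. The design, coefficients and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 12, 3  # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)  # hypothetical data

# Complete QR: since X has full column rank here, the first k columns of Q
# span the column space of X and the remaining n - k columns span its
# orthogonal complement.
Q, _ = np.linalg.qr(X, mode="complete")
Q2 = Q[:, k:]

# Ordinary residuals via the projection M = I - H.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
e_hat = M @ y

# Transformed residuals in R^(n-k).
e = Q2.T @ e_hat

# Cov(e) / sigma^2 = Q2' M Q2, which should be the identity.
cov_factor = Q2.T @ M @ Q2

print("shape of e:", e.shape)
print("Cov(e) / sigma^2 is identity:", np.allclose(cov_factor, np.eye(n - k)))
```

The price of this transformation is that the $n - k$ new residuals no longer correspond one-to-one to the original observations.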

In the short paper "On the Testing of Regression Disturbances for Normality" you will find a comparison of OLS and BLUS residuals. In the Monte Carlo setting tested there, the OLS residuals are superior to the BLUS residuals. But this should give you a starting point.

Attribution
Source : Link , Question Author : Zoran Loncarevic , Answer Author : Marco Breitig
