Why is regression about variance?

I am reading this note.

On page 2, it states:

“How much of the variance in the data is explained by a given regression model?”

“Regression interpretation is about the mean of the coefficients; inference is about their variance.”

I have read such statements numerous times. Why would we care about “how much of the variance in the data is explained by the given regression model?”… and more specifically, why “variance”?

Answer

why would we care about “how much of the variance in the data is explained by the given regression model?”

To answer this it is useful to think about exactly what it means for a certain percentage of the variance to be explained by the regression model.

Let Y_1, \ldots, Y_n be the observed values of the outcome variable. The usual sample variance of the dependent variable in a regression model is \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \overline{Y})^2. Now let \widehat{Y}_i \equiv \widehat{f}({\boldsymbol X}_i) be the prediction of Y_i based on a least squares linear regression model with predictor values {\boldsymbol X}_i. As proven here, the variance above can be partitioned as:
\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \overline{Y})^2 =
\underbrace{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}_{{\rm residual \ variance}} + \underbrace{\frac{1}{n-1} \sum_{i=1}^{n} (\widehat{Y}_i - \overline{Y})^2}_{{\rm explained \ variance}}

In least squares regression (with an intercept), the average of the predicted values is \overline{Y}; therefore the total variance equals the average squared difference between the observed and the predicted values (the residual variance) plus the sample variance of the predictions themselves (the explained variance), which are only a function of the {\boldsymbol X}s. The “explained” variance may therefore be thought of as the variance in Y_i that is attributable to variation in {\boldsymbol X}_i. The proportion of the variance in Y_i that is “explained” (i.e. the proportion of variation in Y_i that is attributable to variation in {\boldsymbol X}_i) is the quantity usually referred to as R^2.
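For reference, this proportion can be written out explicitly; the second form below is the standard definition of R^2, and the two forms agree precisely because of the decomposition above (least squares with an intercept):

R^2 = \frac{\sum_{i=1}^{n} (\widehat{Y}_i - \overline{Y})^2}{\sum_{i=1}^{n} (Y_i - \overline{Y})^2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \overline{Y})^2}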

Now we use two extreme examples to make clear why this variance decomposition is important (a short numerical sketch follows the list):

  • (1) The predictors have nothing to do with the responses. In that case, the best unbiased predictor (in the least squares sense) for Y_i is \widehat{Y}_i = \overline{Y}. Therefore the total variance in Y_i is just equal to the residual variance and is unrelated to the variance in the predictors {\boldsymbol X}_i.

  • (2) The responses are perfectly linearly related to the predictors. In that case, the predictions are exactly correct and \widehat{Y}_i = Y_i. Therefore there is no residual variance and all of the variance in the outcome is the variance in the predictions themselves, which are only a function of the predictors. Hence all of the variance in the outcome is simply due to variation in the predictors {\boldsymbol X}_i.
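The following is a minimal numerical sketch, not from the original answer, assuming NumPy and a simple one-predictor least squares model with an intercept. It computes the total, residual, and explained variances and checks that they behave as described in the two extreme cases above.

```python
# Minimal sketch: variance decomposition for a one-predictor least squares fit.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)

def variance_decomposition(x, y):
    """Fit y ~ a + b*x by least squares; return (total, residual, explained, R^2)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares coefficients
    y_hat = X @ beta                              # predictions Y_hat_i
    total = np.sum((y - y.mean()) ** 2) / (n - 1)
    residual = np.sum((y - y_hat) ** 2) / (n - 1)
    explained = np.sum((y_hat - y.mean()) ** 2) / (n - 1)
    return total, residual, explained, explained / total

# Case (1): predictor unrelated to the response -> explained variance near 0, R^2 near 0.
y_noise = rng.normal(size=n)
print(variance_decomposition(x, y_noise))

# Case (2): response an exact linear function of the predictor -> residual variance ~ 0, R^2 ~ 1.
y_exact = 2.0 + 3.0 * x
print(variance_decomposition(x, y_exact))
```

In both cases the total variance equals residual plus explained variance (up to floating-point error); only how the total splits between the two sources changes.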

Situations with real data will usually lie between these two extremes, as will the proportion of variance that can be attributed to each of the two sources. The more “explained variance” there is (i.e. the more of the variation in Y_i that is due to variation in {\boldsymbol X}_i), the smaller the “residual variance” is and the better the predictions \widehat{Y}_i perform, which is another way of saying that the least squares model fits the data well.

Attribution
Source: Link, Question Author: Luna, Answer Author: Macro
