Calculating R-squared (coefficient of determination) with centered vs. un-centered sums of squares

I am looking at some simple regression models in both R and the statsmodels package in Python. I’ve found that, when computing the coefficient of determination, statsmodels uses the following formula for R^2:

R^2 = 1 - \frac{SSR}{TSS_{\text{centered}}}, \qquad TSS_{\text{centered}} = \sum_i (y_i - \bar{y})^2

where SSR is the sum of squared residuals and TSS is the total sum of squares of the dependent variable. (“Centered” means that the mean \bar{y} has been subtracted from the series before squaring.) However, the same calculation in R yields a different result for R^2. The reason is that R seems to be calculating R^2 as:

R^2 = 1 - \frac{SSR}{TSS_{\text{uncentered}}}, \qquad TSS_{\text{uncentered}} = \sum_i y_i^2
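To make the difference between the two denominators concrete, here is a small numerical sketch of my own (toy data, not from either package): fit a line by ordinary least squares with numpy and compute R^2 both ways.

```python
# Toy example: OLS fit, then R^2 with a centered vs. an uncentered
# total sum of squares. Data and seed are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=10)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

ssr = np.sum(resid**2)                    # sum of squared residuals
tss_centered = np.sum((y - y.mean())**2)  # mean removed
tss_uncentered = np.sum(y**2)             # mean kept

r2_centered = 1.0 - ssr / tss_centered
r2_uncentered = 1.0 - ssr / tss_uncentered
print(r2_centered, r2_uncentered)
```

Whenever the mean of y is nonzero, the uncentered TSS is larger, so the uncentered R^2 comes out closer to 1; the two numbers answer different questions.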

So, what gives? Presumably there’s some reason to prefer one over the other in certain situations, but I haven’t been able to find any information online about when each of the above formulae should be preferred.

Can someone please explain why one is better than the other?

Answer

As Stephane hinted in the comment, what matters is the difference between a model with an intercept and one without.

For what it is worth, here is code from one of my packages:

## cf src/library/stats/R/lm.R, case with no weights and an intercept
f <- object$fitted.values
r <- object$residuals
## center the fitted values only when the model has an intercept
mss <- if (object$intercept) sum((f - mean(f))^2) else sum(f^2)
rss <- sum(r^2)
r.squared <- mss/(mss + rss)

Residuals are centered by design, which leaves the fitted values needing to be centered in the intercept case and not otherwise.
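To check that this mss/(mss + rss) computation matches the familiar 1 - SSR/TSS formula in the intercept case, here is a Python sketch of my own (mirroring the R snippet above, not taken from the R sources):

```python
# Replicate the intercept branch of the R snippet: center the fitted
# values, then form mss/(mss + rss). Data are an arbitrary toy example.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 20)
y = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=20)

X = np.column_stack([np.ones_like(x), x])  # intercept column included
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
f = X @ beta          # fitted values
r = y - f             # residuals

mss = np.sum((f - f.mean())**2)  # model sum of squares (centered)
rss = np.sum(r**2)               # residual sum of squares
r_squared = mss / (mss + rss)

# With an intercept, the residuals sum to zero, so mean(f) == mean(y)
# and mss + rss equals the centered TSS; the two formulas agree.
tss_centered = np.sum((y - y.mean())**2)
assert np.isclose(r_squared, 1.0 - rss / tss_centered)
```

Without an intercept the residuals need not sum to zero, the decomposition above breaks down, and only the uncentered version keeps R^2 in [0, 1], which is why the R code switches to `sum(f^2)` in that branch.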

Attribution
Source: Link, Question Author: BenDundee, Answer Author: Dirk Eddelbuettel
