How to calculate the R-squared (r2) statistic in R for a loess fit?
For example for this data:
cars.lo <- loess(dist ~ speed, cars)
cars.lp <- predict(cars.lo, data.frame(speed = seq(5, 30, 1)), se = TRUE)
cars.lp has two arrays: fit for the model and se.fit for the standard error.
My first thought was to compute a pseudo R2 measure as follows:
ss.dist <- sum(scale(cars$dist, scale=FALSE)^2)
ss.resid <- sum(resid(cars.lo)^2)
1-ss.resid/ss.dist
Here, we get a value of 0.6814984 (≈ cor(cars$dist, predict(cars.lo))^2), close to what would be obtained from a GAM:
library(mgcv)
summary(gam(dist ~ speed, data=cars))
This also seems to agree with what the S loess function would return as Multiple R-squared (I don’t have S, so I can’t check this myself). For example, using the airquality R dataset, which looks like the air data Chambers and Hastie used in the ‘white book’ (the one referenced in the on-line help for loess, although it is not exactly the same dataset), I got an R2 of 0.8101377 using the above formula. That is pretty much in agreement with what Chambers and Hastie reported.
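For reference, the airquality computation can be sketched as below. The model formula (cube-root ozone on solar radiation, temperature and wind, with the default span) and the helper name pseudo_r2 are assumptions for illustration, not necessarily the fit that produced the figure quoted above, so the result may differ. Because airquality contains missing values and loess drops those rows by default, the helper rebuilds the response from fitted values plus residuals instead of reading it from the data frame.

```r
# Hypothetical helper: pseudo R2 = 1 - SS_resid / SS_total for a loess fit.
pseudo_r2 <- function(fit) {
  res <- resid(fit)
  y   <- predict(fit) + res   # response actually used in the fit (NA rows already dropped)
  1 - sum(res^2) / sum((y - mean(y))^2)
}

# Assumed model formula, for illustration only.
air.lo <- loess(Ozone^(1/3) ~ Solar.R + Temp + Wind, data = airquality)
pseudo_r2(air.lo)

# Sanity check on the cars example: same computation as the formula above.
cars.lo <- loess(dist ~ speed, cars)
pseudo_r2(cars.lo)
```

Rebuilding the response this way keeps the total and residual sums of squares on the same set of rows, which matters whenever the data contain NAs.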
I should note that I didn’t find any paper dealing specifically with this (admittedly, that was just a quick googling), and William Cleveland doesn’t discuss any R2-like measure in his paper.
However, I wonder whether the liberty with which you can choose the degree of smoothing (the window span) might preclude any use of an R2-based measure.
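To make that concern concrete, here is a minimal sketch that recomputes the same pseudo R2 on the cars data for a few spans. The span values are arbitrary choices for illustration (0.75 is the loess default), and the variable names are mine.

```r
# Pseudo R2 of a loess fit on the cars data as a function of the span.
spans   <- c(0.4, 0.6, 0.75, 1)   # illustrative values; 0.75 is the default
ss.dist <- sum((cars$dist - mean(cars$dist))^2)

r2.by.span <- sapply(spans, function(s) {
  fit <- loess(dist ~ speed, cars, span = s)
  1 - sum(resid(fit)^2) / ss.dist
})

data.frame(span = spans, pseudo.R2 = round(r2.by.span, 4))
```

A smaller span lets the curve follow the data more closely, so the in-sample value will typically rise as the span shrinks. That suggests values from different spans should not be compared without also looking at the equivalent number of parameters (the enp component of the fitted object).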