Is it possible for the $R^2$ of a regression on two variables to be higher than the sum of the $R^2$ for two regressions on the individual variables?

In OLS, is it possible for the $R^2$ of a regression on two variables to be higher than the sum of the $R^2$ for two regressions on the individual variables?

$$R^2(Y \sim A + B) > R^2(Y \sim A) + R^2(Y \sim B)$$

Edit: Ugh, this is trivial; that’s what I get for trying to solve problems that I thought of while at the gym. Sorry for wasting time again. The answer is clearly yes.

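$Y \sim N(0,1)$

$A \sim N(0,1)$

$B = Y - A$

$R^2(Y \sim A + B) = 1$, clearly, since $Y = A + B$ exactly. But $R^2(Y \sim A)$ should be 0 in the limit (taking $A$ to be drawn independently of $Y$), and $R^2(Y \sim B)$ should be 0.5 in the limit: $\mathrm{Cov}(Y, B) = 1$ and $\mathrm{Var}(B) = 2$, so $\mathrm{Corr}(Y, B)^2 = 1/2$.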
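The construction above can be checked numerically. The following is only a rough sketch: the sample size, the seed, and the helper `r2()` are my own choices rather than part of the question, and $A$ is assumed to be drawn independently of $Y$. $R^2$ is computed directly from the residuals because the joint model fits exactly.

```r
# Sketch of the Y, A, B = Y - A construction.
# Assumptions: n and the seed are arbitrary; A is independent of Y.
set.seed(1)
n <- 100000

d <- data.frame(y = rnorm(n, 0, 1),   # Y ~ N(0, 1)
                a = rnorm(n, 0, 1))   # A ~ N(0, 1)
d$b <- d$y - d$a                      # B = Y - A, so Y = A + B exactly

# Hypothetical helper: R^2 as 1 - RSS/TSS for a model with an intercept.
r2 <- function(formula, data) {
  fit <- lm(formula, data = data)
  y <- model.response(model.frame(fit))
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

r2.a  <- r2(y ~ a, d)       # expected to be close to 0
r2.b  <- r2(y ~ b, d)       # expected to be close to 0.5
r2.ab <- r2(y ~ a + b, d)   # expected to be 1 up to rounding error

c(a = r2.a, b = r2.b, ab = r2.ab)
r2.ab > r2.a + r2.b
```

With a large `n` the three values should sit near their limiting values of 0, 0.5 and 1, so the last line compares roughly 1 against roughly 0.5.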

Answer

Here’s a little bit of R that sets a random seed that will result in a dataset that shows it in action.

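set.seed(103)

# Three independent standard normal variables, 20 observations each
d <- data.frame(y=rnorm(20, 0, 1),
                a=rnorm(20, 0, 1),
                b=rnorm(20, 0, 1))

m1 <- lm(y~a, data=d)    # y on a alone
m2 <- lm(y~b, data=d)    # y on b alone
m3 <- lm(y~a+b, data=d)  # y on both a and b

r2.a <- summary(m1)[["r.squared"]]
r2.b <- summary(m2)[["r.squared"]]
r2.ab <- summary(m3)[["r.squared"]]  # R^2 of the joint model, not a sum

# Is the joint R^2 larger than the sum of the two individual R^2 values?
r2.ab > r2.a + r2.b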

Not only is it possible (as you’ve already shown analytically), it’s not hard to do. Given 3 independent, normally distributed variables, it seems to happen about 40% of the time.
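One way to estimate how often this happens is to repeat the experiment many times and count. This is a minimal sketch, assuming 20 observations per dataset as in the code above; the number of replications, the seed, and the names `n.obs`, `n.sims` and `exceeds` are my own choices, and no particular result is claimed here.

```r
# Monte Carlo sketch: how often does the joint R^2 exceed the sum of the
# two single-predictor R^2 values for independent standard normal data?
# Assumptions: n.obs matches the example above; n.sims and the seed are arbitrary.
set.seed(42)
n.obs  <- 20
n.sims <- 2000

exceeds <- replicate(n.sims, {
  d <- data.frame(y = rnorm(n.obs, 0, 1),
                  a = rnorm(n.obs, 0, 1),
                  b = rnorm(n.obs, 0, 1))
  r2.a  <- summary(lm(y ~ a, data = d))[["r.squared"]]
  r2.b  <- summary(lm(y ~ b, data = d))[["r.squared"]]
  r2.ab <- summary(lm(y ~ a + b, data = d))[["r.squared"]]
  r2.ab > r2.a + r2.b   # TRUE when the joint model beats the sum
})

mean(exceeds)  # estimated share of datasets where the inequality holds
```

Increasing `n.sims` tightens the estimate, and changing `n.obs` shows how the share depends on sample size.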

Attribution
Source : Link , Question Author : bsdfish , Answer Author : Benjamin Mako Hill
