Conditional expectation of R-squared

Consider the simple linear model:

y = X\beta + \epsilon

where \epsilon_i \overset{\text{i.i.d.}}{\sim} N(0,\sigma^2), X \in \mathbb{R}^{n \times p} with p \geq 2, and X contains a column of
constants.

My question is: given E(X'X), \beta and \sigma, is there a formula
for a non-trivial upper bound on E(R^2)* (assuming the model was estimated by OLS)?

*I assumed, writing this, that getting E(R^2) itself would not be possible.

EDIT 1:

Using the solution derived by Stéphane Laurent (see below), we can get a non-trivial upper bound on E(R^2). Some numerical simulations (below) show that this bound is
actually pretty tight.

Stéphane Laurent derived the following: R^2 \sim B(p-1, n-p, \lambda),
where B(p-1, n-p, \lambda) is a non-central Beta distribution with
non-centrality parameter

\lambda = \dfrac{\Vert X\beta - E(X)\beta\,\mathbf{1}_n\Vert^2}{\sigma^2}

So

E(R^2) = E\left(\dfrac{\chi^2_{p-1}(\lambda)}{\chi^2_{p-1}(\lambda)+\chi^2_{n-p}}\right) \leq \dfrac{E\left(\chi^2_{p-1}(\lambda)\right)}{E\left(\chi^2_{p-1}(\lambda)\right)+E\left(\chi^2_{n-p}\right)}

where \chi^2_k(\lambda) is a non-central \chi^2 with non-centrality parameter \lambda and k degrees of freedom. Since E\left(\chi^2_{p-1}(\lambda)\right) = \lambda + p - 1 and E\left(\chi^2_{n-p}\right) = n - p, a non-trivial upper bound for E(R^2) is

\dfrac{\lambda + p - 1}{\lambda + n - 1}
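For concreteness, a one-line R helper implementing this bound (the function name r2_upper_bound is mine, not from the original post):

# upper bound on E(R^2), using E(chi^2_{p-1}(lambda)) = lambda + p - 1
# and E(chi^2_{n-p}) = n - p
r2_upper_bound <- function(lambda, p, n) (lambda + p - 1) / (lambda + n - 1)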

It is very tight (much tighter than what I had expected would be possible). For example, using:

rho <- 0.75                    # pairwise correlation used to build Su
p <- 10                        # number of columns of X (including the constant)
n <- 25*p                      # sample size
Su <- matrix(rho, p-1, p-1)    # equicorrelation matrix for the p-1 non-constant regressors
diag(Su) <- 1
su <- 1                        # error standard deviation (sigma)
set.seed(123)
bet <- runif(p)                # coefficient vector beta

the mean of the R^2 over 1000 simulations is 0.960819, while the theoretical upper bound above gives 0.9609081. The bound seems to be equally precise across many values of R^2. Truly astounding!
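For reference, here is a minimal sketch of this kind of simulation and comparison. The data-generating details below (drawing the non-constant regressors from a multivariate normal with correlation matrix Su via MASS::mvrnorm, and approximating lambda by its expected value n * t(bet[-1]) %*% Su %*% bet[-1] / su^2) are assumptions made for illustration, not necessarily the code used to produce the numbers above:

library(MASS)                                       # for mvrnorm

r2_sims <- replicate(1000, {
  Xr <- mvrnorm(n, mu = rep(0, p-1), Sigma = Su)    # non-constant regressors
  X <- cbind(1, Xr)                                 # add the column of constants
  y <- drop(X %*% bet) + rnorm(n, sd = su)
  summary(lm(y ~ Xr))$r.squared
})
mean(r2_sims)                                       # simulated mean of R^2

lambda <- n * drop(t(bet[-1]) %*% Su %*% bet[-1]) / su^2
r2_upper_bound(lambda, p, n)                        # theoretical upper bound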

EDIT 2:

After further research, it appears that the quality of the upper-bound approximation to E(R^2) improves as \lambda + p increases (and, all else equal, \lambda increases with n).

Answer

Any linear model can be written Y=\mu+\sigma G where G has the standard normal distribution on \mathbb{R}^n and \mu is assumed to belong to a linear subspace W of \mathbb{R}^n. In your case W=\text{Im}(X).

Let [1] \subset W be the one-dimensional linear subspace generated by the vector (1,1,\ldots,1). Taking U=[1] below, the R^2 is closely related to the classical Fisher statistic

F = \dfrac{{\Vert P_Z Y\Vert}^2/(m-\ell)}{{\Vert P_W^\perp Y\Vert}^2/(n-m)}

for the hypothesis test of H_0\colon\{\mu \in U\}, where U\subset W is a linear subspace, Z=U^\perp \cap W denotes the orthogonal complement of U in W, and m=\dim(W), \ell=\dim(U) (so m=p and \ell=1 in your situation).

Indeed,

\dfrac{m-\ell}{n-m}\,F = \dfrac{{\Vert P_Z Y\Vert}^2}{{\Vert P_W^\perp Y\Vert}^2}
= \frac{R^2}{1-R^2}

because the definition of R^2 is
R^2 = \frac{{\Vert P_Z Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}=1 - \frac{{\Vert P^\perp_W Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}.
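As a quick numerical illustration of this identity (the example below is only an illustration, not part of the original answer), the overall F statistic reported by lm and the R^2 are linked exactly as above:

set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + rnorm(n)
fit <- summary(lm(y ~ x1 + x2))     # m = 3 (intercept plus two regressors), l = 1
r2 <- fit$r.squared
c(unname(fit$fstatistic["value"]),  # F reported by lm
  (r2/(1 - r2)) * (n - 3)/(3 - 1))  # F recovered from R^2; the two agree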

Obviously \boxed{P_Z Y = P_Z \mu + \sigma P_Z G} and
\boxed{P_W^\perp Y = \sigma P_W^\perp G}.

When H_0\colon\{\mu \in U\} is true then P_Z \mu = 0 and therefore

F = \frac{{\Vert P_Z G\Vert}^2/(m-\ell)}{{\Vert P_W^\perp G\Vert}^2/(n-m)} \sim F_{m-\ell,n-m}

has the Fisher F_{m-\ell,n-m} distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, R^2 \sim {\cal B}(m-\ell, n-m).
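A small simulation check of this fact (again only an illustration; note that in R's parameterization the corresponding Beta distribution has shape parameters (m-\ell)/2 and (n-m)/2):

set.seed(42)
n <- 30; m <- 3                      # intercept plus two regressors, so m - l = 2
r2_h0 <- replicate(5000, {
  X <- matrix(rnorm(n * (m-1)), n)   # regressors
  y <- rnorm(n)                      # H_0 holds: mu lies in U = [1] (here mu = 0)
  summary(lm(y ~ X))$r.squared
})
# compare the simulated R^2 with Beta((m-1)/2, (n-m)/2):
ks.test(r2_h0, "pbeta", shape1 = (m-1)/2, shape2 = (n-m)/2)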

In the general situation we have to deal with P_Z Y = P_Z \mu + \sigma P_Z G when P_Z\mu \neq 0. In this general case one has {\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda), the noncentral \chi^2 distribution with m-\ell degrees of freedom and noncentrality parameter \boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}, and then
\boxed{F \sim F_{m-\ell,n-m}(\lambda)} (noncentral Fisher distribution). This is the classical result used to compute the power of F-tests.

The classical relation between the Fisher distribution and the Beta distribution holds in the noncentral situation too. Thus R^2 has the noncentral Beta distribution with “shape parameters” m-\ell and n-m and noncentrality parameter \lambda. I think the moments are available in the literature, but they are possibly quite complicated.

Finally, let us write down P_Z\mu. Note that P_Z = P_W - P_U. One has P_U \mu = \bar\mu 1 when U=[1], and P_W \mu = \mu. Hence P_Z \mu = \mu - \bar\mu 1, where \mu=X\beta for the unknown parameter vector \beta.
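Putting the last two points together, a short sketch (an illustration with an arbitrary fixed design, not code from the answer) that computes \lambda = \Vert\mu - \bar\mu 1\Vert^2/\sigma^2 and checks the noncentral Beta law of R^2; R's pbeta takes the noncentrality through its ncp argument, again with the shape parameters halved:

set.seed(7)
n <- 40
X <- cbind(1, rnorm(n), rnorm(n))            # fixed design: m = 3, l = 1
b <- c(1, 0.5, -0.3); sigma <- 1
mu <- drop(X %*% b)
lambda <- sum((mu - mean(mu))^2) / sigma^2   # ||P_Z mu||^2 / sigma^2
r2 <- replicate(5000, {
  y <- mu + sigma * rnorm(n)
  summary(lm(y ~ X[, -1]))$r.squared
})
ks.test(r2, "pbeta", shape1 = (3-1)/2, shape2 = (n-3)/2, ncp = lambda)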

Attribution
Source: Link, Question Author: user603, Answer Author: Stéphane Laurent
