Consider the simple linear model:

y = X\beta + \epsilon

where \epsilon_i \overset{\text{i.i.d.}}{\sim} N(0,\sigma^2), X \in \mathbb{R}^{n\times p} with p \geq 2, and X contains a column of constants.

My question is: given E(X'X), \beta and \sigma, is there a formula for a non-trivial upper bound on E(R^2) (assuming the model is estimated by OLS)? I assumed, writing this, that getting E(R^2) itself would not be possible.

## EDIT1

Using the solution derived by Stéphane Laurent (see below) we can get a non-trivial upper bound on E(R^2). Some numerical simulations (below) show that this bound is actually pretty tight.

Stéphane Laurent derived the following: R^2 \sim \mathcal{B}(p-1, n-p, \lambda), where \mathcal{B}(p-1, n-p, \lambda) is a non-central Beta distribution with non-centrality parameter

\lambda = \frac{\lVert X\beta - \overline{X\beta}\,1_n\rVert^2}{\sigma^2}

where \overline{X\beta} denotes the mean of the entries of X\beta.

So

E(R^2) = E\left(\frac{\chi^2_{p-1}(\lambda)}{\chi^2_{p-1}(\lambda)+\chi^2_{n-p}}\right) \leq \frac{E\left(\chi^2_{p-1}(\lambda)\right)}{E\left(\chi^2_{p-1}(\lambda)\right)+E\left(\chi^2_{n-p}\right)}

where \chi^2_k(\lambda) is a non-central \chi^2 with non-centrality parameter \lambda and k degrees of freedom. Since E(\chi^2_{p-1}(\lambda)) = \lambda + p - 1 and E(\chi^2_{n-p}) = n - p, a non-trivial upper bound for E(R^2) is

\frac{\lambda+p-1}{\lambda+n-1}

It is very tight (much tighter than what I had expected would be possible). For example, using:

```r
rho <- 0.75
p <- 10
n <- 25 * p
Su <- matrix(rho, p - 1, p - 1)
diag(Su) <- 1
su <- 1
set.seed(123)
bet <- runif(p)
```
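The simulation loop itself is elided in the post. A minimal sketch, restating the parameters above for self-containment and assuming the non-constant columns of X are drawn once from N(0, Su) and held fixed across replications (the names `X`, `mu`, `lambda` and `bound` are mine, not from the original):

```r
# Sketch of the elided simulation (assumption: X's non-constant columns are
# drawn once from N(0, Su); su is the error standard deviation)
library(MASS)  # for mvrnorm

rho <- 0.75
p <- 10
n <- 25 * p
Su <- matrix(rho, p - 1, p - 1)
diag(Su) <- 1
su <- 1
set.seed(123)
bet <- runif(p)

X  <- cbind(1, mvrnorm(n, mu = rep(0, p - 1), Sigma = Su))
mu <- drop(X %*% bet)
lambda <- sum((mu - mean(mu))^2) / su^2        # non-centrality parameter
bound  <- (lambda + p - 1) / (lambda + n - 1)  # theoretical upper bound

r2 <- replicate(1000, {
  y <- mu + rnorm(n, sd = su)
  summary(lm(y ~ X[, -1]))$r.squared           # OLS with intercept
})
c(mean_r2 = mean(r2), bound = bound)           # compare simulated mean to bound
```

The exact numbers depend on how X is generated, so they will not reproduce the figures quoted below, but the simulated mean should sit just below the bound.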

the mean of the R^2 over 1000 simulations is `0.960819`, while the theoretical upper bound above gives `0.9609081`. The bound seems to be equally precise across many values of R^2. Truly astounding!

## EDIT2

After further research, it appears that the quality of the upper bound as an approximation to E(R^2) improves as \lambda + p increases (and, all else equal, \lambda increases with n).

**Answer**

Any linear model can be written Y=\mu+\sigma G where G has the standard normal distribution on \mathbb{R}^n and \mu is assumed to belong to a linear subspace W of \mathbb{R}^n. In your case W=\text{Im}(X).

Let [1]\subset W be the one-dimensional linear subspace generated by the vector (1,1,\ldots,1). Taking U=[1] below, the R^2 is closely related to the classical Fisher statistic

F = \dfrac{{\Vert P_Z Y\Vert}^2/(m-\ell)}{{\Vert P_W^\perp Y\Vert}^2/(n-m)}

for the hypothesis test of H_0\colon\{\mu \in U\}, where U\subset W is a linear subspace, Z=U^\perp \cap W denotes the orthogonal complement of U in W, m=\dim(W) and \ell=\dim(U) (so m=p and \ell=1 in your situation).

Indeed,

\dfrac{{\Vert P_Z Y\Vert}^2}{{\Vert P_W^\perp Y\Vert}^2} = \frac{R^2}{1-R^2}, \quad\text{hence}\quad F = \frac{n-m}{m-\ell}\,\frac{R^2}{1-R^2},

because the definition of R^2 is

R^2 = \frac{{\Vert P_Z Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}=1 - \frac{{\Vert P^\perp_W Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}.
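As a quick numerical sanity check of this definition, one can verify that 1 - \Vert P_W^\perp Y\Vert^2/\Vert P_U^\perp Y\Vert^2 matches the R^2 reported by `lm` (a hypothetical small design; none of these objects are from the original post):

```r
# Verify R^2 = 1 - |P_W^perp y|^2 / |P_U^perp y|^2 against summary.lm
set.seed(7)
n <- 20; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))  # design with a constant column
y <- drop(X %*% rnorm(p) + rnorm(n))
P_W <- X %*% solve(crossprod(X)) %*% t(X)     # orthogonal projection onto Im(X)
res <- y - drop(P_W %*% y)                    # P_W^perp y (the OLS residuals)
r2_proj <- 1 - sum(res^2) / sum((y - mean(y))^2)
r2_lm   <- summary(lm(y ~ X[, -1]))$r.squared
all.equal(r2_proj, r2_lm)                     # TRUE
```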

Obviously \boxed{P_Z Y = P_Z \mu + \sigma P_Z G} and

\boxed{P_W^\perp Y = \sigma P_W^\perp G}.

**When H_0\colon\{\mu \in U\} is true** then P_Z \mu = 0 and therefore

F = \frac{{\Vert P_Z G\Vert}^2/(m-\ell)}{{\Vert P_W^\perp G\Vert}^2/(n-m)} \sim F_{m-\ell,n-m}

has the Fisher F_{m-\ell,n-m} distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, R^2 \sim {\cal B}(m-\ell, n-m).
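A note on conventions: writing R^2 = \frac{(m-\ell)F}{(m-\ell)F+(n-m)}, the Beta shape parameters are (m-\ell)/2 and (n-m)/2; \mathcal{B}(m-\ell, n-m) here is indexed by the degrees of freedom. A small check of the F-to-Beta link (the values of `d1`, `d2` are arbitrary):

```r
# If F ~ F_{d1,d2}, then d1*F / (d1*F + d2) ~ Beta(d1/2, d2/2)
set.seed(1)
d1 <- 9; d2 <- 240
f  <- rf(1e5, d1, d2)
r2 <- d1 * f / (d1 * f + d2)
c(empirical = mean(r2), theoretical = d1 / (d1 + d2))  # both approx 0.036
```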

**In the general situation** we have to deal with P_Z Y = P_Z \mu + \sigma P_Z G when P_Z\mu \neq 0. In this general case one has {\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda), the noncentral \chi^2 distribution with m-\ell degrees of freedom and noncentrality parameter \boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}, and then

\boxed{F \sim F_{m-\ell,n-m}(\lambda)} (noncentral Fisher distribution). This is the classical result used to compute power of F-tests.
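That power computation is a one-liner in R via the `ncp` argument of `pf` (the helper `f_test_power` is a hypothetical name, not from the original):

```r
# Power of the level-alpha F-test of H0: mu in U, via the noncentral F
f_test_power <- function(lambda, df1, df2, alpha = 0.05) {
  crit <- qf(1 - alpha, df1, df2)        # critical value under H0 (lambda = 0)
  1 - pf(crit, df1, df2, ncp = lambda)   # rejection probability under lambda
}
f_test_power(lambda = 0,  df1 = 9, df2 = 240)  # equals alpha when lambda = 0
f_test_power(lambda = 20, df1 = 9, df2 = 240)  # power grows with lambda
```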

The classical relation between the Fisher distribution and the Beta distribution holds in the noncentral situation too. Finally, R^2 has the noncentral Beta distribution with “shape parameters” m-\ell and n-m and noncentrality parameter \lambda. I think the moments are available in the literature, but they are possibly quite complicated.

Finally, let us write down P_Z\mu. Note that P_Z = P_W - P_U. One has P_U \mu = \bar\mu 1 when U=[1], and P_W \mu = \mu. Hence P_Z \mu = \mu - \bar\mu 1, where \mu=X\beta for the unknown parameter vector \beta.
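This last identity is easy to check numerically with explicit projection matrices (a hypothetical small design, not from the original post):

```r
# Check P_Z mu = mu - mean(mu)*1 for a design with a constant column
set.seed(42)
n <- 8; p <- 3
X   <- cbind(1, matrix(rnorm(n * (p - 1)), n))
P_W <- X %*% solve(crossprod(X)) %*% t(X)  # projection onto W = Im(X)
P_U <- matrix(1 / n, n, n)                 # projection onto U = span(1)
mu  <- drop(X %*% rnorm(p))                # mu = X beta
max(abs((P_W - P_U) %*% mu - (mu - mean(mu))))  # ~ 0 up to rounding
```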

**Attribution**
*Source: Link, Question Author: user603, Answer Author: Stéphane Laurent*