Conditional expectation of R-squared

Consider the simple linear model:

where $\epsilon_i\sim\mathrm{i.i.d.}\;\mathcal{N}(0,\sigma^2)$ and
$X\in\mathbb{R}^{n\times p}$, $p\geq2$ and $X$ contains a column of
constants.

My question is, given $\mathrm{E}(X'X)$, $\beta$ and $\sigma$, is there a formula
for a non trivial upper bound on $\mathrm{E}(R^2)$*? (assuming the model was estimated by OLS).

*I assumed, writing this, that getting $E(R^2)$ itself would not be possible.

EDIT1

using the solution derived by Stéphane Laurent (see below) we can get a non trivial upper bound on $E(R^2)$. Some numerical simulations (below) show that this bound is
actually pretty tight.

Stéphane Laurent derived the following: $R^2\sim\mathrm{B}(p-1,n-p,\lambda)$
where $\mathrm{B}(p-1,n-p,\lambda)$ is a non-central Beta distribution with
non-centrality parameter $\lambda$ with

So

where $\chi^2_{k}(\lambda)$ is a non-central $\chi^2$ with parameter $\lambda$ and $k$ degrees of freedom. So a non-trivial upper bound for $\mathrm{E}(R^2)$ is

it is very tight (much tighter than what I had expected would be possible):

for example, using:

rho<-0.75
p<-10
n<-25*p
Su<-matrix(rho,p-1,p-1)
diag(Su)<-1
su<-1
set.seed(123)
bet<-runif(p)


the mean of the $R^2$ over 1000 simulations is 0.960819. The theoretical upper bound above gives 0.9609081. The bound seems to be equally precise across many values of $R^2$. Truly astounding!

EDIT2:

after further research, it appears that the quality of the upper bound approximation to $E(R^2)$ will get better as $\lambda+p$ increases (and all else equal, $\lambda$ increases with $n$).

Any linear model can be written $\boxed{Y=\mu+\sigma G}$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a linear subspace $W$ of $\mathbb{R}^n$. In your case $W=\text{Im}(X)$.

Let $[1] \subset W$ be the one-dimensional linear subspace generated by the vector $(1,1,\ldots,1)$. Taking $U=[1]$ below, the $R^2$ is highly related to the classical Fisher statistic

for the hypothesis test of $H_0\colon\{\mu \in U\}$ where $U\subset W$ is a linear subspace, and denoting by $Z=U^\perp \cap W$ the orthogonal complement of $U$ in $W$, and denoting $m=\dim(W)$ and $\ell=\dim(U)$ (then $m=p$ and $\ell=1$ in your situation).

Indeed,

because the definition of $R^2$ is

Obviously $\boxed{P_Z Y = P_Z \mu + \sigma P_Z G}$ and
$\boxed{P_W^\perp Y = \sigma P_W^\perp G}$.

When $H_0\colon\{\mu \in U\}$ is true then $P_Z \mu = 0$ and therefore

has the Fisher $F_{m-\ell,n-m}$ distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, $R^2 \sim {\cal B}(m-\ell, n-m)$.

In the general situation we have to deal with $P_Z Y = P_Z \mu + \sigma P_Z G$ when $P_Z\mu \neq 0$. In this general case one has ${\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda)$, the noncentral $\chi^2$ distribution with $m-\ell$ degrees of freedom and noncentrality parameter $\boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}$, and then
$\boxed{F \sim F_{m-\ell,n-m}(\lambda)}$ (noncentral Fisher distribution). This is the classical result used to compute power of $F$-tests.

The classical relation between the Fisher distribution and the Beta distribution hold in the noncentral situation too. Finally $R^2$ has the noncentral beta distribution with “shape parameters” $m-\ell$ and $n-m$ and noncentrality parameter $\lambda$. I think the moments are available in the literature but they possibly are highly complicated.

Finally let us write down $P_Z\mu$. Note that $P_Z = P_W - P_U$. One has $P_U \mu = \bar\mu 1$ when $U=[1]$, and $P_W \mu = \mu$. Hence $P_Z \mu =\mu - \bar\mu 1$ where here $\mu=X\beta$ for the unknown parameters vector $\beta$.