# The pdf of $$\frac{X_1-\bar{X}}{S}$$

Suppose $$X_1, X_2,\ldots,X_n$$ are i.i.d. from $$N(\mu,\sigma^2)$$ with unknown $$\mu \in \mathcal R$$ and $$\sigma^2>0$$.

Let $$Z=\frac{X_1-\bar{X}}{S}$$, where $$S$$ is the sample standard deviation.

It can be shown that $$Z$$ has the Lebesgue p.d.f.

$$f(z)=\frac{\sqrt{n}\, \Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}(n-1)\Gamma\left(\frac{n-2}{2}\right)}\left[1-\frac{nz^2}{(n-1)^2}\right]^{n/2-2}I_{(0,(n-1)/\sqrt{n})}(|z|)$$

My question is: how does one derive this pdf?

The question comes from here, in Example 3.3.4, which finds the UMVUE of $$P(X_1 \le c)$$. I understand the logic and the procedure for finding the UMVUE, but I don’t know how to obtain the pdf.

I think this question also relates to this one.

Thank you very much for any help. Pointers to related references would also be appreciated.

What is so intriguing about this result is how much it looks like the distribution of a correlation coefficient. There’s a reason.

Suppose $$(X,Y)$$ is bivariate normal with zero correlation and common variance $$\sigma^2$$ for both variables. Draw an iid sample $$(x_1,y_1), \ldots, (x_n,y_n)$$. It is well known, and readily established geometrically (as Fisher did a century ago), that the distribution of the sample correlation coefficient

$$r = \frac{\sum_{i=1}^n(x_i - \bar x)(y_i - \bar y)}{(n-1) S_x S_y}$$

is

$$f(r) = \frac{1}{B\left(\frac{1}{2}, \frac{n}{2}-1\right)}\left(1-r^2\right)^{n/2-2},\ -1 \le r \le 1.$$

(Here, as usual, $$\bar x$$ and $$\bar y$$ are sample means and $$S_x$$ and $$S_y$$ are the square roots of the unbiased variance estimators.) $$B$$ is the Beta function, for which

$$\frac{1}{B\left(\frac{1}{2}, \frac{n}{2}-1\right)} = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n}{2}-1\right)} = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\Gamma\left(\frac{n}{2}-1\right)} . \tag{1}$$
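
For readers who like to see such things numerically, here is a small sketch that checks identity (1) and confirms that the stated density of $$r$$ integrates to one. The function name `f_r` and the range of sample sizes are my own illustrative choices, not part of the original result.

```r
# Density of r under zero correlation, as stated above.
# f_r is a hypothetical helper name introduced for this check.
f_r <- function(r, n) (1 - r^2)^(n/2 - 2) / beta(1/2, n/2 - 1)

n <- 4:10   # illustrative sample sizes

# Identity (1): the Beta form and the Gamma form of the constant agree.
beta_form  <- 1 / beta(1/2, n/2 - 1)
gamma_form <- gamma((n - 1)/2) / (sqrt(pi) * gamma(n/2 - 1))
max_abs_diff <- max(abs(beta_form - gamma_form))

# The density should integrate to 1 over [-1, 1] for each n.
totals <- sapply(n, function(k) integrate(f_r, -1, 1, n = k)$value)

print(max_abs_diff)
print(round(totals, 6))
```

The first printed value should be at the level of floating-point error, and every entry of `totals` should be close to 1.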

To compute $$r$$, we may exploit its invariance under rotations in $$\mathbb{R}^n$$ around the line generated by $$(1,1,\ldots, 1)$$, along with the invariance of the distribution of the sample under the same rotations, and choose $$y_i/S_y$$ to be any unit vector whose components sum to zero. One such vector is proportional to $$v = (n-1, -1, \ldots, -1)$$. Its standard deviation is

$$S_v = \sqrt{\frac{1}{n-1}\left((n-1)^2 + (-1)^2 + \cdots + (-1)^2\right)} = \sqrt{n}.$$

Consequently, $$r$$ must have the same distribution as

$$\frac{\sum_{i=1}^n(x_i - \bar x)(v_i - \bar v)}{(n-1) S_x S_v} = \frac{(n-1)x_1 - x_2-\cdots-x_n}{(n-1) S_x \sqrt{n}} = \frac{n(x_1 - \bar x)}{(n-1) S_x \sqrt{n}} = \frac{\sqrt{n}}{n-1}Z.$$
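
The algebra above is exact for any single sample, so it can be checked on one draw. The following is a minimal sketch; the sample size, seed, mean, and standard deviation are arbitrary values chosen only for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 6                             # illustrative sample size
x <- rnorm(n, mean = 2, sd = 3)    # arbitrary mu and sigma
v <- c(n - 1, rep(-1, n - 1))      # the vector v from the text

sd_v <- sd(v)                      # should equal sqrt(n)
Z    <- (x[1] - mean(x)) / sd(x)   # the statistic in question
r_xv <- cor(x, v)                  # correlation of the sample with v

print(c(sd_v = sd_v, sqrt_n = sqrt(n)))
print(c(r_xv = r_xv, scaled_Z = sqrt(n) / (n - 1) * Z))
```

Each printed pair should agree up to floating-point error, whatever values are chosen for the mean and standard deviation.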

Therefore all we need to do is rescale $$r$$ to find the distribution of $$Z$$:

$$f_Z(z) = \bigg|\frac{\sqrt{n}}{n-1}\bigg| f\left(\frac{\sqrt{n}}{n-1}z\right) = \frac{1}{B\left(\frac{1}{2}, \frac{n}{2}-1\right)} \frac{\sqrt{n}}{n-1}\left(1- \frac{n}{(n-1)^2}z^2\right)^{n/2-2}$$

for $$|z| \le \frac{n-1}{\sqrt{n}}$$. Formula (1) shows this is identical to that of the question.
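
As a rough check of the rescaled density, one can simulate $$Z$$ directly and compare an empirical probability with the corresponding integral of $$f_Z$$. This is only a sketch: the helper name `f_Z`, the choice $$n=5$$, the cutoff 0.5, and the seed are all assumptions made for illustration.

```r
# Density of Z as derived above; zero outside |z| < (n-1)/sqrt(n).
# f_Z is a hypothetical helper name introduced for this check.
f_Z <- function(z, n) {
  k <- sqrt(n) / (n - 1)
  inside <- abs(k * z) < 1
  out <- numeric(length(z))
  out[inside] <- k / beta(1/2, n/2 - 1) * (1 - (k * z[inside])^2)^(n/2 - 2)
  out
}

n <- 5                                   # illustrative sample size
bound <- (n - 1) / sqrt(n)               # largest possible |Z|
total <- integrate(f_Z, -bound, bound, n = n)$value   # should be close to 1

# Simulate Z directly and compare P(Z <= 0.5) with the integral of f_Z.
set.seed(2)                              # arbitrary seed
n.sim <- 1e5
x <- matrix(rnorm(n.sim * n), n)         # each column is one sample of size n
z <- (x[1, ] - colMeans(x)) / apply(x, 2, sd)
p_emp <- mean(z <= 0.5)
p_thy <- integrate(f_Z, -bound, 0.5, n = n)$value

print(c(total = total, empirical = p_emp, theoretical = p_thy))
```

The empirical and theoretical probabilities should differ only by simulation noise, which is on the order of a few thousandths with this many replications.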

Not entirely convinced? Here is the result of simulating this situation 100,000 times (with $$n=4$$, where the distribution is uniform).

The first histogram plots the correlation coefficients of $$(x_i,y_i),\ i=1,\ldots,4$$, while the second histogram plots the correlation coefficients of $$(x_i,v_i),\ i=1,\ldots,4$$, for a randomly chosen vector $$v$$ that remains fixed for all iterations. They are both uniform. The QQ plot on the right confirms that these distributions are essentially identical.
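
To see why $$n=4$$ gives a uniform distribution, substitute it into the formulas above: the exponent $$n/2-2$$ vanishes and the Beta constant reduces to a number.

$$f(r) = \frac{1}{B\left(\frac{1}{2}, 1\right)}\left(1-r^2\right)^{0} = \frac{\Gamma\left(\frac{3}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma(1)} = \frac{\sqrt{\pi}/2}{\sqrt{\pi}} = \frac{1}{2},\ -1 \le r \le 1.$$

Rescaling by $$\sqrt{n}/(n-1) = 2/3$$ then gives $$f_Z(z) = \frac{2}{3}\cdot\frac{1}{2} = \frac{1}{3}$$ for $$|z| \le \frac{3}{2}$$, which is the uniform density on an interval of length 3.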

Here’s the R code that produced the plot.

```r
n <- 4
n.sim <- 1e5
set.seed(17)
par(mfrow=c(1,3))
#
# Simulate spherical bivariate normal samples of size n each.
#
x <- matrix(rnorm(n.sim*n), n)
y <- matrix(rnorm(n.sim*n), n)
#
# Look at the distribution of the correlation of x and y.
#
sim <- sapply(1:n.sim, function(i) cor(x[,i], y[,i]))
hist(sim)
#
# Specify *any* fixed vector in place of y.
#
v <- c(n-1, rep(-1, n-1)) # The case in question
v <- rnorm(n)             # Can use anything you want (this overrides the line above)
#
# Look at the distribution of the correlation of x with v.
#
sim2 <- sapply(1:n.sim, function(i) cor(x[,i], v))
hist(sim2)
#
# Compare the two distributions.
#
qqplot(sim, sim2, main="QQ Plot")
```
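
If a number is preferred to a picture, the uniformity for $$n=4$$ can also be checked with a Kolmogorov-Smirnov test. The sketch below is not part of the code that produced the plot. It uses the specific vector $$v = (n-1, -1, \ldots, -1)$$, a smaller number of replications, and relies on `cor()` accepting a matrix and a vector so that no explicit loop is needed.

```r
set.seed(17)                       # arbitrary seed
n <- 4
n.sim <- 1e4                       # fewer replications than above, for speed
x <- matrix(rnorm(n.sim * n), n)   # each column is one sample of size n
v <- c(n - 1, rep(-1, n - 1))      # the case in question

# cor(matrix, vector) returns the correlation of each column of x with v.
r <- as.vector(cor(x, v))

# Compare the simulated correlations with the Uniform(-1, 1) distribution.
ks <- ks.test(r, "punif", -1, 1)
print(ks$statistic)
print(c(mean = mean(r), var = var(r)))   # Uniform(-1, 1) has mean 0, variance 1/3
```

A small test statistic, together with a mean near 0 and a variance near 1/3, is what one would expect if the correlations are uniform on $$[-1, 1]$$.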


### Reference

R. A. Fisher, Frequency-distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10, 507. See Section 3. (Quoted in Kendall’s Advanced Theory of Statistics, 5th Ed., section 16.24.)