# Is the sample correlation coefficient an unbiased estimator of the population correlation coefficient?

Is it true that $R_{X,Y}$ is an unbiased estimator for $\rho_{X,Y}$? That is, $$\mathbf{E}\left[R_{X,Y}\right]=\rho_{X,Y}?$$

If not, what is an unbiased estimator for $\rho_{X,Y}$? (Is there a standard unbiased estimator that’s used? And is it analogous to the unbiased sample variance, where we simply multiply the biased sample variance by $\frac{n}{n-1}$?)



The population correlation coefficient is defined as $$\rho_{X,Y}=\frac{\mathbf{E}\left[\left(X-\mu_{X}\right)\left(Y-\mu_{Y}\right)\right]}{\sqrt{\mathbf{E}\left[\left(X-\mu_{X}\right)^{2}\right]}\sqrt{\mathbf{E}\left[\left(Y-\mu_{Y}\right)^{2}\right]}},$$ while the sample correlation coefficient is defined as $$R_{X,Y}=\frac{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)\left(Y_{i}-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}}}.$$

No, not in general. For bivariate normal data, writing $\widehat{\rho}$ for the sample correlation coefficient $R_{X,Y}$,
$$\mathbb{E} \widehat{\rho} = \rho \left[1 - \frac{\left(1-\rho^2 \right)}{2n} + O\left( \frac{1}{n^2} \right) \right]$$
as seen in Chapter 2 of Lehmann’s Theory of Point Estimation. The full expansion has infinitely many terms; here the terms of order $n^{-2}$ and smaller are treated as negligible.
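This formula shows that the sample correlation coefficient is unbiased only for $\rho = 0$, i.e. independence in the normal case, as one would expect. It is also unbiased in the degenerate cases with $|\rho| = 1$, but that is not very interesting. Otherwise the bias is approximately $-\frac{\rho\left(1-\rho^2\right)}{2n}$: it is of order $\frac{1}{n}$, it pulls the estimate toward zero, and it is quite small for all reasonable sample sizes. Because the bias depends on the unknown $\rho$, no constant rescaling analogous to the $\frac{n}{n-1}$ factor for the sample variance removes it.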
For bivariate normal data the sample correlation coefficient is the MLE, so it is consistent and, here, asymptotically unbiased. You can also see that from the above formula, since $\mathbb{E} \widehat{\rho} \to \rho$ as $n \to \infty$. Note that asymptotic unbiasedness already follows from the boundedness ($|\widehat{\rho}| \le 1$) and the consistency of the sample correlation coefficient through the bounded convergence theorem.
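
The first-order bias formula can be checked numerically. The following is a minimal Monte Carlo sketch, not a definitive implementation: it assumes bivariate standard normal data, and the function names `simulate_mean_r` and `first_order_mean` are made up for this illustration. It averages the sample correlation over many simulated samples and compares the result with $\rho\left[1 - \frac{1-\rho^2}{2n}\right]$.

```python
import numpy as np


def simulate_mean_r(rho, n, reps=100_000, seed=0):
    """Monte Carlo estimate of E[R] for samples of size n.

    Assumption: (X, Y) is bivariate standard normal with correlation rho.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    z = rng.standard_normal((reps, n))
    # y has unit variance and correlation rho with x.
    y = rho * x + np.sqrt(1.0 - rho**2) * z

    # Center each simulated sample (one sample per row).
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)

    # Sample correlation coefficient of every row, as defined in the question.
    r = (xc * yc).sum(axis=1) / np.sqrt(
        (xc**2).sum(axis=1) * (yc**2).sum(axis=1)
    )
    return r.mean()


def first_order_mean(rho, n):
    """First-order approximation rho * (1 - (1 - rho^2) / (2n))."""
    return rho * (1.0 - (1.0 - rho**2) / (2.0 * n))


if __name__ == "__main__":
    n = 20
    for rho in (0.0, 0.3, 0.5, 0.8):
        sim = simulate_mean_r(rho, n)
        approx = first_order_mean(rho, n)
        print(f"rho={rho:.1f}  simulated E[R]={sim:.4f}  first-order={approx:.4f}")
```

For nonzero $\rho$ the simulated mean should sit slightly below $\rho$ in absolute value and close to the first-order value; the remaining gap reflects Monte Carlo noise and the neglected $O(n^{-2})$ terms.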