# When is Fisher’s z-transform appropriate?

I want to test a sample correlation $r$ for significance, using p-values, that is

$H_0: \rho = 0, \; H_1: \rho \neq 0.$

I understand that I can use Fisher’s z-transform to compute the test statistic

$z_{obs}= \displaystyle\frac{\sqrt{n-3}}{2}\ln\left(\displaystyle\frac{1+r}{1-r}\right)$

and then find the $p$-value by

$p = 2P\left(Z>|z_{obs}|\right)$

using the standard normal distribution. The absolute value makes the test two-sided, so a negative $r$ is handled correctly.
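As a quick sanity check of the two formulas above, here is a worked example with hypothetical values ($r = 0.5$, $n = 10$, chosen only for illustration and not taken from any real data):

$$
z_{obs} = \frac{\sqrt{10-3}}{2}\ln\left(\frac{1+0.5}{1-0.5}\right) = \frac{\sqrt{7}}{2}\ln 3 \approx 1.3229 \times 1.0986 \approx 1.453,
\qquad
p = 2P\left(Z > 1.453\right) \approx 2 \times 0.0731 \approx 0.146.
$$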

My question is: how large should $n$ be for this to be an appropriate transformation? Obviously, $n$ must be larger than 3. My textbook does not mention any restrictions, but slide 29 of this presentation says that $n$ must be larger than 10. For the data I will be considering, I will have something like $5 \leq n \leq 10$.

For questions like these I would just run a simulation and check whether the $p$-values behave as expected. The $p$-value is the probability, if the null hypothesis is true, of randomly drawing a sample that deviates at least as much from the null hypothesis as the data you observed. So if we had many samples drawn under the null hypothesis, and one of them had a $p$-value of .04, we would expect 4% of those samples to have a $p$-value less than or equal to .04. The same is true for every other possible $p$-value.
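Stated formally, the property being checked is that under $H_0$ the $p$-value is uniformly distributed on $(0, 1)$:

$$
P_{H_0}\left(p \le \alpha\right) = \alpha \quad \text{for all } \alpha \in (0, 1).
$$

This holds exactly only when the reference distribution of the test statistic is exactly right. Because the normal distribution of Fisher's $z$ is only an approximation, any gap between $P_{H_0}(p \le \alpha)$ and $\alpha$ in a simulation measures how poor that approximation is for a given $n$.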

Below is a simulation in Stata. The graphs check whether the $p$-values measure what they are supposed to measure: they show how much the proportion of samples with a $p$-value less than the nominal $p$-value deviates from that nominal value. As you can see, the test is somewhat problematic with such a small number of observations. Whether or not it is too problematic for your research is your judgement call.

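```stata
clear all
set more off

* Under H0 (x and y independent, so rho = 0), compute the Fisher z-test
* p-value for each sample size and return it as r(p5), r(p6), ..., r(p50)
program define sim, rclass
    tempname z se
    foreach i of numlist 5/10 20(10)50 {
        drop _all
        set obs `i'
        gen x = rnormal()
        gen y = rnormal()
        corr x y
        scalar `z'  = atanh(r(rho))                     // Fisher's z-transform of r
        scalar `se' = 1/sqrt(r(N)-3)                    // its approximate standard error
        return scalar p`i' = 2*normal(-abs(`z'/`se'))   // two-sided p-value
    }
end

simulate p5 =r(p5)  p6 =r(p6)  p7  =r(p7)     ///
         p8 =r(p8)  p9 =r(p9)  p10 =r(p10)    ///
         p20=r(p20) p30=r(p30) p40 =r(p40)    ///
         p50=r(p50), reps(200000) nodots: sim

* simpplot is user-written; install it with: ssc install simpplot
simpplot p5 p6 p7 p8 p9 p10, name(small, replace) ///
    scheme(s2color) ylabel(,angle(horizontal))

simpplot p20 p30 p40 p50 , name(less_small, replace) ///
    scheme(s2color) ylabel(,angle(horizontal))
```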