Clopper-Pearson for non-mathematicians

I was wondering if anyone can explain to me the intuition behind the Clopper-Pearson CI for proportions.

As far as I know, every CI includes a variance in it. However, for proportions, even if my proportion is 0 or 1 (0% or 100%), the Clopper-Pearson CI can be calculated. I tried looking at the formulas, and I understand it has something to do with percentiles of the Binomial distribution, and I understand that finding the CI involves iterations, but I wondered if anyone can explain the logic and rationale in “simple words”, or with minimum math?

Answer

When you say you’re used to confidence intervals containing an expression for variance, you’re thinking of the Gaussian case, in which information about the two parameters characterizing the population (one its mean & the other its variance) is summarized by the sample mean & sample variance. The sample mean estimates the population mean, but the precision with which it does so depends on the population variance, estimated in turn by the sample variance. The binomial distribution, on the other hand, has just one parameter (the probability of success on each individual trial) & all the information given by the sample about this parameter is summarized in the total number of successes out of so many independent trials. The population variance and mean are both determined by this parameter.
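
To make that last point concrete: writing X for the number of successes in n independent trials and \pi for the success probability (notation assumed here for illustration), the standard binomial results are

\operatorname{E}(X) = n\pi, \qquad \operatorname{Var}(X) = n\pi(1-\pi)

so once \pi is pinned down there is no separate variance left to estimate.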

You can get a Clopper–Pearson 95% (say) confidence interval for the parameter \pi by working directly with the binomial probability mass function. Suppose you observe x successes out of n trials. The p.m.f. is

\Pr(X=x)= \binom{n}{x}\pi^x(1-\pi)^{n-x}

Starting from the observed proportion x/n, increase \pi until the probability of x or fewer successes falls to 2.5%: that’s your upper bound. Decrease \pi until the probability of x or more successes falls to 2.5%: that’s your lower bound. (I suggest you actually try doing this if it’s not clear from reading about it.) If x = 0 there is no lower tail to work with, so the lower bound is simply 0; likewise, if x = n the upper bound is 1. That is why the interval can still be calculated when the observed proportion is 0% or 100%. What you’re doing here is finding the values of \pi that, when taken as a null hypothesis, would lead to its (only just) being rejected by a two-tailed test at a significance level of 5%. In the long run, bounds calculated this way cover the true value of \pi, whatever it is, at least 95% of the time.
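
If you would rather let a computer do the "increase until / decrease until" search, here is a minimal sketch in Python using only the standard library. The function names (`binom_cdf`, `clopper_pearson`), the bisection search and the tolerance are illustrative choices, not something prescribed by the method; statistical packages typically get the same bounds from beta-distribution quantiles instead.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the p.m.f."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05, tol=1e-10):
    """Search for the bounds by bisection (illustrative helper, not a library API)."""
    tail = alpha / 2  # 2.5% in each tail for a 95% interval

    # Upper bound: P(X <= x) shrinks as pi grows, so look for where it hits `tail`.
    if x == n:
        upper = 1.0  # "x or fewer" always has probability 1, so nothing to solve
    else:
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if binom_cdf(x, n, mid) > tail:
                lo = mid  # tail still too likely: pi can go higher
            else:
                hi = mid
        upper = (lo + hi) / 2

    # Lower bound: P(X >= x) = 1 - P(X <= x - 1) grows as pi grows.
    if x == 0:
        lower = 0.0  # "x or more" always has probability 1, so nothing to solve
    else:
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if 1 - binom_cdf(x - 1, n, mid) < tail:
                lo = mid  # tail too unlikely: pi is below the bound
            else:
                hi = mid
        lower = (lo + hi) / 2

    return lower, upper


if __name__ == "__main__":
    print(clopper_pearson(0, 10))  # zero successes still gives a usable interval
    print(clopper_pearson(5, 10))
```

For 0 successes out of 10 the upper bound solves (1-\pi)^{10} = 0.025, which is roughly 0.31, and the lower bound is 0; no variance estimate is needed anywhere.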

Attribution
Source: Link, Question Author: user40850, Answer Author: Scortchi – Reinstate Monica
