Hypergeometric: how do I construct a credibility interval around K (population successes) in R?

I have a problem for which I believe I should use the hypergeometric distribution, but I can’t figure out how to do it in R.

Say I have a bag of marbles with known number (N) of marbles, but the number of successes (white marbles) in the bag (K) is unknown. I want to infer about K.

Given a sample from this population, where I see n trials and k successes, how do I infer something about the population K?

Ideally, I’d like to construct a prior distribution around K and then use the sample to update it and get a Bayesian posterior credibility interval around K (for a given credibility score α), but I am struggling to complete this practically. I have read that the conjugate prior for the hypergeometric is the beta-binomial. I thought maybe there would be an R function or package that could take the prior parameters, and then update with a sample to give me a credibility interval, but haven’t been able to find it.

If the Bayesian setting is difficult, perhaps a confidence interval would suffice… Can anyone either point me to some R functions or tutorials, or some resources that could help? Thanks.

EDIT: To add an example, I can do this in the case of the binomial distribution, to infer p, given a sample. I can construct a credibility interval with the binom package

k= 15
n= 25
binom.bayes(k, n, conf.level = .95, tol=.005, type="central")
# method  x  n shape1 shape2      mean     lower     upper  sig
#  bayes 15 25   15.5   10.5 0.5961538 0.4057793 0.7725105 0.05

To add a prior, because of the way updates work with the beta-binomial, I can just add counts to the k and n parameters according to the a and b parameters of the prior beta distribution.

In the beta-binomial example, N is infinite, and I’m inferring p. What I want to do is take this exact situation and just extend it to the case where N is finite (and known), and infer K (which would be equivalent to inferring p). This changes the binomial to the hypergeometric.


You can solve your problem using the Maximum Likelihood method rather than the Bayesian method. Just calculate the hypergeometric probability of finding k for a large number of values of K using your known values of k, N and n. The value of K that provides the maximum for this probability is the expected value of K.

Source : Link , Question Author : nsheff , Answer Author : Phil

Leave a Comment