Estimating population size from the frequency of sampled duplicates and uniques

There is a web service where I can request information about a random item.
For every request each item has an equal chance of being returned.

I can keep requesting items and record how many are duplicates and how many are unique. How can I use this data to estimate the total number of items?


This is essentially a variant of the coupon collector’s problem.

If there are $n$ items in total and you have taken a sample of size $s$ with replacement, then the probability of having seen exactly $u$ unique items is
$$ \Pr(U=u \mid n,s) = \frac{S_2(s,u)\, n!}{(n-u)!\, n^s} $$
where $S_2(s,u)$ is a Stirling number of the second kind, i.e. the number of ways to partition the $s$ draws into $u$ non-empty groups.
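As a rough illustration, the formula above could be evaluated exactly with integer arithmetic. This is only a sketch: the function names `stirling2` and `prob_unique` are made up for this example, and the Stirling numbers are built with the standard recurrence $S_2(i,k) = k\,S_2(i-1,k) + S_2(i-1,k-1)$.

```python
from fractions import Fraction
from math import perm  # perm(n, u) = n! / (n - u)!, Python 3.8+


def stirling2(s, u):
    """Stirling number of the second kind S_2(s, u), built row by row."""
    row = [1] + [0] * u  # row[k] holds S_2(i, k); starts at i = 0
    for i in range(1, s + 1):
        # Update right to left so row[k - 1] still belongs to the previous row.
        for k in range(min(i, u), 0, -1):
            row[k] = k * row[k] + row[k - 1]
        row[0] = 0  # S_2(i, 0) = 0 for i >= 1
    return row[u]


def prob_unique(u, n, s):
    """Pr(U = u | n, s): chance of seeing exactly u distinct items in s draws."""
    return Fraction(stirling2(s, u) * perm(n, u), n ** s)


if __name__ == "__main__":
    # Two draws from three items are distinct with probability 2/3.
    print(prob_unique(2, 3, 2))
```

Exact fractions keep the example easy to check by hand, but they get slow for large $s$; floating-point logarithms would be the more practical choice there.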

Now all you need is a prior distribution $\Pr(N=n)$; applying Bayes' theorem then gives a posterior distribution for $N$.
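A minimal sketch of that Bayesian step follows. It assumes a uniform prior on $n$ between the observed $u$ and an arbitrary upper bound `n_max`; both the prior and the bound are choices made for this example, not part of the answer. Because $S_2(s,u)$ does not depend on $n$, it cancels when the posterior is normalised, so only $n!/((n-u)!\,n^s)$ needs to be computed. The function name `posterior` is hypothetical.

```python
from math import exp, lgamma, log


def posterior(u, s, n_max):
    """Posterior Pr(N = n | u, s) for n in u..n_max.

    Assumes a uniform prior on that range (an assumption of this sketch).
    """
    ns = range(max(u, 1), n_max + 1)
    # log of n! / ((n - u)! * n**s); the Stirling factor is constant in n
    # and drops out after normalisation.
    log_w = [lgamma(n + 1) - lgamma(n - u + 1) - s * log(n) for n in ns]
    m = max(log_w)  # subtract the maximum before exponentiating to avoid underflow
    w = [exp(x - m) for x in log_w]
    total = sum(w)
    return {n: x / total for n, x in zip(ns, w)}


if __name__ == "__main__":
    # Hypothetical data: 50 requests returned 10 distinct items.
    post = posterior(u=10, s=50, n_max=200)
    print(max(post, key=post.get))               # posterior mode
    print(sum(n * p for n, p in post.items()))   # posterior mean
```

With a different prior, multiply each weight by the prior probability of that $n$ before normalising. Note that when few duplicates have been seen the likelihood keeps rising with $n$, so the result depends heavily on the prior and on `n_max`.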

Source : Link , Question Author : hoju , Answer Author : Henry
