Consider an urn containing $N$ balls of $P$ different colors, with $p_i$

being the proportion of balls of color $i$ among the $N$ balls ($\sum_i

p_i = 1$). I draw $n \leq N$ balls from the urnwithoutreplacement and look

at the number $\gamma$ of different colors among the balls that were drawn. What is the

expectation of $\gamma$ as a function of $n/N$, depending on suitable

properties of the distribution $\mathbf{p}$?To give more insight: if $N = P$ and $p_i = 1/P$ for all $i$, then I will always see

exactly $n$ colors, that is, $\gamma = P (n/N)$. Otherwise, it can be

shown that the expectation of $\gamma$ is $>P(n/N)$. For fixed $P$ and $N$, it

would seem that the factor by which to multiply $n/N$ would be maximal

when $\mathbf{p}$ is uniform; maybe the expected number of different

colors seen be bounded as a function of $n/N$ and, e.g., the entropy of

$\mathbf{p}$?This seems related to the coupon collector’s problem, except that

sampling is performed without replacement, and the distribution of the

coupons is not uniform.

**Answer**

Suppose you have $k$ colors where $k \leq N$. Let $b_i$ denote the number of balls color $i$ so $\sum b_i = N$. Let $B = \{b_1, \ldots, b_k\}$ and let $E_i(B)$ notate the set which consists of the $i$ element subsets of $B$. Let $Q_{n, c}$ denote the number of ways we can choose $n$ elements from the above set such that the number of different colors in the chosen set is $c$. For $c = 1$ the formula is simple:

$$

Q_{n, 1} = \sum_{E \in E_{1}(B)}\binom{\sum_{e \in E}e}{n}

$$

For $c = 2$ we can count sets of balls of size $n$ which has at most 2 colors minus the number of sets which have exactly $1$ color:

$$

Q_{n,2} = \sum_{E \in E_{2}(B)}\binom{\sum_{e \in E}e}{n} – \binom{k – 1}{1}Q_{n, 1}

$$

$\binom{k – 1}{1}$ is the number of ways you can add a color to a fixed color such that you will have 2 colors if you have $k$ colors in total. The generic formula is if you have $c_1$ fixed colors and you want to make $c_2$ colors out of it while having $k$ colors in total($c_1 \leq c_2 \leq k$) is $\binom{k – c_1}{c_2 – c_1}$. Now we have everything to derive the generic formula for $Q_{n, c}$:

$$

Q_{n, c} = \sum_{E \in E_{c}(B)}\binom{\sum_{e \in E}e}{n} – \sum_{i = 1}^{c – 1}\binom{k – i}{c – i}Q_{n, i}

$$

The probability that you will have exactly $c$ colors if you draw $n$ balls is:

$$

P_{n, c} = Q_{n, c} / \binom{N}{n}

$$

Also note that $\binom{x}{y} = 0$ if $y > x$.

Probably there are special cases where the formula can be simplified. I didn’t bother to find those simplifications this time.

The expected value you’re looking for the number of colors dependent on $n$ is the following:

$$

\gamma_{n} = \sum_{i = 1}^{k} P_{n, i} * i

$$

**Attribution***Source : Link , Question Author : a3nm , Answer Author : jakab922*