Binomial random variable conditional on another one

On the Wikipedia page for the Binomial distribution, the following property is mentioned under the "Related distributions" section (paraphrased):

If X \sim Bin(n, p) and Y | X \sim Bin(X, q), then Y \sim Bin(n, pq)

I interpret this the following way. The probability mass function for X is:
P(X=x) = \binom{n}{x}p^x(1-p)^{n-x}
The conditional mass function for Y given X=x is:
P(Y=y|X=x) = \binom{x}{y}q^{y} (1-q)^{x-y}
The claimed (marginal) mass function of Y is then:
P(Y=y) = \binom{n}{y} (pq)^y (1-pq)^{n-y}

There is no citation for this particular property. I have tried to prove it, but to no avail. I wrote the following R code to get a sense of the veracity of the claim.

# Observations of X & Y to be generated
obs <- 10000

n <- 10
p <- 0.6
q <- 0.4

X <- rbinom(obs, n, p)
# rbinom() is vectorized over its size argument, so Y[i] is drawn from Bin(X[i], q)
Y <- rbinom(obs, X, q)

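# One histogram bin per integer value 0, 1, ..., n
bins <- seq(-0.5, n + 0.5, by = 1)

# Simulated distribution of Y
hist(Y, breaks = bins)

# Sample drawn from the claimed distribution Bin(n, pq)
Y_theoretical <- rbinom(obs, n, p * q)
hist(Y_theoretical, breaks = bins)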

The two histograms generated are shown below:
[Figure: Simulated distribution of Y]
[Figure: Claimed distribution, Bin(n, pq)]

The two histograms look nearly identical for this choice of p and q.
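The visual comparison can be tightened with an exact check that involves no simulation. The sketch below sums P(X=x) P(Y=y|X=x) over x and compares the result with the claimed Bin(n, pq) mass function. The helper name marginal_pmf is only illustrative and is not part of base R. The values of n, p and q are the same ones used above.

```r
# Exact check of the claim via the law of total probability (no simulation).
n <- 10
p <- 0.6
q <- 0.4

# marginal_pmf is an illustrative helper, not a base R function.
# It computes P(Y = y) = sum over x of P(X = x) * P(Y = y | X = x).
marginal_pmf <- function(y, n, p, q) {
  x <- y:n  # Y = y is only possible when X >= y
  sum(dbinom(x, n, p) * dbinom(y, x, q))
}

y_vals  <- 0:n
derived <- sapply(y_vals, marginal_pmf, n = n, p = p, q = q)
claimed <- dbinom(y_vals, n, p * q)

# Largest absolute discrepancy between the two mass functions
max_abs_diff <- max(abs(derived - claimed))
print(max_abs_diff)
```

If the property holds, the printed discrepancy should be on the order of floating-point rounding error. This is only numerical evidence for one choice of n, p and q, not a proof.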

Can a proof of this claim be provided?


Let X = \sum_{i=1}^{n} X_i, with X_i \overset{iid}{\sim} Bin(1, p), and Z = \sum_{i=1}^{n} Z_i, with Z_i \overset{iid}{\sim} Bin(1, q). If all the X_i and Z_i are mutually independent, then Z_i | X_i \overset{iid}{\sim} Bin(1, q).

Now to construct Y, we throw out all the (X_i, Z_i) pairs where X_i=0 and then count the number of times Z_i=1 in the remaining pairs. Given X=x, exactly x pairs remain, so Y | X=x \sim Bin(x, q). We can also write Y = \sum_{i=1}^{n} Y_i with Y_i = X_i Z_i. The product X_i Z_i equals 1 if X_i=1 and Z_i=1, and 0 otherwise, so by independence P(Y_i=1) = P(X_i=1) P(Z_i=1) = pq. Since the pairs (X_i, Z_i) are independent across i, Y_i \overset{iid}{\sim} Bin(1, pq), and hence Y \sim Bin(n, pq).
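For readers who prefer a direct computation, the same result can also be checked from the mass functions in the question by the law of total probability. This is a sketch of an alternative route, not part of the argument above. It uses the identity \binom{n}{x}\binom{x}{y} = \binom{n}{y}\binom{n-y}{x-y}, the substitution k = x - y, and the binomial theorem.

```latex
\begin{align*}
P(Y=y) &= \sum_{x=y}^{n} P(X=x)\,P(Y=y \mid X=x) \\
&= \sum_{x=y}^{n} \binom{n}{x} p^{x} (1-p)^{n-x} \binom{x}{y} q^{y} (1-q)^{x-y} \\
&= \binom{n}{y} (pq)^{y} \sum_{k=0}^{n-y} \binom{n-y}{k} \bigl(p(1-q)\bigr)^{k} (1-p)^{n-y-k} \\
&= \binom{n}{y} (pq)^{y} \bigl(p(1-q) + 1 - p\bigr)^{n-y} \\
&= \binom{n}{y} (pq)^{y} (1-pq)^{n-y}.
\end{align*}
```

The last line is the Bin(n, pq) mass function, in agreement with the construction via Y_i = X_i Z_i.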

Source : Link , Question Author : Comp_Warrior , Answer Author : P Schnell
