I can’t seem to find a written-out derivation of the marginal probability mass function of the compound Dirichlet-multinomial distribution, though the mean and variance/covariance of the margins seem to be well known.

For clarity, this is what I am looking for. Given the joint probability mass function of the Dirichlet-multinomial over discrete category counts $x_k$:

$$P(X_1=x_1,\dots,X_d=x_d)=\frac{N!}{x_1!\,x_2!\cdots x_d!}\,\frac{\Gamma(A)}{\Gamma(N+A)}\prod_{k=1}^{d}\frac{\Gamma(x_k+\alpha_k)}{\Gamma(\alpha_k)},$$

where $A=\sum_{k=1}^{d}\alpha_k$, the $\alpha_k>0$ are the parameters of the Dirichlet distribution, and $N=\sum_{k=1}^{d}x_k$ is the total multinomial sample size.
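For anyone who wants to experiment numerically, here is a minimal Python sketch of the joint pmf above, evaluated in log space with `math.lgamma` so the factorials and gamma functions don't overflow. The helper names `dm_pmf` and `compositions` are hypothetical, introduced only for illustration; the parameter values are arbitrary.

```python
from itertools import product
from math import exp, lgamma


def dm_pmf(x, alpha):
    """Joint Dirichlet-multinomial pmf P(X_1=x_1, ..., X_d=x_d), computed in log space."""
    n, a = sum(x), sum(alpha)
    # log[N! / (x_1! ... x_d!)], using lgamma(k + 1) == log(k!)
    log_p = lgamma(n + 1) - sum(lgamma(k + 1) for k in x)
    # log[Gamma(A) / Gamma(N + A)]
    log_p += lgamma(a) - lgamma(n + a)
    # log of prod_k Gamma(x_k + alpha_k) / Gamma(alpha_k)
    log_p += sum(lgamma(k + ak) - lgamma(ak) for k, ak in zip(x, alpha))
    return exp(log_p)


def compositions(n, d):
    """Yield every tuple of d non-negative integers that sums to n."""
    for head in product(range(n + 1), repeat=d - 1):
        rest = n - sum(head)
        if rest >= 0:
            yield head + (rest,)


# Sanity check: the pmf should sum to 1 over all count vectors with total N = 6.
alpha = (0.5, 1.2, 2.0)
total = sum(dm_pmf(x, alpha) for x in compositions(6, 3))
print(total)  # close to 1.0 up to floating-point rounding
```

Enumerating all compositions is only feasible for small $N$ and $d$, which is all a sanity check needs.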

What is the marginal probability $P(X_k=x_k)$? The fact that the $x_k$’s appear in a sum inside the gamma function (through $\Gamma(N+A)$, with $N=\sum_k x_k$) is throwing me for a bit of a loop.

**Answer**

I think I have a proof, but you’re probably not going to like it… At least I don’t like it. If you want to skip to the punchline, it’s equation (∗∗∗) below.

I claim that it suffices to show this aggregation/marginalization property for three variables, and that the general case should follow by induction.

So given

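$$P(X_1=x_1,X_2=x_2,X_3=x_3)=\frac{N!}{x_1!\,x_2!\,x_3!}\frac{\Gamma(A)}{\Gamma(N+A)}\frac{\Gamma(x_1+\alpha_1)}{\Gamma(\alpha_1)}\frac{\Gamma(x_2+\alpha_2)}{\Gamma(\alpha_2)}\frac{\Gamma(x_3+\alpha_3)}{\Gamma(\alpha_3)},$$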

the claim is that

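$$P(X_1=x_1)=P(X_1=x_1,\,X_2+X_3=N-x_1)=\frac{N!}{x_1!\,(N-x_1)!}\frac{\Gamma(A)}{\Gamma(N+A)}\frac{\Gamma(x_1+\alpha_1)}{\Gamma(\alpha_1)}\frac{\Gamma\big((N-x_1)+(A-\alpha_1)\big)}{\Gamma(A-\alpha_1)}$$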

i.e., we can reduce things to a beta-binomial distribution with parameters $N$, $\alpha_1$, and $A-\alpha_1$.
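One way to make the beta-binomial claim plausible before proving it is to simulate the compound process directly: draw $p\sim\mathrm{Dirichlet}(\alpha_1,\alpha_2,\alpha_3)$, then draw counts from a multinomial with $N$ trials and probabilities $p$, and tabulate $X_1$. The sketch below uses only Python's standard `random` module and builds the Dirichlet draw from normalized independent Gamma variates (a standard construction, swapped in here for convenience). Function names and parameter values are hypothetical, and a simulation can only show approximate agreement.

```python
import random
from collections import Counter
from math import comb, exp, lgamma


def sample_dirichlet(alpha, rng):
    """Dirichlet draw via independent Gamma(alpha_k, 1) variates, normalized to sum to 1."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


def sample_dm(n, alpha, rng):
    """Compound draw: p ~ Dirichlet(alpha), then category counts ~ Multinomial(n, p)."""
    p = sample_dirichlet(alpha, rng)
    draws = rng.choices(range(len(alpha)), weights=p, k=n)
    counts = Counter(draws)
    return [counts[k] for k in range(len(alpha))]


def log_beta(p, q):
    """log B(p, q) via lgamma."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def beta_binomial_pmf(k, n, a, b):
    """C(n, k) * B(k + a, n - k + b) / B(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def empirical_marginal(n, alpha, trials, seed=0):
    """Empirical distribution of X_1 over many simulated compound draws."""
    rng = random.Random(seed)
    freq = Counter(sample_dm(n, alpha, rng)[0] for _ in range(trials))
    return [freq[x] / trials for x in range(n + 1)]


alpha = (0.7, 1.5, 2.3)
n = 5
emp = empirical_marginal(n, alpha, trials=40000, seed=1)
for x1 in range(n + 1):
    exact = beta_binomial_pmf(x1, n, alpha[0], alpha[1] + alpha[2])
    print(x1, round(emp[x1], 4), round(exact, 4))
```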

Note that

$$P(X_1=x_1)=P(X_1=x_1,\,X_2+X_3=N-x_1)=\sum_{x_2+x_3=N-x_1}P(X_1=x_1,X_2=x_2,X_3=x_3)$$
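This sum is easy to carry out numerically. The sketch below computes $P(X_1=x_1)$ by adding up the joint pmf over all $x_2+x_3=N-x_1$ and compares it with the beta-binomial pmf $\binom{N}{x_1}B(x_1+\alpha_1,\,N-x_1+A-\alpha_1)/B(\alpha_1,\,A-\alpha_1)$, which is the right-hand side of the claim written with Beta functions. Helper names and parameter values are hypothetical; agreement at a few parameter values is evidence, not a proof.

```python
from math import comb, exp, lgamma


def dm_pmf(x, alpha):
    """Joint Dirichlet-multinomial pmf, computed in log space."""
    n, a = sum(x), sum(alpha)
    log_p = lgamma(n + 1) - sum(lgamma(k + 1) for k in x)
    log_p += lgamma(a) - lgamma(n + a)
    log_p += sum(lgamma(k + ak) - lgamma(ak) for k, ak in zip(x, alpha))
    return exp(log_p)


def marginal_by_summation(x1, n, alpha):
    """P(X_1 = x1) for d = 3: sum the joint pmf over x2 + x3 = n - x1."""
    m = n - x1
    return sum(dm_pmf((x1, x2, m - x2), alpha) for x2 in range(m + 1))


def log_beta(p, q):
    """log B(p, q) = log Gamma(p) + log Gamma(q) - log Gamma(p + q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def beta_binomial_pmf(k, n, a, b):
    """C(n, k) * B(k + a, n - k + b) / B(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


alpha = (0.7, 1.5, 2.3)
n = 8
for x1 in range(n + 1):
    lhs = marginal_by_summation(x1, n, alpha)
    rhs = beta_binomial_pmf(x1, n, alpha[0], alpha[1] + alpha[2])
    print(x1, lhs, rhs)  # the two columns should agree up to rounding error
```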

So really what I am claiming is that

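$$\sum_{x_2+x_3=N-x_1}\frac{N!}{x_1!\,x_2!\,x_3!}\frac{\Gamma(A)}{\Gamma(N+A)}\frac{\Gamma(x_1+\alpha_1)}{\Gamma(\alpha_1)}\frac{\Gamma(x_2+\alpha_2)}{\Gamma(\alpha_2)}\frac{\Gamma(x_3+\alpha_3)}{\Gamma(\alpha_3)}=\frac{N!}{x_1!\,(N-x_1)!}\frac{\Gamma(A)}{\Gamma(N+A)}\frac{\Gamma(x_1+\alpha_1)}{\Gamma(\alpha_1)}\frac{\Gamma\big((N-x_1)+(A-\alpha_1)\big)}{\Gamma(A-\alpha_1)}$$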

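Cancelling the factors common to both sides ($N!/x_1!$, $\Gamma(A)/\Gamma(N+A)$, and $\Gamma(x_1+\alpha_1)/\Gamma(\alpha_1)$, none of which depend on $x_2$ or $x_3$), what I am *really* really claiming is that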

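$$\sum_{x_2+x_3=N-x_1}\frac{1}{x_2!\,x_3!}\frac{\Gamma(x_2+\alpha_2)}{\Gamma(\alpha_2)}\frac{\Gamma(x_3+\alpha_3)}{\Gamma(\alpha_3)}=\frac{1}{(N-x_1)!}\frac{\Gamma\big((N-x_1)+(A-\alpha_1)\big)}{\Gamma(A-\alpha_1)}$$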

or, tidying up even further using $A-\alpha_1=\alpha_2+\alpha_3$,

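$$\sum_{x_2+x_3=N-x_1}\frac{1}{x_2!\,x_3!}\frac{\Gamma(x_2+\alpha_2)}{\Gamma(\alpha_2)}\frac{\Gamma(x_3+\alpha_3)}{\Gamma(\alpha_3)}=\frac{1}{(N-x_1)!}\frac{\Gamma\big((N-x_1)+(\alpha_2+\alpha_3)\big)}{\Gamma(\alpha_2+\alpha_3)}$$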

Basically everything that follows from here will amount to renaming variables and appeals to obscure combinatorial identities (which at the very least should be proved in some textbook somewhere). This is why you probably won’t like the proof. On the other hand, neither integrals nor integration by parts are (directly) involved. So there’s that.

*Anyway*, let’s rename $N-x_1=:m$, $x_2=:m_1$, $x_3=:m_2$, so that $m_1+m_2=x_2+x_3=N-x_1=m$; in other words, $m_1+m_2=m$.

Recall that $m_1$, $m_2$, and $m$ are all *non-negative integers*.

Similarly, let’s rename $A-\alpha_1=\alpha_2+\alpha_3=:c$, $\alpha_2=:c_1$, and $\alpha_3=:c_2$. In particular, $c_1+c_2=c$ by definition. Recall that $c_1$, $c_2$, and $c$ are all positive real numbers.

OK great, so that means what we want to show then is equivalent (up to renaming) to the identity

$$\sum_{m_1+m_2=m}\frac{1}{m_1!\,m_2!}\frac{\Gamma(m_1+c_1)}{\Gamma(c_1)}\frac{\Gamma(m_2+c_2)}{\Gamma(c_2)}=\frac{1}{m!}\frac{\Gamma(m+c)}{\Gamma(c)}$$
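Before going further, the renamed identity can be spot-checked in floating point for real (non-integer) $c_1, c_2$. A minimal sketch using `math.gamma` directly; the function names are hypothetical and the tolerance is an assumption chosen to absorb rounding error in `math.gamma`.

```python
from math import factorial, gamma, isclose


def identity_lhs(m, c1, c2):
    """Sum over m1 + m2 = m of Gamma(m1+c1)/Gamma(c1) * Gamma(m2+c2)/Gamma(c2) / (m1! m2!)."""
    return sum(
        (gamma(m1 + c1) / gamma(c1)) * (gamma(m - m1 + c2) / gamma(c2))
        / (factorial(m1) * factorial(m - m1))
        for m1 in range(m + 1)
    )


def identity_rhs(m, c1, c2):
    """Gamma(m + c) / (m! Gamma(c)) with c = c1 + c2."""
    c = c1 + c2
    return gamma(m + c) / (gamma(c) * factorial(m))


def identity_holds(m, c1, c2, rel_tol=1e-10):
    """Floating-point check of the identity for one choice of (m, c1, c2)."""
    return isclose(identity_lhs(m, c1, c2), identity_rhs(m, c1, c2), rel_tol=rel_tol)


print(all(identity_holds(m, 0.3, 0.9) for m in range(12)))
```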

Using the identity $\Gamma(y+1)=y\,\Gamma(y)$ for $y$ a positive real number, and then induction, we get that for any positive integer $n$, $\Gamma(y+n)=\Gamma(y)\cdot\prod_{i=0}^{n-1}(y+i)$.

In particular, we get that

$$\frac{\Gamma(y+n)}{\Gamma(y)}=\prod_{i=0}^{n-1}(y+i)=:y^{(n)},$$

where $y^{(n)}$ denotes the *rising factorial*, which is sometimes also denoted using the Pochhammer symbol $(y)_n$; but sometimes the Pochhammer symbol denotes the falling factorial or the regular factorial instead, so let’s stick with $y^{(n)}$.
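As a quick illustration of the definition, here is a minimal sketch of the rising factorial as an explicit product, checked against the gamma-function ratio it equals. The function names are hypothetical; `math.prod` returns 1 for an empty product, which matches $y^{(0)}=1$.

```python
from math import gamma, isclose, prod


def rising_factorial(y, n):
    """y^{(n)} = y (y+1) ... (y+n-1); for n = 0 this is the empty product, 1."""
    return prod(y + i for i in range(n))


def matches_gamma_ratio(y, n, rel_tol=1e-10):
    """Compare y^{(n)} with Gamma(y + n) / Gamma(y) for real y > 0."""
    return isclose(rising_factorial(y, n), gamma(y + n) / gamma(y), rel_tol=rel_tol)


print(rising_factorial(2, 3))  # 2 * 3 * 4 = 24
print(matches_gamma_ratio(3.7, 6))
```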

Therefore the identity we want to show is equivalent to

$$\sum_{m_1+m_2=m}\frac{1}{m_1!\,m_2!}c_1^{(m_1)}c_2^{(m_2)}=\frac{1}{m!}c^{(m)},$$

where, recall, $c_1+c_2=c$ with all three positive reals, and $m_1+m_2=m$ with all three non-negative integers. (Note that the rising factorial $y^{(0)}$ is the empty product, i.e. $1$, which covers the terms with $m_1=0$ or $m_2=0$.)

Anyway, there’s no harm in multiplying both sides of the above identity by $m!$, which leads to

$$\sum_{m_1+m_2=m}\frac{m!}{m_1!\,m_2!}c_1^{(m_1)}c_2^{(m_2)}=c^{(m)}.$$

By the definition of the binomial coefficient and re-indexing, we clearly have

$$\sum_{m_1+m_2=m}\frac{m!}{m_1!\,m_2!}c_1^{(m_1)}c_2^{(m_2)}=\sum_{m_1=0}^{m}\binom{m}{m_1}c_1^{(m_1)}c_2^{(m-m_1)},$$

whereas, by definition, $c=c_1+c_2$, so $c^{(m)}=(c_1+c_2)^{(m)}$. Hence the identity we want to show is equivalent to the identity

$$\sum_{m_1=0}^{m}\binom{m}{m_1}c_1^{(m_1)}c_2^{(m-m_1)}=(c_1+c_2)^{(m)}.\tag{***}$$
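Both sides of $(***)$ are polynomials in $c_1, c_2$, so the identity can at least be sanity-checked exactly with rational arithmetic, with no rounding error involved. A minimal sketch using Python’s `fractions.Fraction`; the function names are hypothetical, and a finite check at a few rational points does not replace the proofs in the references.

```python
from fractions import Fraction
from math import comb, prod


def rising_factorial(y, n):
    """y^{(n)} = y (y+1) ... (y+n-1), with y^{(0)} = 1."""
    return prod(y + i for i in range(n))


def chu_vandermonde_lhs(m, c1, c2):
    """sum_{m1=0}^{m} C(m, m1) c1^{(m1)} c2^{(m - m1)}."""
    return sum(comb(m, k) * rising_factorial(c1, k) * rising_factorial(c2, m - k)
               for k in range(m + 1))


def chu_vandermonde_rhs(m, c1, c2):
    """(c1 + c2)^{(m)}."""
    return rising_factorial(c1 + c2, m)


# Exact rational arithmetic, so equality is tested without rounding error.
c1, c2 = Fraction(1, 3), Fraction(5, 7)
print(all(chu_vandermonde_lhs(m, c1, c2) == chu_vandermonde_rhs(m, c1, c2) for m in range(15)))
```

For example, with $m=2$ and $c_1=c_2=1$ the left side is $2+2+2=6$ and the right side is $2^{(2)}=2\cdot 3=6$.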

*Apparently* (according to both Wikipedia and Wolfram MathWorld) this result is true: it is an equivalent formulation of the “Chu-Vandermonde identity” and is related to “umbral calculus”.

So if you’re willing to believe that, or are able to look up and follow the proofs supposedly given in the references cited by Wolfram MathWorld and Wikipedia:

- Koepf, W. *Hypergeometric Summation: An Algorithmic Approach to Summation and Special Function Identities*. Braunschweig, Germany: Vieweg, 1998, p. 42.
- Boros, G. and Moll, V. *Irresistible Integrals: Symbolics, Analysis and Experiments in the Evaluation of Integrals*. Cambridge, England: Cambridge University Press, 2004, p. 18.
- Askey, R. *Orthogonal Polynomials and Special Functions*. Regional Conference Series in Applied Mathematics, 21. Philadelphia, PA: SIAM, 1975, pp. 59–60.

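then, based on what I showed above, it should follow by induction that the “aggregation property” of the Dirichlet-multinomial distribution (equivalent to the marginalization you asked for) is true. In particular, $X_k$ is marginally beta-binomial with parameters $N$, $\alpha_k$, and $A-\alpha_k$:

$$P(X_k=x_k)=\frac{N!}{x_k!\,(N-x_k)!}\frac{\Gamma(A)}{\Gamma(N+A)}\frac{\Gamma(x_k+\alpha_k)}{\Gamma(\alpha_k)}\frac{\Gamma\big((N-x_k)+(A-\alpha_k)\big)}{\Gamma(A-\alpha_k)}.$$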
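To check the general case numerically (not just $d=3$), the sketch below brute-forces $P(X_k=x_k)$ for a four-category example by summing the joint pmf over every count vector with $k$-th entry $x_k$, and compares it with the beta-binomial pmf with parameters $N$, $\alpha_k$, $A-\alpha_k$ that the three-variable argument points to. Helper names and parameter values are hypothetical; this is a finite check, not a proof.

```python
from itertools import product
from math import comb, exp, lgamma


def dm_pmf(x, alpha):
    """Joint Dirichlet-multinomial pmf, computed in log space."""
    n, a = sum(x), sum(alpha)
    log_p = lgamma(n + 1) - sum(lgamma(k + 1) for k in x)
    log_p += lgamma(a) - lgamma(n + a)
    log_p += sum(lgamma(k + ak) - lgamma(ak) for k, ak in zip(x, alpha))
    return exp(log_p)


def compositions(n, d):
    """Yield every tuple of d non-negative integers that sums to n."""
    for head in product(range(n + 1), repeat=d - 1):
        rest = n - sum(head)
        if rest >= 0:
            yield head + (rest,)


def beta_binomial_pmf(k, n, a, b):
    """C(n, k) * B(k + a, n - k + b) / B(a, b), with B computed via lgamma."""
    def log_beta(p, q):
        return lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def marginal_brute_force(k, xk, n, alpha):
    """P(X_k = xk): sum the joint pmf over all count vectors whose k-th entry is xk."""
    return sum(dm_pmf(x, alpha) for x in compositions(n, len(alpha)) if x[k] == xk)


alpha = (0.4, 1.1, 2.0, 0.8)  # d = 4 categories, values chosen arbitrarily
n = 7
a_total = sum(alpha)
max_err = max(
    abs(marginal_brute_force(k, xk, n, alpha)
        - beta_binomial_pmf(xk, n, alpha[k], a_total - alpha[k]))
    for k in range(len(alpha))
    for xk in range(n + 1)
)
print(max_err)  # expected to be at floating-point rounding level
```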

**Attribution**: *Source: Link, Question Author: zzk, Answer Author: Chill2Macht*