Kullback–Leibler divergence between two gamma distributions

Choosing to parameterize the gamma distribution Γ(b,c) by the pdf
g(x;b,c)=1Γ(c)xc1bcex/b
The Kullback-Leibler divergence between Γ(bq,cq) and Γ(bp,cp) is given by [1] as

KLGa(bq,cq;bp,cp)=(cq1)Ψ(cq)logbqcqlogΓ(cq)+logΓ(cp)+cplogbp(cp1)(Ψ(cq)+logbq)+bqcqbp

I’m guessing that Ψ(x):=Γ(x)/Γ(x) is the digamma function.

This is given with no derivation. I cannot find any reference that does derive this. Any help? A good reference would be sufficient. The difficult part is integrating logx against a gamma pdf.

[1] W.D. Penny, KL-Divergences of Normal, Gamma, Dirichlet, and Wishart densities, Available at: www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps

Answer

The KL divergence is a difference of integrals of the form

I(a,b,c,d)=0log(ex/axb1abΓ(b))ex/cxd1cdΓ(d)dx=1a0xdex/ccdΓ(d)dxlog(abΓ(b))0ex/cxd1cdΓ(d)dx+(b1)0log(x)ex/cxd1cdΓ(d)dx=cdalog(abΓ(b))+(b1)0log(x)ex/cxd1cdΓ(d)dx

We just have to deal with the right hand integral, which is obtained by observing

dΓ(d)=d0ex/cxd1cddx=d0ex/c(x/c)d1cdx=0ex/cxd1cdlogxcdx=0log(x)ex/cxd1cddxlog(c)Γ(d).

Whence

b1Γ(d)0log(x)ex/c(x/c)d1dx=(b1)Γ(d)Γ(d)+(b1)log(c).

Plugging into the preceding yields

I(a,b,c,d)=cdalog(abΓ(b))+(b1)Γ(d)Γ(d)+(b1)log(c).

The KL divergence between Γ(c,d) and Γ(a,b) equals I(c,d,c,d)I(a,b,c,d), which is straightforward to assemble.


Implementation Details

Gamma functions grow rapidly, so to avoid overflow don’t compute Gamma and take its logarithm: instead use the log-Gamma function that will be found in any statistical computing platform (including Excel, for that matter).

The ratio Γ(d)/Γ(d) is the logarithmic derivative of Γ, generally called ψ, the digamma function. If it’s not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.

Here, to illustrate, is a direct R implementation of the formula in terms of I. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of ψ).

#
# `b` and `d` are Gamma shape parameters and
# `a` and `c` are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
  i <- function(a,b,c,d)
    - c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
  i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)

Attribution
Source : Link , Question Author : Ian Langmore , Answer Author : whuber

Leave a Comment