# Kullback–Leibler divergence between two gamma distributions

Choosing to parameterize the gamma distribution $\Gamma(b,c)$ by the pdf
$g(x;b,c) = \frac{1}{\Gamma(c)}\frac{x^{c-1}}{b^c}e^{-x/b}$
The Kullback-Leibler divergence between $\Gamma(b_q,c_q)$ and $\Gamma(b_p,c_p)$ is given by [1] as

I’m guessing that $\Psi(x):= \Gamma'(x)/\Gamma(x)$ is the digamma function.

This is given with no derivation. I cannot find any reference that does derive this. Any help? A good reference would be sufficient. The difficult part is integrating $\log x$ against a gamma pdf.

[1] W.D. Penny, KL-Divergences of Normal, Gamma, Dirichlet, and Wishart densities, Available at: www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps

The KL divergence is a difference of integrals of the form

I(a,b,c,d)=∫∞0log(e−x/axb−1abΓ(b))e−x/cxd−1cdΓ(d)dx=−1a∫∞0xde−x/ccdΓ(d)dx−log(abΓ(b))∫∞0e−x/cxd−1cdΓ(d)dx+(b−1)∫∞0log(x)e−x/cxd−1cdΓ(d)dx=−cda−log(abΓ(b))+(b−1)∫∞0log(x)e−x/cxd−1cdΓ(d)dx\begin{aligned} I(a,b,c,d)&=\int_0^{\infty} \log\left(\frac{e^{-x/a}x^{b-1}}{a^b\Gamma(b)}\right) \frac{e^{-x/c}x^{d-1}}{c^d \Gamma(d)}\, \mathrm dx \\ &=-\frac{1}{a}\int_0^\infty \frac{x^d e^{-x/c}}{c^d\Gamma(d)}\, \mathrm dx - \log(a^b\Gamma(b))\int_0^\infty \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\ &\quad+ (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\ &=-\frac{cd}{a} - \log(a^b\Gamma(b)) + (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,\mathrm dx \end{aligned}

We just have to deal with the right hand integral, which is obtained by observing

∂∂dΓ(d)=∂∂d∫∞0e−x/cxd−1cddx=∂∂d∫∞0e−x/c(x/c)d−1cdx=∫∞0e−x/cxd−1cdlogxcdx=∫∞0log(x)e−x/cxd−1cddx−log(c)Γ(d).\eqalign{ \frac{\partial}{\partial d}\Gamma(d) =& \frac{\partial}{\partial d}\int_0^{\infty}e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx\\ =& \frac{\partial}{\partial d} \int_0^\infty e^{-x/c} \frac{(x/c)^{d-1}}{c}\, \mathrm dx\\ =&\int_0^\infty e^{-x/c}\frac{x^{d-1}}{c^d} \log\frac{x}{c} \, \mathrm dx\\ =&\int_0^{\infty}\log(x)e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx - \log(c)\Gamma(d). }

Whence

$$b−1Γ(d)∫∞0log(x)e−x/c(x/c)d−1dx=(b−1)Γ′(d)Γ(d)+(b−1)log(c).\frac{b-1}{\Gamma(d)}\int_0^{\infty} \log(x)e^{-x/c}(x/c)^{d-1}\, \mathrm dx = (b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

Plugging into the preceding yields

$$I(a,b,c,d)=−cda−log(abΓ(b))+(b−1)Γ′(d)Γ(d)+(b−1)log(c).I(a,b,c,d)=\frac{-cd}{a} -\log(a^b\Gamma(b))+(b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

The KL divergence between $$Γ(c,d)\Gamma(c,d)$$ and $$Γ(a,b)\Gamma(a,b)$$ equals $$I(c,d,c,d)−I(a,b,c,d)I(c,d,c,d) - I(a,b,c,d)$$, which is straightforward to assemble.

### Implementation Details

Gamma functions grow rapidly, so to avoid overflow don’t compute Gamma and take its logarithm: instead use the log-Gamma function that will be found in any statistical computing platform (including Excel, for that matter).

The ratio $$Γ′(d)/Γ(d)\Gamma^\prime(d)/\Gamma(d)$$ is the logarithmic derivative of $$Γ,\Gamma,$$ generally called $$ψ,\psi,$$ the digamma function. If it’s not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.

Here, to illustrate, is a direct R implementation of the formula in terms of $$II$$. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of $$ψ\psi$$).

#
# b and d are Gamma shape parameters and
# a and c are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
i <- function(a,b,c,d)
- c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)