Choosing to parameterize the gamma distribution Γ(b,c) by the pdf

g(x;b,c)=1Γ(c)xc−1bce−x/b

The Kullback-Leibler divergence between Γ(bq,cq) and Γ(bp,cp) is given by [1] asKLGa(bq,cq;bp,cp)=(cq−1)Ψ(cq)−logbq−cq−logΓ(cq)+logΓ(cp)+cplogbp−(cp−1)(Ψ(cq)+logbq)+bqcqbp

I’m guessing that Ψ(x):=Γ′(x)/Γ(x) is the

digamma function.This is given with no derivation. I cannot find any reference that does derive this. Any help? A good reference would be sufficient. The difficult part is integrating logx against a gamma pdf.

[1] W.D. Penny,

KL-Divergences of Normal, Gamma, Dirichlet, and Wishart densities, Available at: www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps

**Answer**

The KL divergence is a difference of integrals of the form

I(a,b,c,d)=∫∞0log(e−x/axb−1abΓ(b))e−x/cxd−1cdΓ(d)dx=−1a∫∞0xde−x/ccdΓ(d)dx−log(abΓ(b))∫∞0e−x/cxd−1cdΓ(d)dx+(b−1)∫∞0log(x)e−x/cxd−1cdΓ(d)dx=−cda−log(abΓ(b))+(b−1)∫∞0log(x)e−x/cxd−1cdΓ(d)dx

We just have to deal with the right hand integral, which is obtained by observing

∂∂dΓ(d)=∂∂d∫∞0e−x/cxd−1cddx=∂∂d∫∞0e−x/c(x/c)d−1cdx=∫∞0e−x/cxd−1cdlogxcdx=∫∞0log(x)e−x/cxd−1cddx−log(c)Γ(d).

Whence

b−1Γ(d)∫∞0log(x)e−x/c(x/c)d−1dx=(b−1)Γ′(d)Γ(d)+(b−1)log(c).

Plugging into the preceding yields

I(a,b,c,d)=−cda−log(abΓ(b))+(b−1)Γ′(d)Γ(d)+(b−1)log(c).

The KL divergence between Γ(c,d) and Γ(a,b) equals I(c,d,c,d)−I(a,b,c,d), which is straightforward to assemble.

### Implementation Details

Gamma functions grow rapidly, so to avoid overflow don’t compute Gamma and take its logarithm: instead use the log-Gamma function that will be found in any statistical computing platform (including Excel, for that matter).

The ratio Γ′(d)/Γ(d) is the logarithmic derivative of Γ, generally called ψ, the *digamma* function. If it’s not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.

Here, to illustrate, is a direct `R`

implementation of the formula in terms of I. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of ψ).

```
#
# `b` and `d` are Gamma shape parameters and
# `a` and `c` are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
i <- function(a,b,c,d)
- c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)
```

**Attribution***Source : Link , Question Author : Ian Langmore , Answer Author : whuber*