# Estimate the Kullback–Leibler (KL) divergence with Monte Carlo

I want to estimate the KL divergence between two continuous distributions $$f$$ and $$g$$. However, I can’t write down the density for either $$f$$ or $$g$$. I can sample from both $$f$$ and $$g$$ via some method (for example, Markov chain Monte Carlo).

The KL divergence from $$f$$ to $$g$$ is defined like this:

$$\operatorname{D_{\mathrm{KL}}}(f \parallel g) = \int_{-\infty}^{\infty} f(x) \log\left(\frac{f(x)}{g(x)}\right) \operatorname{d}x$$

This is the expectation of $$\log\left(\frac{f(x)}{g(x)}\right)$$ with respect to $$f$$, so you could imagine some Monte Carlo estimate

$$\frac{1}{N}\sum_{i=1}^N \log\left(\frac{f(x_i)}{g(x_i)}\right)$$

where $$i$$ indexes $$N$$ samples that are drawn from $$f$$ (i.e. $$x_i \sim f()$$ for $$i = 1, \ldots, N$$).
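To make this concrete, here is a minimal sketch of the naive estimator for a hypothetical case where both densities *are* known (two normals, with the closed-form normal-vs-normal KL as a check; the specific distributions are illustrative, not from the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical example: f = N(0, 1), g = N(1, 1.5^2), both densities known.
f = stats.norm(0, 1)
g = stats.norm(1, 1.5)

N = 100_000
x = f.rvs(size=N, random_state=rng)  # draws x_i ~ f

# Naive Monte Carlo estimate: average of log(f(x_i)/g(x_i)) over draws from f.
kl_hat = np.mean(f.logpdf(x) - g.logpdf(x))

# Closed form for KL(N(m_f, s_f^2) || N(m_g, s_g^2)) as a sanity check:
# log(s_g/s_f) + (s_f^2 + (m_f - m_g)^2) / (2 s_g^2) - 1/2
kl_exact = np.log(1.5 / 1.0) + (1.0**2 + (0 - 1) ** 2) / (2 * 1.5**2) - 0.5
```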

However, since I don’t know $$f()$$ and $$g()$$, I can’t even use this Monte Carlo estimate. What is the standard way of estimating the KL in this situation?

EDIT:
I do NOT know the unnormalized density for either $$f()$$ or $$g()$$.

I assume you can evaluate $f$ and $g$ up to normalizing constants: write $f(x) = f_u(x)/c_f$ and $g(x) = g_u(x)/c_g$.

A consistent estimator that may be used is
$$\widehat{D_{KL}}(f \parallel g) = \left[\frac{1}{n} \sum_{j=1}^n \frac{f_u(x_j)}{\pi_f(x_j)}\right]^{-1}\frac{1}{N}\sum_{i=1}^N \left[\log\left(\frac{f_u(z_i)}{g_u(z_i)}\right)\frac{f_u(z_i)}{\pi_r(z_i)}\right] - \log (\hat{r}),$$
where
$$\hat{r} = \frac{\frac{1}{n}\sum_{j=1}^n f_u(x_j)/\pi_f(x_j)}{\frac{1}{n}\sum_{j=1}^n g_u(y_j)/\pi_g(y_j)} \tag{1}$$
is an importance sampling estimator for the ratio $c_f/c_g$. Here $\pi_f$ and $\pi_g$ are instrumental densities for $f_u$ and $g_u$ respectively, and $\pi_r$ is the instrumental density for the integral involving the log ratio of the unnormalized densities.

So let $\{x_j\}_{j=1}^n \sim \pi_f$, $\{y_j\}_{j=1}^n \sim \pi_g$, and $\{z_i\}_{i=1}^N \sim \pi_r$. The numerator of (1) converges to $c_f$, since $E_{\pi_f}\left[f_u(x)/\pi_f(x)\right] = \int f_u(x)\,dx = c_f$; the denominator likewise converges to $c_g$. The ratio is consistent by the continuous mapping theorem, and the log of the ratio is consistent by continuous mapping again.
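The ratio estimator $\hat{r}$ in (1) can be sketched as follows. The unnormalized densities and instrumental distributions here are hypothetical stand-ins (an unnormalized $N(0,1)$ and $N(1, 1.5^2)$, so the true ratio $c_f/c_g = 1/1.5$ is known and checkable); the instrumental densities are chosen with heavier tails than their targets:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical unnormalized densities: f_u is an unnormalized N(0,1) and
# g_u an unnormalized N(1, 1.5^2), so c_f = sqrt(2*pi), c_g = 1.5*sqrt(2*pi),
# and the true ratio is c_f/c_g = 1/1.5.
def f_u(x):
    return np.exp(-0.5 * x**2)

def g_u(x):
    return np.exp(-0.5 * ((x - 1) / 1.5) ** 2)

# Instrumental densities, wider than the targets so the weights are stable.
pi_f = stats.norm(0, 2)
pi_g = stats.norm(1, 3)

n = 200_000
x = pi_f.rvs(size=n, random_state=rng)  # x_j ~ pi_f
y = pi_g.rvs(size=n, random_state=rng)  # y_j ~ pi_g

c_f_hat = np.mean(f_u(x) / pi_f.pdf(x))  # importance sampling estimate of c_f
c_g_hat = np.mean(g_u(y) / pi_g.pdf(y))  # importance sampling estimate of c_g
r_hat = c_f_hat / c_g_hat                # estimate of c_f / c_g
```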

Regarding the other part of the estimator,
$$\frac{1}{N}\sum_{i=1}^N \left[\log\left(\frac{f_u(z_i)}{g_u(z_i)}\right)\frac{f_u(z_i)}{\pi_r(z_i)}\right] \overset{\text{a.s.}}{\to} c_f\, E_f\left[ \log\left(\frac{f_u(x)}{g_u(x)}\right) \right]$$
by the strong law of large numbers, since the expectation of the summand under $\pi_r$ is $\int \log\left(\frac{f_u(x)}{g_u(x)}\right) f_u(x)\, dx = c_f\, E_f\left[\log\left(\frac{f_u(x)}{g_u(x)}\right)\right]$. Dividing by the consistent estimator of $c_f$ then recovers $E_f\left[\log\left(\frac{f_u(x)}{g_u(x)}\right)\right]$.

My motivation is the following:

\begin{align*}
D_{KL}(f || g) &= \int_{-\infty}^{\infty} f(x) \log\left(\frac{f(x)}{g(x)}\right) dx \\
&= \int_{-\infty}^{\infty} f(x)\left\{ \log \left[\frac{f_u(x)}{g_u(x)} \right] + \log \left[\frac{c_g}{c_f} \right]\right\} dx \\
&= E_f\left[\log \frac{f_u(x)}{g_u(x)} \right] + \log \left[\frac{c_g}{c_f} \right] \\
&= c_f^{-1} E_{\pi_r}\left[\log \frac{f_u(x)}{g_u(x)}\frac{f_u(x)}{\pi_r(x)} \right] + \log \left[\frac{c_g}{c_f} \right].
\end{align*}
So I just break it up into tractable pieces.
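Putting the pieces together, the full estimator can be sketched on the same hypothetical setup as above ($f = N(0,1)$, $g = N(1, 1.5^2)$, accessed only through $f_u$ and $g_u$), so the result can be compared against the closed-form KL between the two normals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical unnormalized densities, as before.
def f_u(x):
    return np.exp(-0.5 * x**2)

def g_u(x):
    return np.exp(-0.5 * ((x - 1) / 1.5) ** 2)

pi_f = stats.norm(0, 2)  # instrumental density for c_f
pi_g = stats.norm(1, 3)  # instrumental density for c_g
pi_r = stats.norm(0, 2)  # instrumental density for the log-ratio integral

n = N = 200_000
x = pi_f.rvs(size=n, random_state=rng)
y = pi_g.rvs(size=n, random_state=rng)
z = pi_r.rvs(size=N, random_state=rng)

# Piece 1: estimate c_f and the ratio r = c_f / c_g by importance sampling.
c_f_hat = np.mean(f_u(x) / pi_f.pdf(x))
r_hat = c_f_hat / np.mean(g_u(y) / pi_g.pdf(y))

# Piece 2: estimate c_f * E_f[log(f_u/g_u)] under pi_r, then combine:
# KL_hat = (1/c_f_hat) * mean[ log(f_u/g_u) * f_u/pi_r ] - log(r_hat)
term = np.mean(np.log(f_u(z) / g_u(z)) * f_u(z) / pi_r.pdf(z))
kl_hat = term / c_f_hat - np.log(r_hat)

# Closed-form KL(N(0,1) || N(1, 1.5^2)) for comparison.
kl_exact = np.log(1.5) + (1 + 1) / (2 * 1.5**2) - 0.5
```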

For more ideas on how to simulate the likelihood ratio, I found a paper that has a few: