I want to estimate the KL divergence between two continuous distributions $f$ and $g$. However, I can’t write down the density for either $f$ or $g$. I can sample from both $f$ and $g$ via some method (for example, Markov chain Monte Carlo).

The KL divergence from $f$ to $g$ is defined like this:

$$\operatorname{D_{\mathrm{KL}}}(f \parallel g) = \int_{-\infty}^{\infty} f(x) \log\left(\frac{f(x)}{g(x)}\right) \operatorname{d}x$$

This is the expectation of $\log\left(\frac{f(x)}{g(x)}\right)$ with respect to $f$, so you could imagine some Monte Carlo estimate

$$\frac{1}{N}\sum_{i=1}^N \log\left(\frac{f(x_i)}{g(x_i)}\right)$$

where $i$ indexes $N$ samples that are drawn from $f$ (i.e. $x_i \sim f()$ for $i = 1, \ldots, N$).

However, since I don’t know $f()$ and $g()$, I can’t even use this Monte Carlo estimate. What is the standard way of estimating the KL in this situation?

EDIT:

I do NOT know the unnormalized density for either $f()$ or $g()$.

**Answer**

I assume you can evaluate $f$ and $g$ up to a normalizing constant. Denote $f(x) = f_u(x)/c_f$ and $g(x) = g_u(x)/c_g$.

A consistent estimator that may be used is

$$

\widehat{D_{KL}}(f || g) = \left[n^{-1} \sum_j f_u(x_j)/\pi_f(x_j)\right]^{-1}\frac{1}{N}\sum_i^N \left[\log\left(\frac{f_u(z_i)}{g_u(z_i)}\right)\frac{f_u(z_i)}{\pi_r(z_i)}\right] – \log (\hat{r})

$$

where

$$

\hat{r} = \frac{1/n}{1/n}\frac{\sum_j f_u(x_j)/\pi_f(x_j)}{\sum_j g_u(y_j)/\pi_g(y_j)} \tag{1}.

$$

is an importance sampling estimator for the ratio $c_f/c_g$. Here you use $\pi_f$ and $\pi_g$ as instrumental densities for $f_u$ and $g_u$ respectively, and $\pi_r$ to target the log ratio of unnormalized densities.

So let $\{x_i\} \sim \pi_f$, $\{y_i\} \sim \pi_g$, and $\{z_i\} \sim \pi_r$. The numerator of (1) converges to $c_f$. The denominator converges to $c_g$. The ratio is consistent by the continuous mapping theorem. The log of the ratio is consistent by continuous mapping again.

Regarding the other part of the estimator,

$$

\frac{1}{N}\sum_i^N \left[\log\left(\frac{f_u(z_i)}{g_u(z_i)}\right)\frac{f_u(z_i)}{\pi_r(z_i)}\right] \overset{\text{as}}{\to} c_f E\left[ \log\left(\frac{f_u(z_i)}{g_u(z_i)}\right) \right]

$$

by the law of large numbers.

My motivation is the following:

\begin{align*}

D_{KL}(f || g) &= \int_{-\infty}^{\infty} f(x) \log\left(\frac{f(x)}{g(x)}\right) dx \\

&= \int_{-\infty}^{\infty} f(x)\left\{ \log \left[\frac{f_u(x)}{g_u(x)} \right] + \log \left[\frac{c_g}{c_f} \right]\right\} dx \\

&= E_f\left[\log \frac{f_u(x)}{g_u(x)} \right] + \log \left[\frac{c_g}{c_f} \right] \\

&= c_f^{-1} E_{\pi_r}\left[\log \frac{f_u(x)}{g_u(x)}\frac{f_u(x)}{\pi_r(x)} \right] + \log \left[\frac{c_g}{c_f} \right].

\end{align*}

So I just break it up into tractable pieces.

For more ideas on how to simulate the likelhood ratio, I found a paper that has a few:

https://projecteuclid.org/download/pdf_1/euclid.aos/1031594732

**Attribution***Source : Link , Question Author : frelk , Answer Author : Taylor*