Is it possible to apply KL divergence between discrete and continuous distribution?

I am not a mathematician. I have searched the internet about KL divergence. What I learned is that the KL divergence measures the information lost when we approximate one distribution with another. I have seen it applied between two continuous distributions or between two discrete distributions. Can we compute it between a continuous and a discrete distribution, or vice versa?

Answer

No: KL divergence is only defined between distributions over a common space. It compares the probability (density) of a point x under two different distributions, p(x) and q(x). If p is a distribution on ℝ³ and q a distribution on ℤ, then q(x) doesn't make sense for points x ∈ ℝ³, and p(z) doesn't make sense for points z ∈ ℤ. In fact, we can't even do it for two continuous distributions over spaces of different dimension (or two discrete ones, or any case where the underlying probability spaces don't match).
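To make the "common space" requirement concrete, here is a minimal sketch (not from the original answer) of KL divergence between two discrete distributions over the same support; the distributions p and q are made-up examples:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support.

    Only defined when the supports match: p and q must have the same
    length, and q must be positive wherever p is positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must be over the same space")
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # a small positive number, in nats
```

If the two distributions lived on different spaces, there would be no way to line up their indices in the sum at all, which is the point of the answer above.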

If you have a particular case in mind, it may be possible to come up with some similar-spirited measure of dissimilarity between the distributions. For example, it might make sense to encode a continuous distribution under a code for a discrete one (obviously losing information in the process), e.g. by rounding each continuous value to the nearest point of the discrete support.
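As an illustration of that rounding idea, here is a hedged sketch (my own construction, not part of the original answer): discretize a standard normal by rounding samples to the nearest integer, which turns it into a discrete distribution, and only then compare it to another discrete distribution on the same integers. The truncation range [-5, 5] and the uniform comparison distribution q are arbitrary choices for the example:

```python
import math
import numpy as np

def normal_cdf(x):
    """CDF of the standard normal N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Mass that a N(0, 1) sample rounds to integer k: P(k - 0.5 < X <= k + 0.5).
ks = np.arange(-5, 6)
p = np.array([normal_cdf(k + 0.5) - normal_cdf(k - 0.5) for k in ks])
p /= p.sum()  # renormalize away the tiny truncated tail mass

# An arbitrary discrete distribution on the same integers, for comparison.
q = np.full_like(p, 1.0 / len(p))

# Now both distributions live on the same finite set, so KL is defined.
mask = p > 0
kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
print(kl)
```

The key step is the rounding: it replaces the continuous distribution with a discrete one on the same support as q, at the cost of the information lost inside each bin.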

Attribution
Source: Link, Question Author: prakash, Answer Author: Danica