Jensen-Shannon Divergence vs Kullback-Leibler Divergence?

I know that KL divergence is not symmetric and cannot strictly be considered a metric. If so, why is it used when JS divergence is symmetric (and its square root satisfies the properties of a metric)?

Are there scenarios where KL divergence can be used but not JS Divergence or vice-versa?

Answer

I found a very good answer on Quora and am reposting it here for anyone looking for it:

The Kullback-Leibler divergence has a few nice properties, one of them
being that $KL[q;p]$ kind of abhors regions where $q(x)$ has
non-null mass and $p(x)$ has null mass. This might look like a bug,
but it's actually a feature in certain situations.
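
For reference, the two quantities being compared are (standard definitions, written in the same bracket notation as the answer):

$$
KL[q;p] = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx,
\qquad
JS[p;q] = \tfrac{1}{2}\,KL[p;m] + \tfrac{1}{2}\,KL[q;m],
\quad m = \tfrac{1}{2}(p+q).
$$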

If you're trying to approximate a complex (intractable)
distribution $p(x)$ by a (tractable) approximate distribution $q(x)$,
you want to be absolutely sure that any $x$ that would be very
improbable to draw from $p(x)$ would also be very improbable to
draw from $q(x)$. That KL has this property is easy to show: there's
a $q(x)\log[q(x)/p(x)]$ term in the integrand. When $q(x)$ is small
but $p(x)$ is not, that's fine. But when $p(x)$ is small, this term grows very
rapidly if $q(x)$ isn't also small. So, if you're choosing $q(x)$ to
minimize $KL[q;p]$, it's very improbable that $q(x)$ will assign a
lot of mass to regions where $p(x)$ is near zero.
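
A minimal numerical sketch of this behaviour, using plain NumPy and two toy discrete distributions chosen only for illustration:

```python
import numpy as np

def kl(q, p):
    """KL[q; p] = sum_x q(x) * log(q(x) / p(x)) for discrete distributions (in nats)."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0                      # terms with q(x) = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# p puts almost no mass on the last outcome; q puts a lot of mass there.
p = np.array([0.500, 0.499, 0.001])
q = np.array([0.300, 0.300, 0.400])

print(kl(q, p))   # ~2.09: q has mass where p is nearly zero, so KL[q; p] explodes
print(kl(p, q))   # ~0.50: the penalty is asymmetric; the reverse direction is mild
```

The exact numbers depend entirely on these toy distributions; the point is only the asymmetry and the blow-up driven by the outcome that $p(x)$ almost never produces.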

The Jensen-Shannon divergence doesn't have this property. It is well
behaved both when $p(x)$ is small and when $q(x)$ is small. This means it won't
penalize as heavily a distribution $q(x)$ from which you can sample
values that are impossible under $p(x)$.
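
To see the contrast numerically, here is a sketch using SciPy's built-in implementations (assumed to be available; `scipy.stats.entropy` computes KL, and `scipy.spatial.distance.jensenshannon` returns the square root of the JS divergence), applied to the same toy distributions as above:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import entropy

# Same toy distributions as above: p is nearly zero on the last outcome.
p = np.array([0.500, 0.499, 0.001])
q = np.array([0.300, 0.300, 0.400])

print(entropy(q, p))             # KL[q; p] in nats: ~2.09, blows up
print(jensenshannon(p, q) ** 2)  # JS divergence in nats: ~0.16, bounded by log(2) ~ 0.693
print(jensenshannon(q, p) ** 2)  # same value: JS is symmetric
```

KL[q; p] heavily penalizes the mass that $q(x)$ places where $p(x)$ is nearly zero, while the JS divergence stays small and can never exceed $\log 2$.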

Attribution
Source: Link, Question Author: user2761431, Answer Author: moh
