# Jensen-Shannon Divergence vs Kullback-Leibler Divergence?

I know that the KL divergence is not symmetric, so it cannot strictly be considered a metric. Given that, why is it used when the JS divergence satisfies the required properties of a metric (or at least its square root, the Jensen-Shannon distance, does)?

Are there scenarios where the KL divergence can be used but not the JS divergence, or vice versa?

I found a very thorough answer on Quora and am reproducing it here for people who look for it:

The Kullback-Leibler divergence has a few nice properties, one of them being that $KL[q;p]$ kind of abhors regions where $q(x)$ has non-null mass and $p(x)$ has null mass. This might look like a bug, but it's actually a feature in certain situations.

If you're trying to find approximations for a complex (intractable) distribution $p(x)$ by a (tractable) approximate distribution $q(x)$, you want to be absolutely sure that any $x$ that would be very improbable to draw from $p(x)$ would also be very improbable to draw from $q(x)$. That KL has this property is easily shown: there's a $q(x)\log[q(x)/p(x)]$ term in the integrand. When $q(x)$ is small but $p(x)$ is not, that's fine. But when $p(x)$ is small, this term grows very rapidly if $q(x)$ isn't also small. So, if you're choosing $q(x)$ to minimize $KL[q;p]$, it's very improbable that $q(x)$ will assign a lot of mass to regions where $p(x)$ is near zero.
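
A minimal numerical sketch of this blow-up (assuming NumPy and SciPy are available; the distributions below are made up for illustration, and `scipy.special.rel_entr(q, p)` computes the elementwise terms $q(x)\log[q(x)/p(x)]$):

```python
import numpy as np
from scipy.special import rel_entr  # rel_entr(q, p) = q * log(q / p), elementwise

# Made-up toy distributions: p puts almost no mass on the third outcome.
p     = np.array([0.50, 0.49, 0.01])
q_ok  = np.array([0.49, 0.50, 0.01])  # stays small where p is small
q_bad = np.array([0.30, 0.30, 0.40])  # puts real mass where p is near zero

# KL[q; p] = sum over x of q(x) * log(q(x) / p(x)), in nats
print(rel_entr(q_ok,  p).sum())   # ~0.0002: barely penalized
print(rel_entr(q_bad, p).sum())   # ~1.18: dominated by the 0.40 * log(0.40 / 0.01) term
```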

The Jensen-Shannon divergence doesn't have this property. It is well behaved both when $p(x)$ is small and when $q(x)$ is small. This means that it won't penalize as heavily a distribution $q(x)$ from which you can sample values that are impossible under $p(x)$.
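
To see the contrast, a sketch under the same assumptions (note that SciPy's `jensenshannon` returns the Jensen-Shannon *distance*, the square root of the divergence, so it is squared below to recover the divergence):

```python
import numpy as np
from scipy.special import rel_entr
from scipy.spatial.distance import jensenshannon

# Made-up example: p assigns zero mass to the third outcome, but q samples it anyway.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.4, 0.4, 0.2])

print(rel_entr(q, p).sum())      # inf: KL[q; p] rejects q outright
print(jensenshannon(p, q) ** 2)  # ~0.075 nats: finite, and always <= log(2)
```

The JS divergence stays finite (it is bounded by $\log 2$) because both distributions are compared against their mixture $m(x) = [p(x)+q(x)]/2$, which is nonzero wherever either of them is.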