# Jensen-Shannon Divergence vs Kullback-Leibler Divergence?

I know that KL divergence is not symmetric, so it cannot strictly be considered a metric. Why is it still used when (the square root of) JS divergence satisfies the required properties of a metric?

Are there scenarios where KL divergence can be used but not JS Divergence or vice-versa?
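For reference (these definitions are not spelled out in the original post), the two quantities being compared are

$$KL[q;p] = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx$$

$$JS[p;q] = \frac{1}{2}KL[p;m] + \frac{1}{2}KL[q;m], \qquad m(x) = \frac{p(x)+q(x)}{2}$$

where the mixture $$m(x)$$ is what keeps JS bounded (by $$\log 2$$ in nats), since neither argument of the inner KL terms can be zero where the other has mass.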

I found a very thorough answer on Quora and am reproducing it here for anyone who looks for it here:

The Kullback-Leibler divergence has a few nice properties, one of them
being that $$KL[q;p]$$ kind of abhors regions where $$q(x)$$ has
non-null mass and $$p(x)$$ has null mass. This might look like a bug,
but it's actually a feature in certain situations.

If youβre trying to find approximations for a complex (intractable)
distribution $$π(π₯)$$ by a (tractable) approximate distribution $$π(π₯)$$
you want to be absolutely sure that any π₯ that would be very
improbable to be drawn from $$π(π₯)$$ would also be very improbable to be
drawn from $$π(π₯)$$. That KL have this property is easily shown: thereβs
a $$π(π₯)πππ[π(π₯)/π(π₯)]$$ in the integrand. When π(π₯) is small
but $$π(π₯)$$ is not, thatβs ok. But when $$π(π₯)$$ is small, this grows very
rapidly if $$π(π₯)$$ isnβt also small. So, if youβre choosing $$π(π₯)$$ to
minimize $$πΎπΏ[π;π]$$, itβs very improbable that $$π(π₯)$$ will assign a
lot of mass on regions where $$π(π₯)$$ is near zero.
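This blow-up is easy to see numerically. Below is a minimal sketch (using NumPy; the toy distributions are illustrative, not from the answer) comparing two candidate $$q$$ distributions against a $$p$$ that puts near-zero mass on one outcome:

```python
import numpy as np

def kl(q, p):
    # KL[q; p] = sum q(x) * log(q(x) / p(x)); terms with q(x) == 0 contribute 0
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p  = np.array([0.5, 0.5 - 1e-9, 1e-9])  # p has near-zero mass on the third outcome
q1 = np.array([0.5, 0.5, 0.0])          # q1 avoids that region
q2 = np.array([0.4, 0.3, 0.3])          # q2 places real mass there

print(kl(q1, p))  # tiny
print(kl(q2, p))  # large: the term 0.3 * log(0.3 / 1e-9) dominates
```

Minimizing $$KL[q;p]$$ therefore pushes the optimizer toward distributions like `q1` that stay out of $$p$$'s near-zero regions.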

The Jensen-Shannon divergence doesn't have this property. It is well
behaved both when $$p(x)$$ and $$q(x)$$ are small. This means that it won't
penalize as much a distribution $$q(x)$$ from which you can sample
values that are impossible in $$p(x)$$.
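The contrast is starkest when $$p(x)$$ is exactly zero somewhere that $$q(x)$$ is not: $$KL[q;p]$$ becomes infinite, while JS stays finite (at most $$\log 2$$). A self-contained sketch, again with illustrative toy distributions:

```python
import numpy as np

def kl(q, p):
    # KL[q; p]; infinite if q puts mass where p has none
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def js(p, q):
    # JS(p, q) = 0.5 * KL[p; m] + 0.5 * KL[q; m], with m the equal mixture
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0])   # p forbids the third outcome entirely
q = np.array([0.4, 0.3, 0.3])   # q assigns it real mass

print(kl(q, p))  # inf: q has mass where p has none
print(js(p, q))  # finite, and always <= log(2)
```

The mixture $$m$$ in JS is never zero where either $$p$$ or $$q$$ has mass, which is exactly why the log terms can't diverge, and also why JS won't strongly discourage a $$q$$ that samples values impossible under $$p$$.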