Example of how the log-sum-exp trick works in Naive Bayes

I have read about the log-sum-exp trick in many places (e.g. here, and here) but have never seen an example of how it is applied specifically to the Naive Bayes classifier (e.g. with discrete features and two classes).

How exactly would one avoid the problem of numerical underflow using this trick?

Answer

In
p(Y=C|x) = \frac{p(x|Y=C)\,p(Y=C)}{\sum_{k=1}^{|C|} p(x|Y=C_k)\,p(Y=C_k)}

both the denominator and the numerator can become very small, typically because the p(x_i|C_k) can be close to 0 and we multiply many of them together. To prevent underflows, one can simply take the log of the numerator, but one needs to use the log-sum-exp trick for the denominator.
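To see the problem concretely, here is a minimal Python sketch (the likelihood values are made up for illustration): multiplying a few hundred small per-feature likelihoods underflows to exactly 0 in double precision, while summing their logs stays perfectly representable.

```python
import math

# Hypothetical per-feature likelihoods p(x_i | C_k): many small factors.
likelihoods = [1e-3] * 200

product = 1.0
for p in likelihoods:
    product *= p
print(product)  # 0.0 -- the true value 1e-600 is below the float64 minimum

log_sum = sum(math.log(p) for p in likelihoods)
print(log_sum)  # -1381.55... -- no underflow in log space
```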


More specifically, in order to prevent underflows:

  • If we only care about knowing which class ŷ the input x = (x_1, ..., x_n) most likely belongs to under the maximum a posteriori (MAP) decision rule, we don't have to apply the log-sum-exp trick, since we don't have to compute the denominator in that case. For the numerator one can simply take the log to prevent underflows: log(p(x|Y=C)p(Y=C)). More specifically:

    \hat{y} = \underset{k \in \{1,\dots,|C|\}}{\operatorname{argmax}}\ p(C_k|x_1,\dots,x_n) = \underset{k \in \{1,\dots,|C|\}}{\operatorname{argmax}}\ p(C_k) \prod_{i=1}^{n} p(x_i|C_k)

    which becomes after taking the log:

    \begin{align}\hat{y} &= \underset{k \in \{1,\dots,|C|\}}{\operatorname{argmax}} \log p(C_k|x_1,\dots,x_n) \\ &= \underset{k \in \{1,\dots,|C|\}}{\operatorname{argmax}} \log\left(p(C_k) \prod_{i=1}^{n} p(x_i|C_k)\right) \\ &= \underset{k \in \{1,\dots,|C|\}}{\operatorname{argmax}} \left(\log p(C_k) + \sum_{i=1}^{n} \log p(x_i|C_k)\right)\end{align}

    (A code sketch of this rule is given after the list.)

  • If we want to compute the class probability p(Y=C|x), we will need to compute the denominator:

    \begin{align}\log p(Y=C|x) &= \log\left(\frac{p(x|Y=C)\,p(Y=C)}{\sum_{k=1}^{|C|} p(x|Y=C_k)\,p(Y=C_k)}\right) \\ &= \log\underbrace{\big(p(x|Y=C)\,p(Y=C)\big)}_{\text{numerator}} - \log\underbrace{\left(\sum_{k=1}^{|C|} p(x|Y=C_k)\,p(Y=C_k)\right)}_{\text{denominator}}\end{align}

    The term \log\left(\sum_{k=1}^{|C|} p(x|Y=C_k)\,p(Y=C_k)\right) may underflow because the p(x_i|C_k) can be very small: it is the same issue as in the numerator, but this time there is a summation inside the logarithm, which prevents us from transforming the p(x_i|C_k) (which can be close to 0) into log(p(x_i|C_k)) (negative and no longer close to 0, since 0 ≤ p(x_i|C_k) ≤ 1). To circumvent this issue, we can use the fact that p(x_i|C_k) = exp(log(p(x_i|C_k))) to obtain:

    \log\left(\sum_{k=1}^{|C|} p(x|Y=C_k)\,p(Y=C_k)\right) = \log\left(\sum_{k=1}^{|C|} \exp\Big(\log\big(p(x|Y=C_k)\,p(Y=C_k)\big)\Big)\right)

    At that point, a new issue arises: log(p(x|Y=C_k)p(Y=C_k)) may be quite negative, which implies that exp(log(p(x|Y=C_k)p(Y=C_k))) may become very close to 0, i.e. underflow. This is where we use the log-sum-exp trick:

    \log \sum_k e^{a_k} = \log \sum_k e^{a_k} e^{A-A} = A + \log \sum_k e^{a_k - A}

    with:

    • a_k = \log\big(p(x|Y=C_k)\,p(Y=C_k)\big),
    • A = \max_k a_k.

    We can see that introducing the variable A avoids underflows. E.g. with two classes and a_1 = -245, a_2 = -255, we have:

    • \exp(a_1) = \exp(-245) = 3.96143 \times 10^{-107}
    • \exp(a_2) = \exp(-255) = 1.798486 \times 10^{-111}

    Using the log-sum-exp trick we avoid the underflow, with A=\max ( -245, -255 )=-245:
    \begin{align}\log \sum_k e^{a_k} &= \log \sum_k e^{a_k}e^{A-A} \\&= A+ \log\sum_k e^{a_k -A}\\ &= -245+ \log\sum_k e^{a_k +245}\\&= -245+ \log \left(e^{-245 +245}+e^{-255 +245}\right) \\&=-245+ \log \left(e^{0}+e^{-10}\right) \end{align}

    We avoided the underflow, since e^{-10} is much farther away from 0 than 3.96143 \times 10^{-107} or 1.798486 \times 10^{-111}. (These particular values are still representable in double precision, but they underflow to 0 in single precision, and with more features the a_k easily fall below -745, at which point even double-precision exp underflows to 0.)
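To tie the two bullets above to code: the following is a minimal Python sketch of the MAP rule from the first bullet (the priors and per-feature likelihood tables are hypothetical numbers chosen for illustration). The predicted class is the argmax of the log prior plus the summed log-likelihoods; the denominator is never needed.

```python
import math

# Hypothetical model: 2 classes, 2 binary features.
# log_prior[k] = log p(C_k); log_lik[k][i][v] = log p(x_i = v | C_k).
log_prior = [math.log(0.6), math.log(0.4)]
log_lik = [
    [[math.log(0.9), math.log(0.1)], [math.log(0.7), math.log(0.3)]],  # class 0
    [[math.log(0.2), math.log(0.8)], [math.log(0.5), math.log(0.5)]],  # class 1
]

def map_class(x):
    # MAP rule in log space: argmax_k [ log p(C_k) + sum_i log p(x_i | C_k) ]
    scores = [
        log_prior[k] + sum(log_lik[k][i][v] for i, v in enumerate(x))
        for k in range(len(log_prior))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

print(map_class([1, 0]))  # -> 1 for these made-up parameters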
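And a sketch of the second bullet, the stabilized denominator, reusing a_1 = -245 and a_2 = -255 from the worked example above. The helper below is a hand-rolled stand-in for scipy.special.logsumexp, which performs the same max-shifting.

```python
import math

def log_sum_exp(a):
    # log(sum_k exp(a_k)), computed stably by shifting with A = max_k a_k
    A = max(a)
    return A + math.log(sum(math.exp(x - A) for x in a))

a = [-245.0, -255.0]                 # a_k = log( p(x|Y=C_k) p(Y=C_k) )
log_denom = log_sum_exp(a)           # -245 + log(1 + e^{-10}) ~ -244.99995
log_posteriors = [a_k - log_denom for a_k in a]
print([math.exp(lp) for lp in log_posteriors])  # ~ [0.999955, 0.0000454]
```

Note that exponentiating only the shifted quantities a_k - A (here 0 and -10) keeps every intermediate value in a safe range, regardless of how negative the a_k are.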

Attribution
Source: Link, Question Author: Josh, Answer Author: Community
