Answer
In
p(Y=Cx)=p(xY=C)p(Y=C) ∑Ck=1p(xY=Ck)p(Y=Ck)
both the denominator and the numerator can become very small, typically because the p(xiCk) can be close to 0 and we multiply many of them with each other. To prevent underflows, one can simply take the log of the numerator, but one needs to use the logsumexp trick for the denominator.
More specifically, in order to prevent underflows:

If we only care about knowing which class (ˆy) the input (x=x1,…,xn) most likely belongs to with the maximum a posteriori (MAP) decision rule, we don’t have to apply the logsumexp trick, since we don’t have to compute the denominator in that case. For the numerator one can simply take the log to prevent underflows: log(p(xY=C)p(Y=C)). More specifically:
ˆy=argmaxk∈{1,…,C}p(Ckx1,…,xn)=argmaxk∈{1,…,C} p(Ck)n∏i=1p(xiCk)
which becomes after taking the log:
ˆy=argmaxk∈{1,…,C}log(p(Ckx1,…,xn))=argmaxk∈{1,…,C}log( p(Ck)n∏i=1p(xiCk))=argmaxk∈{1,…,C}(log(p(Ck))+ n∑i=1log(p(xiCk)))

If we want to compute the class probability p(Y=Cx), we will need to compute the denominator:
log(p(Y=Cx))=log(p(xY=C)p(Y=C) ∑Ck=1p(xY=Ck)p(Y=Ck))=log(p(xY=C)p(Y=C)⏟numerator)−log( C∑k=1p(xY=Ck)p(Y=Ck)⏟denominator)
The element log( ∑Ck=1p(xY=Ck)p(Y=Ck)) may underflow because p(xiCk) can be very small: it is the same issue as in the numerator, but this time we have a summation inside the logarithm, which prevents us from transforming the p(xiCk) (can be close to 0) into log(p(xiCk)) (negative and not close to 0 anymore, since 0≤p(xiCk)≤1). To circumvent this issue, we can use the fact that p(xiCk)=exp(log(p(xiCk))) to obtain:
log( C∑k=1p(xY=Ck)p(Y=Ck))=log( C∑k=1exp(log(p(xY=Ck)p(Y=Ck))))
At that point, a new issue arises: log(p(xY=Ck)p(Y=Ck)) may be quite negative, which implies that exp(log(p(xY=Ck)p(Y=Ck))) may become very close to 0, i.e. underflow. This is where we use the logsumexp trick:
log∑keak=log∑keakeA−A=A+log∑keak−A
with:
 ak=log(p(xY=Ck)p(Y=Ck)),
 A=max
We can see that introducing the variable A avoids underflows. E.g. with k=2, a_1 = – 245, a_2 = – 255, we have:
 \exp \left(a_1\right) = \exp \left( 245\right) =3.96143\times 10^{ 107}
 \exp \left(a_2\right) = \exp \left( 255\right) =1.798486 \times 10^{111}
Using the logsumexp trick we avoid the underflow, with A=\max ( 245, 255 )=245:
\begin{align}\log \sum_k e^{a_k} &= \log \sum_k e^{a_k}e^{AA} \\&= A+ \log\sum_k e^{a_k A}\\ &= 245+ \log\sum_k e^{a_k +245}\\&= 245+ \log \left(e^{245 +245}+e^{255 +245}\right) \\&=245+ \log \left(e^{0}+e^{10}\right) \end{align}We avoided the underflow since e^{10} is much farther away from 0 than 3.96143\times 10^{ 107} or 1.798486 \times 10^{111}.
Attribution
Source : Link , Question Author : Josh , Answer Author : Community