Different definitions of the cross entropy loss function

I started off learning about neural networks with the neuralnetworksanddeeplearning.com tutorial. In particular, the third chapter has a section about the cross-entropy function, which defines the cross-entropy loss as:

$$C = -\frac{1}{n} \sum_x \sum_j \left[ y_j \ln a_j^L + (1 - y_j) \ln(1 - a_j^L) \right]$$

However, the TensorFlow introduction defines the cross-entropy loss as:

$$C = -\frac{1}{n} \sum_x \sum_j y_j \ln a_j^L$$ (using the same symbols as above)

Then, searching around to find out what was going on, I found another set of notes (https://cs231n.github.io/linear-classify/#softmax-classifier) that uses a completely different definition of the cross-entropy loss, albeit this time for a softmax classifier rather than for a neural network.

Can someone explain what is going on here? Why are there discrepancies between the ways people define the cross-entropy loss? Is there some overarching principle?

Answer

These three definitions are essentially the same.

1) The TensorFlow introduction,
$$C = -\frac{1}{n} \sum_x \sum_j y_j \ln a_j.$$
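As a quick illustration (not part of the original answer), here is a minimal NumPy sketch of this general form, averaging $-\sum_j y_j \ln a_j$ over a small batch; the arrays `y` and `a` are made-up values.

```python
import numpy as np

def cross_entropy(y, a):
    """Mean over samples of -sum_j y_j * ln(a_j).

    y, a: shape (n_samples, n_classes); each row of `a` sums to 1.
    """
    return -np.mean(np.sum(y * np.log(a), axis=1))

# Hypothetical batch: two samples, three classes.
y = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])
a = np.array([[0.1, 0.7, 0.2],
              [0.6, 0.3, 0.1]])
print(cross_entropy(y, a))  # ~0.4338
```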

2) For binary classification with two classes ($j = 1, 2$), it becomes
$$C = -\frac{1}{n} \sum_x \left( y_1 \ln a_1 + y_2 \ln a_2 \right)$$
and because of the constraints $\sum_j a_j = 1$ and $\sum_j y_j = 1$, it can be rewritten as
$$C = -\frac{1}{n} \sum_x \left( y_1 \ln a_1 + (1 - y_1) \ln(1 - a_1) \right)$$
which is the same as in the 3rd chapter.
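To make the equivalence concrete, here is a small numerical check (my own sketch, with made-up values for $y_1$ and $a_1$) that the two-class sum and the 3rd-chapter binary form agree:

```python
import numpy as np

# Hypothetical two-class example, with y1 + y2 = 1 and a1 + a2 = 1.
y1, a1 = 0.9, 0.8
y2, a2 = 1.0 - y1, 1.0 - a1

general = -(y1 * np.log(a1) + y2 * np.log(a2))            # sum over j = 1, 2
binary  = -(y1 * np.log(a1) + (1 - y1) * np.log(1 - a1))  # 3rd-chapter form
print(general, binary, np.isclose(general, binary))       # same value, True
```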

3) Moreover, if $y$ is a one-hot vector (which is commonly the case for classification labels) with $y_k$ being the only non-zero element, then the cross-entropy loss of the corresponding sample is
$$C_x = -\sum_j y_j \ln a_j = -(0 + 0 + \cdots + y_k \ln a_k) = -\ln a_k.$$
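A one-line check of this collapse (again a sketch with hypothetical numbers): with a one-hot label, the full sum and $-\ln a_k$ give the same value.

```python
import numpy as np

a = np.array([0.1, 0.7, 0.2])  # hypothetical predicted probabilities
y = np.array([0.0, 1.0, 0.0])  # one-hot label; true class index k = 1

full_sum = -np.sum(y * np.log(a))  # -sum_j y_j ln a_j
shortcut = -np.log(a[1])           # -ln a_k
print(full_sum, shortcut)          # both ~0.3567
```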

In the cs231n notes, the cross-entropy loss of one sample is given together with the softmax normalization as
$$C_x = -\ln(a_k) = -\ln\left( \frac{e^{f_k}}{\sum_j e^{f_j}} \right).$$
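For completeness, a minimal sketch of this softmax-plus-cross-entropy form starting from raw class scores $f$ (the scores here are made up; the max subtraction is only a standard numerical-stability trick, not part of the formula itself):

```python
import numpy as np

def softmax_cross_entropy(f, k):
    """-ln(e^{f_k} / sum_j e^{f_j}) for raw scores f and true class index k."""
    f = f - np.max(f)                          # stability shift; the ratio is unchanged
    log_probs = f - np.log(np.sum(np.exp(f)))  # log of the softmax probabilities
    return -log_probs[k]

f = np.array([2.0, 1.0, -1.0])        # hypothetical class scores
print(softmax_cross_entropy(f, k=0))  # ~0.349
```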

Attribution
Source: Link, Question Author: Reginald, Answer Author: Sycorax