Different definitions of the cross entropy loss function

I started off learning about neural networks with the neuralnetworksanddeeplearning.com tutorial. In particular, the 3rd chapter has a section about the cross entropy function, which defines the cross entropy loss as:

$$C = -\frac{1}{n}\sum_x \sum_j \left[y_j \ln a^L_j + (1 - y_j)\ln(1 - a^L_j)\right]$$

However, the TensorFlow introduction defines the cross entropy loss as (using the same symbols as above):

$$C = -\frac{1}{n}\sum_x \sum_j y_j \ln a^L_j$$
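
To make the two formulas concrete, here is a minimal sketch in plain Python that evaluates both over a tiny batch. The function names `nielsen_cross_entropy` and `tf_cross_entropy` are hypothetical, chosen only for illustration; each sample is assumed to be a pair of lists `(a, y)` holding the output activations $a^L_j$ (each strictly between 0 and 1, so every logarithm is defined) and the targets $y_j$.

```python
import math

# Hypothetical helper: the chapter 3 form,
# -(1/n) * sum_x sum_j [y_j ln a_j + (1 - y_j) ln(1 - a_j)]
def nielsen_cross_entropy(batch):
    total = 0.0
    for a, y in batch:
        total += sum(yj * math.log(aj) + (1 - yj) * math.log(1 - aj)
                     for aj, yj in zip(a, y))
    return -total / len(batch)

# Hypothetical helper: the TensorFlow-introduction form,
# -(1/n) * sum_x sum_j y_j ln a_j
def tf_cross_entropy(batch):
    total = 0.0
    for a, y in batch:
        total += sum(yj * math.log(aj) for aj, yj in zip(a, y))
    return -total / len(batch)

# Two samples, three output neurons each, one-hot targets (made-up values).
batch = [
    ([0.7, 0.2, 0.1], [1, 0, 0]),
    ([0.1, 0.6, 0.3], [0, 1, 0]),
]
print(nielsen_cross_entropy(batch))
print(tf_cross_entropy(batch))
```

The extra $(1 - y_j)\ln(1 - a_j)$ terms are never positive, so after the leading minus sign the chapter 3 version can only add to the loss; on this batch it comes out at roughly 0.83 versus roughly 0.43 for the TensorFlow version.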

Then, searching around to find out what was going on, I found another set of notes (https://cs231n.github.io/linear-classify/#softmax-classifier) that uses a completely different definition of the cross entropy loss, albeit this time for a softmax classifier rather than for a neural network.

Can someone explain to me what is going on here? Why do people define the cross-entropy loss differently? Is there some overarching principle?


These three definitions are essentially the same.

1) The TensorFlow introduction:

$$C = -\frac{1}{n}\sum_x \sum_j y_j \ln a_j$$

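2) For binary classification there are only two classes, $j \in \{1, 2\}$, so it becomes

$$C = -\frac{1}{n}\sum_x \left(y_1 \ln a_1 + y_2 \ln a_2\right)$$

and because of the constraints $\sum_j a_j = 1$ and $\sum_j y_j = 1$ (so $a_2 = 1 - a_1$ and $y_2 = 1 - y_1$), it can be rewritten as

$$C = -\frac{1}{n}\sum_x \left(y_1 \ln a_1 + (1 - y_1)\ln(1 - a_1)\right)$$

which is the same as in the 3rd chapter.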
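
A quick numerical check of step 2, as a sketch: with two classes, set $a = (p, 1 - p)$ and $y = (y_1, 1 - y_1)$, and the two-term categorical sum matches the binary form for every choice of $p$ and $y_1$. The helper names `categorical_ce` and `binary_ce` are hypothetical, and the probabilities and targets are arbitrary test values.

```python
import math

# Hypothetical helper: per-sample TensorFlow-style loss, -sum_j y_j ln a_j
def categorical_ce(a, y):
    return -sum(yj * math.log(aj) for aj, yj in zip(a, y))

# Hypothetical helper: per-sample binary form, -[y_1 ln a_1 + (1 - y_1) ln(1 - a_1)]
def binary_ce(a1, y1):
    return -(y1 * math.log(a1) + (1 - y1) * math.log(1 - a1))

for p in (0.1, 0.5, 0.9):
    for y1 in (0.0, 0.3, 1.0):  # soft targets also satisfy y_1 + y_2 = 1
        two_class = categorical_ce([p, 1 - p], [y1, 1 - y1])
        assert math.isclose(two_class, binary_ce(p, y1))
print("two-class categorical CE equals binary CE")
```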

3) Moreover, if $y$ is a one-hot vector (which is commonly the case for classification labels) with $y_k = 1$ being the only non-zero element, then the cross entropy loss of the corresponding sample is

$$C_x = -\sum_j y_j \ln a_j = -(0 + 0 + \dots + y_k \ln a_k) = -\ln a_k$$
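
The collapse to a single term in step 3 can be seen directly in a small sketch; `categorical_ce` is a hypothetical helper name, and the probabilities and the true-class index `k` are made-up example values.

```python
import math

# Hypothetical helper: per-sample loss -sum_j y_j ln a_j
def categorical_ce(a, y):
    return -sum(yj * math.log(aj) for aj, yj in zip(a, y))

a = [0.2, 0.5, 0.3]   # predicted class probabilities, summing to 1
k = 1                 # index of the true class
y = [1.0 if j == k else 0.0 for j in range(len(a))]  # one-hot target

# Every term with y_j = 0 vanishes, leaving only -ln a_k.
assert math.isclose(categorical_ce(a, y), -math.log(a[k]))
print(categorical_ce(a, y))  # about 0.693, i.e. ln 2
```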

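In the cs231n notes, the cross entropy loss of one sample is given together with softmax normalization as

$$C_x = -\ln a_k = -\ln\left(\frac{e^{f_k}}{\sum_j e^{f_j}}\right)$$

where the $f_j$ are the raw class scores and the softmax turns them into the probabilities $a_j$.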
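
As a sketch of the softmax form, the loss can be computed straight from the raw scores, since $-\ln\left(e^{f_k} / \sum_j e^{f_j}\right) = \ln \sum_j e^{f_j} - f_k$. Shifting every score by the maximum before exponentiating is a standard numerical-stability trick (it leaves the softmax unchanged); it is an addition here, not part of the formula above. The function name `softmax_cross_entropy` and the example scores are hypothetical.

```python
import math

# Hypothetical helper: -ln(e^{f_k} / sum_j e^{f_j}) for one sample,
# where scores holds the raw class scores f_j and k is the true class.
def softmax_cross_entropy(scores, k):
    m = max(scores)  # shift by the max score so math.exp cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(f - m) for f in scores))
    return log_sum_exp - scores[k]

scores = [3.2, 5.1, -1.7]  # arbitrary example scores
k = 0                      # true class index

# Direct evaluation of the formula, fine for small scores.
direct = -math.log(math.exp(scores[k]) / sum(math.exp(f) for f in scores))
assert math.isclose(direct, softmax_cross_entropy(scores, k))
print(softmax_cross_entropy(scores, k))
```

With large scores such as `[1000.0, 0.0]`, the direct version fails because `math.exp(1000.0)` raises `OverflowError`, while the shifted version returns `1000.0` for `k = 1`.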

Source: Link, Question Author: Reginald, Answer Author: Sycorax
