I started off learning about neural networks with the neuralnetworksanddeeplearning.com tutorial. In particular, the 3rd chapter has a section on the cross-entropy function, which defines the cross-entropy loss as:

$$C = -\frac{1}{n} \sum_x \sum_j \left( y_j \ln a_j^L + (1 - y_j) \ln(1 - a_j^L) \right)$$

However, reading the Tensorflow introduction, the cross entropy loss is defined as:

$$C = -\frac{1}{n} \sum_x \sum_j y_j \ln a_j^L$$ (when using the same symbols as above)

Searching around to find out what was going on, I found another set of notes (https://cs231n.github.io/linear-classify/#softmax-classifier) that uses yet another definition of the cross-entropy loss, albeit this time for a softmax classifier rather than for a neural network.

Can someone explain to me what is going on here? Why are there discrepancies between how people define the cross-entropy loss? Is there some overarching principle?

**Answer**

These three definitions are essentially the same.

1) The Tensorflow introduction,

$$C = -\frac{1}{n} \sum_x \sum_j y_j \ln a_j.$$

2) For binary classification, where $j$ ranges over the two classes, it becomes

$$C = -\frac{1}{n} \sum_x \left( y_1 \ln a_1 + y_2 \ln a_2 \right)$$

and because of the constraints $\sum_j a_j = 1$ and $\sum_j y_j = 1$, it can be rewritten as

$$C = -\frac{1}{n} \sum_x \left( y_1 \ln a_1 + (1 - y_1) \ln(1 - a_1) \right)$$

which is the same as in the 3rd chapter.
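The two-class equivalence above can be checked numerically. Here is a minimal sketch with made-up probabilities and labels (the values of `a` and `y` are arbitrary, chosen only to satisfy the sum-to-one constraints):

```python
import numpy as np

# One sample with two classes: outputs and labels each sum to 1.
a = np.array([0.7, 0.3])   # network outputs a_1, a_2
y = np.array([1.0, 0.0])   # labels y_1, y_2

# Categorical form: -sum_j y_j ln a_j
categorical = -np.sum(y * np.log(a))

# Binary form: -(y_1 ln a_1 + (1 - y_1) ln(1 - a_1))
binary = -(y[0] * np.log(a[0]) + (1 - y[0]) * np.log(1 - a[0]))

print(np.isclose(categorical, binary))  # the two forms agree
```

Because $a_2 = 1 - a_1$ and $y_2 = 1 - y_1$, every term in one form has a matching term in the other.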

3) Moreover, if $y$ is a one-hot vector (which is commonly the case for classification labels) with $y_k$ being the only non-zero element, then the cross-entropy loss of the corresponding sample is

$$C_x = -\sum_j y_j \ln a_j = -(0 + 0 + \dots + y_k \ln a_k) = -\ln a_k.$$

In the cs231n notes, the cross-entropy loss of one sample is given together with the softmax normalization as

$$C_x = -\ln(a_k) = -\ln\left( \frac{e^{f_k}}{\sum_j e^{f_j}} \right).$$
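This softmax form can also be sketched numerically. The raw scores `f` below are made up; the second computation uses the algebraically equivalent log-sum-exp rewriting, $-f_k + \ln \sum_j e^{f_j}$, which is how libraries typically implement it for numerical stability:

```python
import numpy as np

f = np.array([2.0, 1.0, -1.0])   # raw class scores (logits) for one sample
k = 0                            # index of the correct class

# Softmax normalization, then the negative log of the correct class
p = np.exp(f) / np.sum(np.exp(f))
loss = -np.log(p[k])

# Equivalent log-sum-exp form, more stable for large scores
stable = -f[k] + np.log(np.sum(np.exp(f)))

print(np.isclose(loss, stable))  # both forms give the same loss
```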

**Attribution**
*Source: Link, Question Author: Reginald, Answer Author: Sycorax*