# Different definitions of the cross entropy loss function

I started off learning about neural networks with the neuralnetworksanddeeplearning.com tutorial. In particular, the 3rd chapter has a section about the cross entropy function, which defines the cross entropy loss as:

$C = -\frac{1}{n} \sum\limits_x \sum\limits_j (y_j \ln a^L_j + (1-y_j) \ln (1 - a^L_j))$

However, the Tensorflow introduction defines the cross entropy loss as:

$C = -\frac{1}{n} \sum\limits_x \sum\limits_j (y_j \ln a^L_j)$ (when using the same symbols as above)
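
To make the difference concrete, here is a quick NumPy sketch (my own toy numbers, not from either tutorial) that evaluates both formulas on a single 3-class prediction; they clearly give different values:

```python
import numpy as np

# One sample with 3 classes: a = network outputs (probabilities), y = one-hot label.
# These are made-up numbers purely for illustration.
a = np.array([0.7, 0.2, 0.1])
y = np.array([1.0, 0.0, 0.0])

# 3rd-chapter definition: -sum_j [ y_j ln a_j + (1 - y_j) ln(1 - a_j) ]
chapter3 = -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Tensorflow-introduction definition: -sum_j y_j ln a_j
tf_intro = -np.sum(y * np.log(a))

print(chapter3)  # ~0.685
print(tf_intro)  # ~0.357
```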

Then, searching around to find out what was going on, I found another set of notes (https://cs231n.github.io/linear-classify/#softmax-classifier) that uses a completely different definition of the cross entropy loss, albeit this time for a softmax classifier rather than for a neural network.

Can someone explain to me what is going on here? Why are there discrepancies between these definitions of the cross-entropy loss? Is there some overarching principle that ties them together?

These three definitions are essentially the same.

1) The Tensorflow introduction gives

$C = -\frac{1}{n} \sum\limits_x \sum\limits_j (y_j \ln a^L_j)$

2) For binary classification there are only two classes, $j \in \{1, 2\}$, so it becomes

$C = -\frac{1}{n} \sum\limits_x (y_1 \ln a^L_1 + y_2 \ln a^L_2)$

and because of the constraints $\sum_j a^L_j=1$ and $\sum_j y_j=1$, it can be rewritten as

$C = -\frac{1}{n} \sum\limits_x (y_1 \ln a^L_1 + (1-y_1) \ln (1 - a^L_1))$

which is the same form as in the 3rd chapter.
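
A quick numerical check of this equivalence (a minimal NumPy sketch with made-up probabilities; the $\frac{1}{n}\sum_x$ average is dropped since there is only one sample):

```python
import numpy as np

# One binary sample: a = (a_1, a_2) with a_1 + a_2 = 1, y a one-hot label.
a = np.array([0.8, 0.2])
y = np.array([1.0, 0.0])

# Two-term form: -(y_1 ln a_1 + y_2 ln a_2)
two_term = -np.sum(y * np.log(a))

# Rewritten form using only the first output: -(y_1 ln a_1 + (1 - y_1) ln(1 - a_1))
rewritten = -(y[0] * np.log(a[0]) + (1 - y[0]) * np.log(1 - a[0]))

print(two_term, rewritten)  # both ~0.223
```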

3) Moreover, if $y$ is a one-hot vector (which is commonly the case for classification labels) with $y_k$ being the only non-zero element, then the cross entropy loss of the corresponding sample $x$ is

$C_x = -\sum\limits_j y_j \ln a^L_j = -\ln a^L_k$

In the cs231n notes, the cross entropy loss of one sample is given together with the softmax normalization as

$C_x = -\ln a^L_k = -\ln\left(\frac{e^{f_k}}{\sum\limits_j e^{f_j}}\right)$

where the $f_j$ are the unnormalized class scores.
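
As a sanity check that this matches the other definitions, here is a minimal NumPy sketch (made-up scores) that computes the softmax probabilities from raw scores $f$ and shows the loss is just $-\ln a^L_k$, i.e. the one-hot cross entropy from point 3:

```python
import numpy as np

# Made-up unnormalized class scores f and the index k of the correct class.
f = np.array([2.0, 1.0, -1.0])
k = 0

# Softmax normalization: a_j = e^{f_j} / sum_j e^{f_j} (shifted for numerical stability).
a = np.exp(f - np.max(f))
a /= np.sum(a)

# cs231n softmax-classifier loss for one sample: -ln(a_k) ...
loss = -np.log(a[k])

# ... which equals the one-hot cross entropy -sum_j y_j ln a_j from point 3.
y = np.zeros_like(f)
y[k] = 1.0
print(loss, -np.sum(y * np.log(a)))  # the two numbers agree
```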