Consider the three scoring rules in the case of a binary prediction:

- Log:
`sum(log(ifelse(outcome, probability, 1-probability))) / n`

- Brier:
`sum((outcome-probability)**2) / n`

- Sphere:
`sum(ifelse(outcome, probability, 1-probability)/sqrt(probability**2+(1-probability)**2)) / n`
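As a point of reference, here is one possible direct translation of the three formulas above into plain Python (the function names are my own; `outcomes` is a list of 0/1 values and `probs` the forecast probabilities of the positive class):

```python
import math

def log_score(outcomes, probs):
    # Mean log-likelihood of the realized outcomes; higher (closer to 0) is better.
    return sum(math.log(p if y else 1 - p)
               for y, p in zip(outcomes, probs)) / len(probs)

def brier_score(outcomes, probs):
    # Mean squared error between the 0/1 outcome and the forecast; lower is better.
    return sum((y - p) ** 2
               for y, p in zip(outcomes, probs)) / len(probs)

def spherical_score(outcomes, probs):
    # Probability assigned to the realized outcome, normalized by the Euclidean
    # norm of the forecast vector (p, 1-p); higher is better.
    return sum((p if y else 1 - p) / math.sqrt(p**2 + (1 - p)**2)
               for y, p in zip(outcomes, probs)) / len(probs)
```

Note the sign conventions differ: a perfect forecaster scores 0 under the log rule, 0 under Brier, and 1 under the spherical rule.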

What is the intuition behind them? When should I use one and not the other?

I am especially interested in the case of low prevalence (e.g., 0.1%).

PS. This is to evaluate the results from my calibration algorithm, which I asked about before.

**Answer**

One place where log scoring may be inappropriate: the comparison of human forecasters (who may tend to overstate their confidence).

Log scoring strongly penalizes very overconfident wrong predictions. A wrong prediction made with 100% confidence gets an infinite penalty. For example, suppose a commentator says “I am 100% sure that Smith will win the election,” and then Smith loses. Under log scoring, the average score of all the commentator’s predictions is now permanently stuck at −∞, the worst possible value. A scoring rule ought to be able to distinguish somebody who has made a single wrong 100%-confidence prediction from somebody who makes them all the time.
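A small sketch of this effect (the helper names are mine): one wrong 100%-confidence forecast yields a term of log(0) = −∞, which dominates the mean no matter how many good predictions surround it.

```python
import math

def safe_log(x):
    # math.log(0) raises in Python, so map zero probability to -inf explicitly.
    return math.log(x) if x > 0 else float("-inf")

def mean_log_score(outcomes, probs):
    return sum(safe_log(p if y else 1 - p)
               for y, p in zip(outcomes, probs)) / len(probs)

# 999 well-calibrated correct forecasts...
good = [(1, 0.9)] * 999
# ...plus a single wrong forecast made with 100% confidence.
bad = [(0, 1.0)]

outcomes, probs = zip(*(good + bad))
print(mean_log_score(outcomes, probs))  # -inf: one such miss dominates forever
```

Under the Brier rule the same miss costs only (0 − 1)² = 1 for that single term, so the forecaster's average remains finite and can recover.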

**Attribution**

*Source: Link, Question Author: sds, Answer Author: fblundun*