# The intuition behind the different scoring rules

Consider the three scoring rules in the case of a binary prediction:

1. Log: `sum(log(ifelse(outcome, probability, 1 - probability))) / n`
2. Brier: `sum((outcome - probability)**2) / n`
3. Spherical: `sum(ifelse(outcome, probability, 1 - probability) / sqrt(probability**2 + (1 - probability)**2)) / n`
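
For concreteness, the three rules above can be sketched in Python/NumPy. Here `outcome` is a 0/1 array and `p` is the predicted probability of the positive class; the function names are mine, not from any standard library:

```python
import numpy as np

def log_score(outcome, p):
    # Mean log-probability assigned to the realized outcome (higher is better).
    return np.mean(np.log(np.where(outcome, p, 1 - p)))

def brier_score(outcome, p):
    # Mean squared error between forecast and 0/1 outcome (lower is better).
    return np.mean((outcome - p) ** 2)

def spherical_score(outcome, p):
    # Probability assigned to the realized outcome, normalized by the
    # Euclidean norm of the forecast vector (higher is better).
    return np.mean(np.where(outcome, p, 1 - p) / np.sqrt(p**2 + (1 - p)**2))
```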

What is the intuition behind them? When should I use one and not the other?
I am especially interested in the case of low prevalence (e.g., 0.1%).

PS. This is to evaluate the results from my calibration algorithm which I asked about before.

Log scoring strongly penalizes overconfident wrong predictions: a wrong prediction made with 100% confidence gets an infinite penalty. For example, suppose a commentator says "I am 100% sure that Smith will win the election," and Smith then loses the election. Under log scoring, the average score of all the commentator's predictions is now permanently stuck at $-\infty$, the worst possible score. Yet it should be possible to distinguish somebody who has made a single wrong 100%-confidence prediction from somebody who makes them all the time.
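
A quick numeric illustration of this effect, using a hypothetical track record (the numbers are made up for the sketch):

```python
import numpy as np

# Hypothetical forecaster: 99 correct predictions made at 90% confidence,
# plus a single wrong prediction made with 100% confidence.
p = np.array([0.9] * 99 + [1.0])
outcome = np.array([1] * 99 + [0])

prob_of_outcome = np.where(outcome, p, 1 - p)
with np.errstate(divide="ignore"):  # log(0) yields -inf rather than raising
    log_avg = np.mean(np.log(prob_of_outcome))
brier_avg = np.mean((outcome - p) ** 2)

print(log_avg)    # -inf: the single log(0) term dominates the whole average
print(brier_avg)  # finite: the same mistake costs only (0 - 1)**2 = 1
```

Under the Brier rule the one catastrophic call is just another bounded error term, so the forecaster's average stays informative; under the log rule it is unrecoverable.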