What is the formula to calculate the area under the ROC curve from a contingency table?

For example, if my table is:

                       True Value (gold standard)
                     |  Positive  |  Negative  |
Test     | Positive  |     A      |     B      |
Result   | Negative  |     C      |     D      |

Answer

In the general case: you can’t

The ROC curve shows how sensitivity and specificity vary across every possible threshold. A contingency table is computed at a single threshold, so the information about all other thresholds has been lost. Therefore you can’t reconstruct the ROC curve from this summarized data.
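
To make this concrete, here is a minimal sketch (toy data and function names are my own, not from the question) that rebuilds the contingency table at each threshold of a continuous score. Each threshold yields a different (A, B, C, D), and a single table is just one of these rows.

```python
def contingency_table(scores, labels, threshold):
    """Return (A, B, C, D) for the rule 'test positive if score >= threshold'."""
    pairs = list(zip(scores, labels))
    a = sum(1 for s, y in pairs if s >= threshold and y == 1)  # true positives
    b = sum(1 for s, y in pairs if s >= threshold and y == 0)  # false positives
    c = sum(1 for s, y in pairs if s < threshold and y == 1)   # false negatives
    d = sum(1 for s, y in pairs if s < threshold and y == 0)   # true negatives
    return a, b, c, d


def roc_points(scores, labels):
    """One (1 - SP, SE) point per distinct threshold, plus the (0, 0) corner."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        a, b, c, d = contingency_table(scores, labels, t)
        points.append((b / (b + d), a / (a + c)))
    return points


# Hypothetical toy data: 1 = truly positive, 0 = truly negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

for fpr, se in roc_points(scores, labels):
    print(f"1 - SP = {fpr:.2f}, SE = {se:.2f}")
```

Going from the scores to one table is easy; going back from one table to the scores is not possible.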

But my classifier is binary, so I have one single threshold

Binary classifiers aren’t really binary. Even though they may expose only a final binary decision, all the classifiers I know rely on some quantitative estimate under the hood.

  • A binary decision tree? Try to build a regression tree.
  • An SVM classifier? Do a support vector regression.
  • Logistic regression? Get access to the raw probabilities.
  • Neural network? Use the numeric output of the last layer instead.

This will give you more freedom to choose the optimal threshold to get to the best possible classification for your needs.
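
As an illustration of what "choosing the optimal threshold" can look like once you have a numeric score, here is a small sketch. It assumes "optimal" means maximizing Youden's J = SE + SP - 1, which is only one possible criterion and my own choice for the example; the data and the function name are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing J = SE + SP - 1.

    The decision rule is 'positive if score >= threshold'.
    Youden's J is an assumed criterion; swap in your own cost function.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Hypothetical toy data: 1 = truly positive, 0 = truly negative.
scores = [0.95, 0.85, 0.7, 0.65, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0]

threshold, j = best_threshold(scores, labels)
print(f"threshold = {threshold}, J = {j:.3f}")
```

If false positives and false negatives have different costs in your application, a weighted criterion would be more appropriate than J.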

But I really want to

You really shouldn’t. ROC curves with few thresholds significantly underestimate the true area under the curve (1). A ROC curve with a single point is a worst-case scenario, and any comparison with a continuous classifier will be inaccurate and misleading.
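
A quick way to see the effect on a toy example (hypothetical data, and a threshold of 0.5 chosen arbitrarily): compute the empirical AUC from the continuous scores, then again after collapsing the scores to a binary decision. The helper below uses the pairwise-ranking form of the empirical AUC, with ties counted as one half.

```python
def auc(scores, labels):
    """Empirical ROC AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = truly positive, 0 = truly negative.
scores = [0.95, 0.85, 0.7, 0.65, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0]

full_auc = auc(scores, labels)

# Collapse the scores to a binary decision at an arbitrary threshold.
binary = [1 if s >= 0.5 else 0 for s in scores]
single_point_auc = auc(binary, labels)

print(f"continuous scores: {full_auc:.3f}")
print(f"binary decisions:  {single_point_auc:.3f}")
```

On this data the binarized version comes out lower than the continuous one. The size of the gap depends on the data and on where the threshold falls.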

Just give me the answer!

Ok, ok, you win. With a single point we can consider the AUC as the sum of two triangles T and U:

[Figure: a ROC curve with a single (SP, SE) point and the two triangles T and U]

We can get their areas from the contingency table (A, B, C and D as you defined), where SE = A / (A + C) is the sensitivity and SP = D / (B + D) is the specificity:

$$
\begin{align*}
T = \frac{1 \times SE}{2} &= \frac{SE}{2} = \frac{A}{2(A + C)} \\
U = \frac{SP \times 1}{2} &= \frac{SP}{2} = \frac{D}{2(B + D)}
\end{align*}
$$

Getting the AUC:
$$
\begin{align*}
AUC &= T + U \\
&= \frac{A}{2(A + C)} + \frac{D}{2(B + D)} \\
&= \frac{SE + SP}{2}
\end{align*}
$$
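
If you want it as code, a minimal sketch of the same formula could look like this. The function name and the example counts are made up for illustration; A, B, C and D follow the layout of the table in the question.

```python
def auc_from_table(a, b, c, d):
    """Single-point ROC AUC from a 2x2 table.

    a = test positive, truly positive   b = test positive, truly negative
    c = test negative, truly positive   d = test negative, truly negative
    """
    se = a / (a + c)  # sensitivity
    sp = d / (b + d)  # specificity
    return (se + sp) / 2


# Hypothetical counts, not taken from any real study.
print(round(auc_from_table(40, 10, 20, 30), 4))
```

Note that the function fails with a division by zero if the table has no true positives or no true negatives at all, in which case sensitivity or specificity is undefined anyway.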

To conclude

You can technically calculate a ROC AUC for a binary classifier from the confusion matrix. But just in case I wasn’t clear, let me repeat one last time: DON’T DO IT!

References

(1) DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44:837-845. https://www.jstor.org/stable/2531595

Attribution
Source: Link, Question Author: Jeremy Miles, Answer Author: AabyWan
