PREFACE: I don’t care about the merits of using a cutoff or not, or how one should choose a cutoff. My question is purely mathematical, asked out of curiosity.

Logistic regression models the posterior probability of class A versus class B, and its decision boundary is the hyperplane where the two posterior probabilities are equal. So in theory, I understood that a 0.5 classification cutoff will minimize total errors regardless of class balance, since the model estimates the posterior probability (assuming you consistently encounter the same class ratio).
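To spell out that reasoning: if the fitted probability equals the true posterior $p(x) = \Pr(\text{A} \mid x)$, then under 0–1 loss the conditional probability of error is $1 - p(x)$ when predicting A and $p(x)$ when predicting B, so the error-minimizing rule is

$$\text{predict A} \iff 1 - p(x) < p(x) \iff p(x) > \tfrac{1}{2}.$$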

In my real-life example, I obtain very poor accuracy using P > 0.5 as my classification cutoff (about 51% accuracy). However, the AUC is above 0.99. So I tried some different cutoff values and found that P > 0.6 gives 98% accuracy (90% on the smaller class and 99% on the bigger class), with only 2% of cases misclassified.
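For concreteness, this is the kind of cutoff scan I mean, as a sketch on synthetic data (the labels and scores are made up to mimic my situation, not my actual model output; scikit-learn's `roc_auc_score` is used for the AUC):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for my data: ~1:9 imbalance, scores that rank the
# classes well but are not calibrated around 0.5 (made up, not real output).
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.1).astype(int)
p_hat = np.where(y_true == 1,
                 0.65 + 0.30 * rng.random(10_000),  # smaller class: scores in [0.65, 0.95]
                 0.35 + 0.20 * rng.random(10_000))  # bigger class: scores in [0.35, 0.55]

print("AUC:", roc_auc_score(y_true, p_hat))  # near-perfect ranking

for cutoff in (0.5, 0.6, 0.7):
    pred = (p_hat > cutoff).astype(int)
    overall = (pred == y_true).mean()
    small = (pred[y_true == 1] == 1).mean()  # accuracy on the smaller class
    big = (pred[y_true == 0] == 0).mean()    # accuracy on the bigger class
    print(f"cutoff {cutoff}: overall {overall:.3f}, "
          f"smaller {small:.3f}, bigger {big:.3f}")
```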

The classes are heavily imbalanced (1:9) and the problem is high-dimensional. However, I stratified the cross-validation folds so that the class balance is the same when the model is fit and when it predicts, as sketched below. I also tried predicting on the same data used to fit the model, and the same issue occurred.
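Concretely, the stratification looks like this (`X` and `y` are stand-ins for my data; scikit-learn's `StratifiedKFold` is one way to do it):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Stand-ins for my high-dimensional, 1:9-imbalanced data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = (rng.random(1000) < 0.1).astype(int)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Stratification keeps the ~1:9 class ratio the same in every fold,
    # so the balance at fit time matches the balance at prediction time.
    print("fold positive rate:", y[test_idx].mean())
```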

I’m interested in why 0.5 does not minimize errors; I thought this would hold by design if the model is fit by minimizing cross-entropy loss.

Does anyone have any feedback as to why this happens? Is it due to the added penalization? If so, can someone explain what is happening?

**Answer**

You don’t have to get predicted categories from a logistic regression model; it can be fine to stay with predicted probabilities. If you do get predicted categories, you should *not* use that information to do anything other than say ‘this observation is best classified into this category’. For example, you should not use ‘accuracy’ / percent correct to select a model.

Having said those things, .50 is rarely going to be the optimal cutoff for classifying observations. To get an intuitive sense of how this could happen, imagine that you had N = 100 with 99 observations in the positive category. A model whose predicted probabilities hover near the cutoff could easily put, say, 49 of the positives just below .50, giving you 49 false negatives and only 51% correct. On the other hand, if you just called everything positive, you would have 1 false positive, but be 99% correct.
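Here is that example as a toy computation (the predicted probabilities are constructed by hand to mirror the story, not the output of a real fit):

```python
import numpy as np

# 100 observations: 99 positive, 1 negative.
y = np.array([1] * 99 + [0])

# Hand-constructed predicted probabilities: 49 positives land just below .50.
p_hat = np.array([0.49] * 49 + [0.51] * 50 + [0.49])

pred = (p_hat > 0.5).astype(int)
print("accuracy at cutoff .50:", (pred == y).mean())  # 0.51 (49 false negatives)

# Calling everything positive misses only the single negative.
print("accuracy calling everything positive:", (np.ones_like(y) == y).mean())  # 0.99
```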

More generally, logistic regression tries to fit the true probability of being positive as a function of the explanatory variables. It is not trying to maximize accuracy by centering the predicted probabilities around the .50 cutoff. If your sample isn’t 50% positive, there is just no reason .50 would maximize the percent correct.
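Since you mention penalization: one plausible mechanism (a sketch under assumed data, not a diagnosis of your model) is that shrinkage compresses the predicted probabilities toward the base rate. The ranking, and hence the AUC, survives, but the probabilities are no longer calibrated around .50, so the accuracy-maximizing cutoff moves:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Assumed setup: high-dimensional data, 9:1 imbalance, weak signal everywhere.
rng = np.random.default_rng(1)
n, d = 500, 200
y = (rng.random(n) < 0.9).astype(int)
X = rng.normal(size=(n, d)) + 0.3 * y[:, None]

# A strong L2 penalty (small C) shrinks the coefficients, compressing the
# predicted probabilities toward the base rate while preserving their ranking.
model = LogisticRegression(C=0.01, max_iter=1000).fit(X, y)
p_hat = model.predict_proba(X)[:, 1]

print("AUC:", roc_auc_score(y, p_hat))  # stays high: ranking is intact

cutoffs = np.linspace(0.01, 0.99, 99)
accs = [((p_hat > c).astype(int) == y).mean() for c in cutoffs]
print(f"accuracy at cutoff 0.50: {accs[49]:.3f}")
print(f"best cutoff {cutoffs[int(np.argmax(accs))]:.2f}: {max(accs):.3f}")
```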

**Attribution**
*Source: Link, Question Author: felix000, Answer Author: gung – Reinstate Monica*