What is the difference between the decision_function, predict_proba, and predict functions for a logistic regression problem?

I have been going through the sklearn documentation, but I am not able to understand the purpose of these functions in the context of logistic regression.
For decision_function it says that it's the distance between the hyperplane and the test instance. How is this particular information useful? And how does this relate to the predict and predict_proba methods?


Recall that the functional form of logistic regression is

$$ f(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}} $$

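This is the probability of the positive class, and it is what predict_proba returns. In sklearn, predict_proba gives one column per class, so for a binary problem f(x) is the second column and 1 - f(x) is the first.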

The term inside the exponential

$$ d(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k $$

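is what is returned by decision_function. Its sign tells you which side of the hyperplane the instance falls on, and its magnitude grows with the distance from it (dividing by the norm of the coefficient vector gives the geometric distance). The “hyperplane” referred to in the documentation is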

$$ \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = 0 $$

This terminology is a holdover from support vector machines, which literally estimate a separating hyperplane. For logistic regression this hyperplane is a bit of an artificial construct: it is the plane of equal probability, where the model has determined that both target classes are equally likely.
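To make the relationship between d(x) and f(x) concrete, here is a minimal sketch in plain Python. The coefficients `beta_0` and `beta` and the three sample points are hypothetical values picked only for illustration; they do not come from any fitted model.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
beta_0 = -1.0
beta = [2.0, -0.5]

def d(x):
    """Linear score: beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    return beta_0 + sum(b * xi for b, xi in zip(beta, x))

def f(x):
    """Logistic function applied to the linear score d(x)."""
    return 1.0 / (1.0 + math.exp(-d(x)))

on_plane = [0.5, 0.0]   # d = -1 + 1 - 0 = 0, so the point lies on the hyperplane
positive = [2.0, 1.0]   # d = -1 + 4 - 0.5 = 2.5
negative = [0.0, 2.0]   # d = -1 + 0 - 1 = -2

for x in (on_plane, positive, negative):
    print(x, d(x), f(x))
```

A point with d(x) = 0 gets probability exactly 0.5; positive scores map to probabilities above 0.5 and negative scores to probabilities below it.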

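The predict function returns a class decision. It predicts the positive class when

$$ f(x) > 0.5 $$

and the negative class otherwise. This is the same as checking whether d(x) > 0, since f(x) = 0.5 exactly when d(x) = 0.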
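The three methods can be checked against each other on a fitted model. The following is a sketch, assuming a binary problem and a default `LogisticRegression`; the synthetic data from `make_classification` and the variable names are my own choices for the illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, generated only for this illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

scores = model.decision_function(X)   # d(x), shape (n_samples,)
probs = model.predict_proba(X)        # shape (n_samples, 2): one column per class
labels = model.predict(X)             # predicted class labels

# decision_function is the linear score built from the fitted coefficients.
manual_scores = X @ model.coef_.ravel() + model.intercept_[0]
scores_match = np.allclose(scores, manual_scores)

# The second predict_proba column is f(x), the logistic function of d(x).
probs_match = np.allclose(probs[:, 1], 1.0 / (1.0 + np.exp(-scores)))

# predict thresholds the score at 0, which is the probability at 0.5.
labels_match = np.array_equal(labels, model.classes_[(scores > 0).astype(int)])

print(scores_match, probs_match, labels_match)
```

If all three checks hold, then predict_proba and predict carry no information beyond decision_function: one is a monotone transform of the score, the other a threshold on it.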

At the risk of soapboxing, the predict function has very few legitimate uses, and I view using it as a sign of error when reviewing others' work. I would go so far as to call it a design error in sklearn itself (the predict_proba function should have been called predict, and predict should have been called predict_class, if anything at all).

Source : Link , Question Author : Sameed , Answer Author : Matthew Drury
