I have been going through the sklearn documentation, but I am not able to understand the purpose of these functions in the context of logistic regression.

For `decision_function`, it says that it's the distance between the hyperplane and the test instance. How is this particular information useful? And how does this relate to the `predict` and `predict_proba` methods?

**Answer**

Recall that the functional form of logistic regression is

$$ f(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}} $$

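This is what is returned by `predict_proba`. For a binary problem, $f(x)$ is the second column (the probability of `classes_[1]`), and the first column is $1 - f(x)$.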
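To check this numerically, the sketch below fits a binary model and recomputes $f(x)$ by hand from the fitted `intercept_` ($\beta_0$) and `coef_` ($\beta_1, \dots, \beta_k$). The synthetic data from `make_classification` is an illustrative assumption, not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic binary problem (assumed data, not from the question)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

# beta_0 is intercept_[0]; beta_1..beta_k are coef_[0]
linear = model.intercept_[0] + X @ model.coef_[0]
manual_proba = 1.0 / (1.0 + np.exp(-linear))  # the logistic function f(x)

# Column 1 of predict_proba is P(y = classes_[1])
sk_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sk_proba))  # True
```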

The term inside the exponential (without its minus sign)

$$ d(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k $$

is what is returned by `decision_function`. The “hyperplane” referred to in the documentation is

$$ \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = 0 $$

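This terminology is a holdover from support vector machines, which literally estimate a separating hyperplane. For logistic regression this hyperplane is a bit of an artificial construct: it is the plane of equal probability, where the model has determined that both target classes are equally likely. Strictly speaking, $d(x)$ is proportional to the signed distance from $x$ to this hyperplane rather than equal to it; dividing by $\sqrt{\beta_1^2 + \cdots + \beta_k^2}$ gives the Euclidean distance.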
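A small sketch of that relationship, again assuming synthetic data from `make_classification`: it checks that `decision_function` equals $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, rescales it into a geometric distance, and projects one sample onto the hyperplane to confirm that the model gives it probability 0.5 for each class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic binary problem (assumed data, not from the question)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)
beta0, beta = model.intercept_[0], model.coef_[0]

# decision_function is the linear predictor d(x) = beta_0 + beta . x
d = model.decision_function(X)
print(np.allclose(d, beta0 + X @ beta))  # True

# Dividing by the norm of the slope coefficients gives the signed
# Euclidean distance from each sample to the hyperplane d(x) = 0
norm = np.linalg.norm(beta)
signed_distance = d / norm

# Move the first sample along the unit normal onto the hyperplane;
# there the model should be indifferent between the two classes
x0 = X[0] - signed_distance[0] * beta / norm
print(model.predict_proba(x0.reshape(1, -1)))  # both columns ≈ 0.5
```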

The `predict` function returns a class decision using the rule

$$ f(x) > 0.5 $$

which is equivalent to $d(x) > 0$, that is, to asking which side of the hyperplane $x$ falls on.
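The three methods can be tied together in a few lines. This sketch, on assumed synthetic data, rebuilds the output of `predict` both from thresholding `predict_proba` at 0.5 and from the sign of `decision_function`, mapping the resulting 0/1 index through `classes_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic binary problem (assumed data, not from the question)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)[:, 1]
d = model.decision_function(X)

# Index 1 selects classes_[1] (the "positive" side), index 0 selects classes_[0]
from_proba = model.classes_[(proba > 0.5).astype(int)]
from_decision = model.classes_[(d > 0).astype(int)]

print(np.array_equal(model.predict(X), from_proba))     # True
print(np.array_equal(model.predict(X), from_decision))  # True
```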

At the risk of soapboxing, the `predict` function has very few legitimate uses, and I view using it as a sign of error when reviewing others' work. I would go so far as to call it a design error in sklearn itself (the `predict_proba` function should have been called `predict`, and `predict` should have been called `predict_class`, if anything at all).
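One common alternative to calling `predict` is to keep the probabilities from `predict_proba` and apply a threshold chosen for the problem at hand. A minimal sketch, assuming synthetic data and a hypothetical cutoff of 0.3 (purely illustrative; a sensible value depends on the relative costs of each kind of error):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic binary problem (assumed data, not from the question)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

# Keep the probabilities themselves; they carry more information than labels
proba = model.predict_proba(X)[:, 1]

# Hypothetical threshold, e.g. if missing a positive is costlier than a false alarm
THRESHOLD = 0.3
decisions = model.classes_[(proba > THRESHOLD).astype(int)]

# A threshold below 0.5 flags at least as many positives as predict() does
n_custom = (decisions == model.classes_[1]).sum()
n_default = (model.predict(X) == model.classes_[1]).sum()
print(n_custom >= n_default)  # True
```

Because the probabilities are kept, the threshold can be changed later without refitting the model.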

**Attribution**: *Source: Link, Question Author: Sameed, Answer Author: Matthew Drury*