# Why is it wrong to interpret SVM as classification probabilities?

My understanding of SVM is that it’s very similar to logistic regression (LR): a weighted sum of features is passed through the sigmoid function to get the probability of belonging to a class, but training uses the hinge loss instead of the cross-entropy (logistic) loss. The benefit of the hinge loss is that one can use various numerical tricks to make kernelisation more efficient. A drawback, however, is that the resulting model carries less information than a corresponding LR model could. So, for example, without kernelisation (i.e. with a linear kernel) the SVM decision boundary would still sit where LR outputs a probability of 0.5, but one cannot tell how quickly the probability of belonging to a class decays away from the decision boundary.
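To make the loss comparison concrete, here is a small illustrative sketch (plain Python, not from any library) of the key difference: the logistic loss keeps decreasing as a correctly classified point moves further from the boundary, while the hinge loss goes exactly flat past the margin, so it stops distinguishing "barely correct" from "confidently correct":

```python
import math

def hinge_loss(margin):
    # Hinge loss max(0, 1 - m), where m = y * (w.x + b) is the signed margin.
    # It is exactly zero once m exceeds 1, so it carries no information
    # about how far past the margin a point lies.
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    # Logistic (cross-entropy) loss log(1 + exp(-m)): strictly decreasing
    # in the margin, so the fitted model retains distance information.
    return math.log(1.0 + math.exp(-margin))

for m in [0.5, 1.0, 2.0, 4.0]:
    print(f"margin={m}: hinge={hinge_loss(m):.4f}, logistic={logistic_loss(m):.4f}")
```

Note how the hinge loss is identical (zero) at margins 2 and 4, whereas the logistic loss still tells them apart; this is the sense in which the hinge-loss model "has less information".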

My two questions are:

1. Is my interpretation above correct?
2. How does using the hinge loss make it invalid to interpret SVM results as probabilities?

An SVM does not feed anything into a sigmoid function. It fits a separating hyperplane to the data, trying to put all training points of one class on one side and all points of the other class on the other, and then assigns a class based on which side a feature vector falls. More formally, if $\mathbf{x}$ denotes the feature vector, $\boldsymbol{\beta}$ the vector of hyperplane coefficients, and $\beta_0$ the intercept, the class assignment is $\hat{y} = \operatorname{sign}(\boldsymbol{\beta} \cdot \mathbf{x} + \beta_0)$. Solving an SVM amounts to finding the $\boldsymbol{\beta}, \beta_0$ that minimize the hinge loss plus a penalty on $\|\boldsymbol{\beta}\|$, which corresponds to maximizing the margin.

Because the hinge loss is exactly zero for every point beyond the margin, the fitted model records only which side of the hyperplane a point is on, not how far from it. The decision value $\boldsymbol{\beta} \cdot \mathbf{x} + \beta_0$ is unbounded and uncalibrated, so you cannot interpret it (or the class assignments derived from it) as probabilities.
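A minimal sketch of the point above (plain Python, with made-up coefficients rather than a fitted model): two feature vectors at very different distances from the hyperplane receive exactly the same class assignment, and the raw decision value is not confined to $[0, 1]$, so it is not a probability:

```python
# Hypothetical hyperplane coefficients for illustration only.
beta = [2.0, -1.0]
beta_0 = 0.5

def decision_value(x):
    # beta . x + beta_0: proportional to the signed distance from the hyperplane.
    return sum(b * xi for b, xi in zip(beta, x)) + beta_0

def svm_class(x):
    # The SVM prediction keeps only the sign and discards the magnitude.
    return 1 if decision_value(x) >= 0 else -1

x_near = [0.1, 0.0]   # decision value 0.7: just past the boundary
x_far = [10.0, 0.0]   # decision value 20.5: far from the boundary

print(svm_class(x_near), svm_class(x_far))   # same class either way
print(decision_value(x_far))                 # unbounded, clearly not a probability
```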