I know that logistic regression finds a hyperplane that separates the training samples. I also know that support vector machines find the hyperplane with the maximum margin.
My question: is the difference between logistic regression (LR) and support vector machines (SVM), then, that LR finds any hyperplane that separates the training samples while SVM finds the hyperplane with the maximum margin? Or am I wrong?
Note: recall that in LR, when θ⋅x=0 the logistic function outputs 0.5. If we use 0.5 as the classification threshold, then θ⋅x=0 is a hyperplane, i.e. the decision boundary.
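A minimal sketch of that note, assuming a bias feature x[0] = 1 and made-up parameters `theta`: thresholding the logistic output at 0.5 gives exactly the same prediction as checking the sign of θ⋅x.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def predict(theta, x, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(dot(theta, x)) >= threshold else 0

# Hypothetical parameters; x[0] = 1 is the bias feature, so the
# boundary theta.x = 0 is the point x[1] = 0.5.
theta = [-1.0, 2.0]
print(sigmoid(0.0))                # 0.5
print(predict(theta, [1.0, 0.4]))  # theta.x < 0 -> 0
print(predict(theta, [1.0, 0.6]))  # theta.x > 0 -> 1
```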
You are right if you are talking about hard SVM and the two classes are linearly separable. LR finds any solution that separates the two classes. Hard SVM finds “the” solution, among all possible ones, that has the maximum margin.
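To make "any separating solution" versus "the maximum-margin one" concrete, here is a brute-force sketch for 2-D toy data only. It is not how SVM solvers work (they solve a quadratic program); it just scans unit normal directions w and, for each, centres the line between the two classes. The data, the comparison line x1 = 1 (standing in for an arbitrary separating hyperplane, not an actual LR fit), and the helper names are all hypothetical.

```python
import math

def margin(w, b, X, y):
    """Geometric margin of the line w.x + b = 0 on labelled 2-D points
    (labels in {-1, +1}): the smallest y_i * (w.x_i + b) / ||w||.
    It is positive only if the line separates the two classes."""
    norm = math.hypot(w[0], w[1])
    return min(yi * (w[0] * x[0] + w[1] * x[1] + b) for x, yi in zip(X, y)) / norm

def max_margin_2d(X, y, steps=3600):
    """Brute-force hard-margin search: try `steps` unit directions w; for each,
    place the offset b midway between the closest points of the two classes."""
    best_gap, best_w, best_b = -math.inf, None, None
    for k in range(steps):
        t = 2 * math.pi * k / steps
        w = (math.cos(t), math.sin(t))
        pos = [w[0] * x[0] + w[1] * x[1] for x, yi in zip(X, y) if yi == 1]
        neg = [w[0] * x[0] + w[1] * x[1] for x, yi in zip(X, y) if yi == -1]
        gap = (min(pos) - max(neg)) / 2  # margin of the best line with normal w
        if gap > best_gap:
            best_gap, best_w, best_b = gap, w, -(min(pos) + max(neg)) / 2
    return best_gap, best_w, best_b

# Toy linearly separable data.
X = [(2, 2), (3, 3), (2, 3), (0, 0), (-1, 0), (0, -1)]
y = [1, 1, 1, -1, -1, -1]

# Another separating line (x1 = 1): it separates the data, but with a smaller margin.
print(margin((1.0, 0.0), -1.0, X, y))  # 1.0

gap, w, b = max_margin_2d(X, y)
print(round(gap, 4))  # 1.4142, i.e. sqrt(2)
```

Both lines separate the training samples; only the second one maximizes the distance to the closest points, which is the criterion hard SVM optimizes.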
In the case of soft SVM, where the classes are not linearly separable, you are still right with a slight modification: the training error cannot become zero. LR finds the hyperplane that minimizes one error (the logistic, or cross-entropy, loss). Soft SVM minimizes a different error (the hinge loss) and at the same time trades that error off against the margin via a regularization parameter (usually called C).
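A minimal sketch of the two objectives, assuming labels in {-1, +1} and the standard primal form of soft SVM. The toy data and the fixed (w, b) are made up; nothing is being trained here, the code only evaluates each objective for one candidate hyperplane.

```python
import math

def score(w, b, x):
    """Signed score w.x + b of the linear model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def logistic_loss(m):
    """LR loss log(1 + exp(-m)) for functional margin m = y * score.
    Written in a numerically stable form; it is positive for every m."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

def hinge_loss(m):
    """Soft-SVM loss max(0, 1 - m): exactly zero once m >= 1."""
    return max(0.0, 1.0 - m)

def lr_objective(w, b, X, y):
    """Unregularized LR training loss (negative log-likelihood)."""
    return sum(logistic_loss(yi * score(w, b, x)) for x, yi in zip(X, y))

def soft_svm_objective(w, b, X, y, C):
    """Primal soft-SVM objective: 0.5 * ||w||^2 (smaller ||w|| means a wider
    margin) traded off against C times the total hinge loss."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(hinge_loss(yi * score(w, b, x)) for x, yi in zip(X, y))

# Toy 1-D data that no threshold can separate perfectly.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [-1, 1, -1, 1]
w, b = (1.0,), -1.5

print(lr_objective(w, b, X, y))
print(soft_svm_objective(w, b, X, y, C=1.0))   # 0.5 + 1 * 3.0 = 3.5
print(soft_svm_objective(w, b, X, y, C=10.0))  # 0.5 + 10 * 3.0 = 30.5
```

A larger C makes margin violations more expensive relative to the ||w||² term, so the optimizer accepts a narrower margin; a smaller C favors a wider margin at the cost of more hinge loss.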
Another difference between the two: SVM is a hard classifier, whereas LR is a probabilistic one. SVM is also sparse: it chooses the support vectors (from the training samples) that have the most discriminatory power between the two classes. Since it keeps no other training points at test time, we have no idea about the distribution of either of the two classes.
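A small sketch of both points, assuming labels in {-1, +1} and a list of hypothetical functional margins m = y(w⋅x + b) for five training points. LR returns a probability and every point has a nonzero loss gradient; SVM returns only a label, and points beyond the margin (m > 1) have zero hinge loss and zero gradient, so they cannot be support vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_predict_proba(s):
    """LR output for score s = theta.x: an estimate of P(y = +1 | x)."""
    return sigmoid(s)

def svm_predict(s):
    """SVM output for score s = w.x + b: only a hard label."""
    return 1 if s >= 0 else -1

def lr_point_weight(m):
    """Magnitude of d/dm log(1 + exp(-m)), which is sigmoid(-m): never zero,
    so every training point pulls on the LR solution."""
    return sigmoid(-m)

def svm_support_candidates(margins):
    """Indices of points with m <= 1. Points with m > 1 have zero hinge
    loss and zero gradient, so they cannot be support vectors."""
    return [i for i, m in enumerate(margins) if m <= 1.0]

margins = [0.5, 1.0, 2.0, 4.0, -0.3]  # hypothetical functional margins
print(svm_support_candidates(margins))  # [0, 1, 4]
print([round(lr_point_weight(m), 3) for m in margins])
print(lr_predict_proba(0.8), svm_predict(0.8))
```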
I have explained how the LR solution (using IRLS) breaks down when the two classes are linearly separable, and why it stops being a probabilistic classifier in that case: https://stats.stackexchange.com/a/133292/66491
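A quick way to see the underlying problem without IRLS itself: on separable data, scaling any separating θ up by a factor k strictly lowers the LR loss, so there is no finite minimizer and the predicted probabilities are pushed towards 0 and 1. This sketch only evaluates the loss along that ray; the data and the direction are hypothetical.

```python
import math

def logistic_loss(m):
    """Numerically stable log(1 + exp(-m)) for functional margin m."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

def lr_loss(theta, X, y):
    """LR negative log-likelihood, labels in {-1, +1}."""
    return sum(logistic_loss(yi * sum(t * xi for t, xi in zip(theta, x)))
               for x, yi in zip(X, y))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Linearly separable 1-D data; x[0] = 1 is the bias feature.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
y = [-1, -1, 1, 1]
direction = (0.0, 1.0)  # theta = k * direction separates the data for any k > 0

losses = []
for k in (1.0, 10.0, 100.0):
    theta = tuple(k * d for d in direction)
    losses.append(lr_loss(theta, X, y))
    # Predicted P(y = +1) for the point x = (1, 1) is sigmoid(k): it drifts towards 1.
    print(k, losses[-1], sigmoid(k))
```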