I’ve been using the glm.fit function in R to fit parameters to a logistic regression model. By default, glm.fit uses iteratively reweighted least squares to fit the parameters. What are some reasons this algorithm would fail to converge, when used for logistic regression?

**Answer**

If the two classes are linearly separable, iteratively reweighted least squares (IRLS) breaks down. In that scenario, any hyperplane that separates the two classes is a solution, and there are infinitely many of them. IRLS is meant to find a maximum likelihood solution, but maximum likelihood has no mechanism for favoring one of these solutions over another (there is no concept of maximum margin, for example). Depending on the initialization, IRLS should move toward one of these solutions and then break because of numerical problems (I don’t know the details of IRLS; this is an educated guess).
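As a rough illustration of why maximum likelihood cannot settle on one separating hyperplane, the sketch below evaluates the log-likelihood of a one-dimensional logistic model on a tiny separable data set. The data (`xs`, `ys`), the separating direction `w = 1, b = 0`, and the helper names are all made up for this example; it is plain Python, not what `glm.fit` does internally.

```python
import math

def log_sigmoid(m):
    """Numerically stable log(sigmoid(m))."""
    if m >= 0:
        return -math.log1p(math.exp(-m))
    return m - math.log1p(math.exp(m))

def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood of the 1-D model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x + b
        # For y = 1 the term is log(sigmoid(z)); for y = 0 it is log(sigmoid(-z)).
        total += log_sigmoid(z if y == 1 else -z)
    return total

# Hypothetical, perfectly separable toy data: negatives left of 0, positives right.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Scale one separating solution (w = 1, b = 0) by increasing factors.
scales = [1, 2, 4, 8, 16]
lls = [log_likelihood(c * 1.0, 0.0, xs, ys) for c in scales]

for c, ll in zip(scales, lls):
    print(f"scale {c:>2}: log-likelihood = {ll:.3e}")
```

Under these assumptions, the log-likelihood keeps increasing toward 0 as the scale grows, so no finite weight vector attains the maximum.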

A related problem arises from linear separability of the training data. Each of the separating-hyperplane solutions corresponds to a Heaviside step function, so all the fitted probabilities are either 0 or 1. The logistic regression solution would then be a hard classifier rather than a probabilistic one.

To clarify using mathematical notation, the Heaviside function is $\lim_{\|\mathbf{w}\| \to \infty} \sigma(\mathbf{w}^\top \mathbf{x} + b)$, the limit of the sigmoid function, where $\sigma$ is the sigmoid function and $(\mathbf{w}, b)$ determines the hyperplane solution. So IRLS theoretically does not stop and moves toward a $\mathbf{w}$ of ever-increasing magnitude, but in practice it breaks because of numerical problems.
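The growth of the weights can be seen in a small experiment. The following is a minimal sketch of Newton/IRLS updates for a one-dimensional logistic model with an intercept, written in plain Python on made-up separable data. The function names, the data, and the fixed count of 15 iterations are assumptions for illustration; this is not the actual `glm.fit` implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def irls_step(w, b, xs, ys):
    """One Newton/IRLS update for the 1-D model p = sigmoid(w*x + b)."""
    g_w = g_b = h_ww = h_wb = h_bb = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        weight = p * (1.0 - p)   # IRLS working weight for this observation
        g_w += x * (y - p)       # gradient of the log-likelihood
        g_b += y - p
        h_ww += weight * x * x   # entries of X^T W X
        h_wb += weight * x
        h_bb += weight
    # Solve the 2x2 system (X^T W X) * delta = gradient by Cramer's rule.
    det = h_ww * h_bb - h_wb * h_wb
    dw = (h_bb * g_w - h_wb * g_b) / det
    db = (h_ww * g_b - h_wb * g_w) / det
    return w + dw, b + db

# Hypothetical, perfectly separable toy data.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
ws = []
for i in range(15):
    w, b = irls_step(w, b, xs, ys)
    ws.append(w)
    print(f"iteration {i + 1:>2}: w = {w:.4f}")
```

On this toy problem the slope increases at every iteration instead of settling. The working weights `p * (1 - p)` shrink toward zero at the same time, so running many more iterations would eventually make the 2x2 system numerically singular. In R, a separable fit typically shows up as the warnings "fitted probabilities numerically 0 or 1 occurred" or "algorithm did not converge".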

**Attribution**
*Source: Link, Question Author: Jessica, Answer Author: Seeda*