# Do coefficients of logistic regression have a meaning?

I have a binary classification problem from several features. Do the coefficients of a (regularized) logistic regression have an interpretable meaning?

I thought they could indicate the size of influence, given the features are normalized beforehand. However, in my problem the coefficients seem to depend sensitively on the features I select. Even the sign of the coefficients changes with different feature sets chosen as input.

Does it make sense to examine the values of the coefficients, and what is the correct way to find the most meaningful ones and state their meaning in words? Are some fitted models, and the signs of their coefficients, wrong even when they sort of fit the data?

(The highest correlation that I have between features is only 0.25, but that certainly plays a role?)

The coefficients in the output do have a meaning, although it isn't very intuitive to most people, and certainly not to me. That is why people convert them to odds ratios: each coefficient is the log of an odds ratio, so exponentiating a coefficient gives the odds ratio.
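To make that link explicit, here is a short derivation for a model with a single predictor. The symbols $p(x)$, $\beta_0$ and $\beta_1$ are my own notation, not something taken from the output: $p(x)$ is the predicted probability at predictor value $x$, $\beta_0$ the intercept and $\beta_1$ the coefficient.

$$\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x$$

$$\frac{p(x+1)/\bigl(1 - p(x+1)\bigr)}{p(x)/\bigl(1 - p(x)\bigr)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$

So a one-unit increase in $x$ multiplies the odds by $e^{\beta_1}$, whatever the starting value of $x$.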

The coefficients are most useful for plugging into formulas that give predicted probabilities of being in each level of the dependent variable.

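For example, in R:

    library("MASS")   # provides the menarche data set
    data(menarche)
    # Grouped binomial response: (successes, failures) at each age
    glm.out = glm(cbind(Menarche, Total - Menarche) ~ Age,
                  family = binomial(logit), data = menarche)

    summary(glm.out)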
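As a rough sketch, the estimates can also be pulled out of the fitted object rather than read off the summary. This assumes the MASS package is installed; the age of 13 is just an illustrative value I picked, and the block refits the model so that it runs on its own.

```r
library("MASS")   # provides the menarche data set
data(menarche)

# Same model as above, refitted so this block is self-contained
glm.out <- glm(cbind(Menarche, Total - Menarche) ~ Age,
               family = binomial(logit), data = menarche)

# Coefficients on the log-odds scale
b <- coef(glm.out)
print(b)

# Exponentiated coefficients: the entry for Age is the odds ratio per year
print(exp(b))

# Predicted probability of menarche at an illustrative age of 13
p13 <- predict(glm.out, newdata = data.frame(Age = 13), type = "response")
print(p13)
```

`type = "response"` asks `predict` for a probability; without it the result is on the log-odds scale.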


The parameter estimate for Age is 1.64. What does this mean? Combined with the parameter estimate for the intercept (-21.24), it gives a formula for the predicted probability of menarche:

$P(M) = \frac{1}{1 + e^{21.24 - 1.64 \times \text{age}}}$

but that formula (even with just one variable!) doesn't give much of a sense of how age is related to menarche. The odds ratio, $e^{1.64} = 5.16$, says that for each additional year of age the odds of menarche are 5.16 times as large. That is not the same as menarche being 5.16 times as likely, although that looser reading is often used.
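The difference between "odds 5.16 times as large" and "5.16 times as likely" can be checked numerically. The following is only a sketch: it plugs in the rounded estimates quoted above, and the ages 11 to 14 and the helper names `p_menarche` and `odds` are my own choices for illustration.

```r
# Rounded estimates quoted in the text; treat them as illustrative
b0 <- -21.24   # intercept
b1 <- 1.64     # coefficient for Age

# Predicted probability of menarche at a given age (inverse logit)
p_menarche <- function(age) plogis(b0 + b1 * age)

# Odds corresponding to a probability
odds <- function(p) p / (1 - p)

ages <- c(11, 12, 13, 14)   # illustrative ages
p <- p_menarche(ages)

# Ratio of odds between consecutive years: constant, equal to exp(b1)
odds_ratio <- odds(p)[-1] / odds(p)[-length(p)]

# Ratio of probabilities between consecutive years: changes with age
prob_ratio <- p[-1] / p[-length(p)]

print(round(p, 3))
print(round(odds_ratio, 2))
print(round(prob_ratio, 2))
```

The odds ratio comes out the same for every one-year step, while the ratio of probabilities shrinks as the probability approaches 1, which is why the coefficient is stated in terms of odds.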