Maximum number of independent variables in Logistic Regression

Is there a measure in logistic regression that penalizes you for having too many independent variables, the way adjusted R squared does in multiple regression?

That is, does having too many independent variables in a logistic regression hurt the model?

What about dummy variables? Can you have too many of those to the point of unpredictability?


For the typical low signal:noise ratio we see in most problems, a common rule of thumb is that you need about 15 times as many events and 15 times as many non-events as there are parameters that you entertain putting into the model. The rationale for that “rule” is that it yields a model whose performance metric is likely to be as good or as bad in new data as it appears to be in the training data. But even with no predictors at all, you need 96 observations just to estimate the intercept so that the overall predicted risk is within a $\pm 0.1$ margin of error of the true risk with 0.95 confidence.
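The arithmetic behind both figures can be sketched in a few lines of Python. This is only an illustration, not code from the answer: the helper names `max_candidate_parameters` and `n_for_intercept` are hypothetical, the example counts (150 events, 850 non-events) are made up, and the 96 is assumed to come from the usual normal-approximation sample size for a proportion at the worst case of a true risk of 0.5.

```python
from statistics import NormalDist


def max_candidate_parameters(n_events, n_nonevents, per_parameter=15):
    """Rule-of-thumb ceiling on candidate parameters (hypothetical helper).

    The rarer outcome class is the limiting one: the rule asks for about
    `per_parameter` events AND `per_parameter` non-events per parameter.
    """
    limiting = min(n_events, n_nonevents)
    return limiting // per_parameter


def n_for_intercept(margin=0.1, confidence=0.95, risk=0.5):
    """Approximate n needed to estimate an overall risk within +/- margin.

    Assumes the normal approximation n = z^2 * p * (1 - p) / margin^2,
    evaluated at p = risk (0.5 is the worst case).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * risk * (1 - risk) / margin ** 2


# Made-up example: 150 events and 850 non-events.
print(max_candidate_parameters(150, 850))  # 150 // 15 = 10
print(n_for_intercept())                   # roughly 96
```

When counting candidate parameters, a categorical predictor with k levels coded as dummy variables uses k - 1 of them, so a few many-level categoricals can exhaust the budget quickly.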

Source: Link, Question Author: Micro, Answer Author: Frank Harrell
