How do I handle predictor variables from different distributions in logistic regression?

I am using logistic regression to predict y given x1 and x2:

z = B0 + B1 * x1 + B2 * x2
y = e^z / (e^z + 1)
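For reference, the two-feature model above can be written as a small Python function. This is a minimal sketch: the name `predict_proba` and its argument order are illustrative choices, not part of any library.

```python
import math


def predict_proba(x1: float, x2: float, b0: float, b1: float, b2: float) -> float:
    """P(y = 1) under z = B0 + B1*x1 + B2*x2, y = e^z / (e^z + 1)."""
    z = b0 + b1 * x1 + b2 * x2
    # e^z / (e^z + 1) equals 1 / (1 + e^-z); picking the form by the sign of z
    # keeps math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```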

How is logistic regression supposed to handle cases in which my variables have very different scales? Do people ever build logistic regression models with higher-order coefficients for variables? I’m imagining something like this (for two variables):

z = B0 + B1 * x1 + B2 * x1^2 + B3 * x2 + B4 * x2^2
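Because z is still linear in the coefficients B0..B4, this quadratic model is an ordinary logistic regression fitted on an expanded feature vector. A minimal sketch of that expansion; the function names and the coefficient ordering are illustrative assumptions. Note that squaring a feature also widens its range, so scale differences between features tend to get larger, not smaller.

```python
import math


def quadratic_features(x1: float, x2: float) -> list[float]:
    """Expand (x1, x2) into [1, x1, x1^2, x2, x2^2]; the leading 1 pairs with B0."""
    return [1.0, x1, x1 ** 2, x2, x2 ** 2]


def predict_proba(x1: float, x2: float, betas: list[float]) -> float:
    """P(y = 1) for z = B0 + B1*x1 + B2*x1^2 + B3*x2 + B4*x2^2."""
    z = sum(b * f for b, f in zip(betas, quadratic_features(x1, x2)))
    # Stable logistic: same value as e^z / (e^z + 1) without overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```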

Alternatively, is the right answer to simply normalize, standardize or rescale the x1 and x2 values before using logistic regression?

Answer

Of course you can normalize your features; this would also speed up the learning algorithm.

To have comparable $\beta$ values at the end of the algorithm, compute, for each feature $x_i$, its mean $\mu_i$ and its range $r_i = \max(x_i) - \min(x_i)$ over the training records. Then replace each value $r[x_i]$, i.e. the value of feature $x_i$ for a record $r$, with:
$$\frac{r[x_i] - \mu_i}{r_i}$$
Now your $r[x_i]$ values lie in the interval $[-1, 1]$, so you can compare your $\beta$ coefficients, and therefore the corresponding odds ratios $e^{\beta_i}$, with more confidence. This also shortens the time needed to find the best set of $\beta$ if you are using gradient descent. Just remember to normalize the features of a new record $r'$ before predicting its class, using the same $\mu_i$ and $r_i$ computed on the training data.
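The mean/range normalization can be sketched in plain Python as below. The function names `fit_normalizer` and `normalize` are hypothetical, and the fallback range of 1.0 for a constant feature is an added assumption to avoid dividing by zero. The key point is that the statistics are computed once on the training rows and then reused for every new record.

```python
def fit_normalizer(rows):
    """Per-feature mean mu_i and range r_i = max - min, from training rows only."""
    mus, ranges = [], []
    for i in range(len(rows[0])):
        col = [row[i] for row in rows]
        mus.append(sum(col) / len(col))
        r = max(col) - min(col)
        # A constant feature has range 0; use 1.0 to avoid dividing by zero.
        ranges.append(r if r != 0 else 1.0)
    return mus, ranges


def normalize(row, mus, ranges):
    """Map each x_i to (x_i - mu_i) / r_i using stored training statistics."""
    return [(x - mu) / r for x, mu, r in zip(row, mus, ranges)]


train = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
mus, ranges = fit_normalizer(train)                 # mus = [2.0, 2000.0], ranges = [2.0, 2000.0]
train_scaled = [normalize(row, mus, ranges) for row in train]
new_scaled = normalize([4.0, 2500.0], mus, ranges)  # -> [1.0, 0.25]
```

A new record can land outside $[-1, 1]$ (a first feature of 5.0 would map to 1.5) because $\mu_i$ and $r_i$ come from the training data. That is expected: recomputing them from new records would put those records on a different scale from the one the $\beta$ were fitted on.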

You can also add higher-order features, but this can lead to overfitting. As you add more parameters, it is usually better to also add regularization, which tries to avoid overfitting by shrinking the magnitude of your $\beta$. This is done by adding the following term to the logistic regression cost function:
$$\lambda\sum_{i=1}^{n}\beta_i^2$$
where $\lambda$ controls the strength of the regularization. The intercept $\beta_0$ is usually left out of the penalty.
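A minimal sketch of the regularized cost and plain batch gradient descent, in pure Python. Assumptions for this sketch: each row of `X` starts with a constant 1.0 so that `betas[0]` is the intercept, the intercept is not penalized, and the names `cost`, `fit`, `lr` and `steps` are illustrative, not a library API. The penalty is $\lambda\sum_{i\ge 1}\beta_i^2$ added to the mean cross-entropy, so its gradient contribution is $2\lambda\beta_i$.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function e^z / (e^z + 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost(betas, X, y, lam):
    """Mean cross-entropy plus lam * sum(beta_i^2) for i >= 1 (intercept unpenalized)."""
    eps = 1e-12  # keeps log() finite if a prediction hits exactly 0 or 1
    total = 0.0
    for row, target in zip(X, y):
        h = sigmoid(sum(b * x for b, x in zip(betas, row)))
        h = min(max(h, eps), 1.0 - eps)
        total -= target * math.log(h) + (1 - target) * math.log(1.0 - h)
    return total / len(X) + lam * sum(b * b for b in betas[1:])


def fit(X, y, lam=0.1, lr=0.5, steps=2000):
    """Batch gradient descent on cost(); returns the fitted betas."""
    m, n = len(X), len(X[0])
    betas = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for row, target in zip(X, y):
            err = sigmoid(sum(b * x for b, x in zip(betas, row))) - target
            for j in range(n):
                grad[j] += err * row[j] / m
        # Derivative of lam * beta_j^2 is 2 * lam * beta_j; the intercept is skipped.
        for j in range(1, n):
            grad[j] += 2 * lam * betas[j]
        betas = [b - lr * g for b, g in zip(betas, grad)]
    return betas


# Tiny separable example (first column is the constant 1 for the intercept).
X = [[1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0]]
y = [0, 0, 1, 1]
unregularized = fit(X, y, lam=0.0)
regularized = fit(X, y, lam=0.1)
```

On this separable toy data the unpenalized coefficient keeps growing the longer gradient descent runs, while with $\lambda = 0.1$ it settles at a noticeably smaller value; that shrinkage is the effect of the regularization term.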

I would suggest having a look at Stanford's machine learning class here: http://www.ml-class.org/course/video/preview_list, Units 6 and 7.

Attribution
Source: Link, Question Author: James Thompson, Answer Author: Simone
