# How do I handle predictor variables from different distributions in logistic regression?

I am using logistic regression to predict y given x1 and x2:

z = B0 + B1 * x1 + B2 * x2
y = e^z / (e^z + 1)
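(For concreteness, the two equations above can be sketched in Python; the coefficient values here are made up for illustration:)

```python
import math

def predict(x1, x2, b0, b1, b2):
    # Logistic model from the question: z = B0 + B1*x1 + B2*x2,
    # then y = e^z / (e^z + 1), i.e. the sigmoid of z.
    z = b0 + b1 * x1 + b2 * x2
    return math.exp(z) / (math.exp(z) + 1)

# With z = 0 the model outputs exactly 0.5
p = predict(0.0, 0.0, 0.0, 1.0, 1.0)
```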


How is logistic regression supposed to handle cases in which my variables have very different scales? Do people ever build logistic regression models with higher-order coefficients for variables? I’m imagining something like this (for two variables):

z = B0 + B1 * x1 + B2 * x1^2 + B3 * x2 + B4 * x2^2


Alternatively, is the right answer to simply normalize, standardize or rescale the x1 and x2 values before using logistic regression?

Of course you can normalize your features; doing so will also speed up the learning algorithm.

In order to have comparable $\beta$ at the end of the run of the algorithm, you should, for each feature $x_i$, compute its mean $\mu_i$ and its range $r_i = \max_i - \min_i$. Then replace each value $r[x_i]$, i.e. the value of feature $x_i$ for a record $r$, with:

$$r[x_i] \leftarrow \frac{r[x_i] - \mu_i}{r_i}$$

Now your $r[x_i]$ values lie in the interval $[-1, 1]$, so you can compare your $\beta$, and hence your odds ratios, with more confidence. This also shortens the time needed to find the best set of $\beta$ if you are using gradient descent. Just remember to normalize the features of any new record $r'$ with the same $\mu_i$ and $r_i$ before predicting its class.
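A minimal NumPy sketch of this mean normalization (the function name is mine):

```python
import numpy as np

def mean_normalize(X):
    """Scale each column of X to roughly [-1, 1] using its mean and range."""
    mu = X.mean(axis=0)                 # per-feature mean mu_i
    r = X.max(axis=0) - X.min(axis=0)   # per-feature range r_i = max_i - min_i
    return (X - mu) / r, mu, r

# Two features on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
X_scaled, mu, r = mean_normalize(X)

# Apply the SAME mu and r to a new record before predicting its class
x_new = np.array([2.5, 1500.0])
x_new_scaled = (x_new - mu) / r
```

Note that `mu` and `r` must be computed once on the training data and reused at prediction time.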

You can also add higher-order features, but this can lead to overfitting. In general, the more parameters you add, the more worthwhile it is to add regularization, which tries to avoid overfitting by shrinking the magnitude of your $\beta$. This is done by adding the following term to the logistic regression cost function:

$$\frac{\lambda}{2m} \sum_{j=1}^{n} \beta_j^2$$

where $\lambda$ tunes the strength of the regularization.
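A sketch of regularized logistic regression with squared features, fitted by gradient descent (the function and data here are my own illustration, not from a library; the intercept $\beta_0$ is conventionally not penalized):

```python
import numpy as np

def fit_logistic(X, y, lam=1.0, lr=0.1, n_iter=5000):
    """Fit logistic regression by gradient descent, minimizing the log-loss
    plus the L2 penalty (lam / (2m)) * sum(beta_j^2) over j >= 1."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])       # prepend intercept column
    beta = np.zeros(n + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))   # sigmoid of z = X beta
        grad = Xb.T @ (p - y) / m              # gradient of the log-loss
        grad[1:] += (lam / m) * beta[1:]       # penalty gradient, skip beta_0
        beta -= lr * grad
    return beta

# Two features plus their squares, as in the question's z with x^2 terms
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)  # nonlinear true boundary
X_poly = np.hstack([X, X ** 2])
beta = fit_logistic(X_poly, y, lam=1.0)
```

Increasing `lam` shrinks the higher-order $\beta$ toward zero, trading some training fit for less overfitting.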

I would suggest having a look at Stanford's machine learning class here: http://www.ml-class.org/course/video/preview_list, Units 6 and 7.