Are categorical variables standardized differently in penalized regression? [duplicate]

In penalized/regularized regression (lasso, ridge, etc.) the predictors are typically standardized to be centered at 0 and often to have variance 1. Are categorical predictors treated differently? If so, why? What are the consequences of using the same standardization? Is a reference available?


I think the main point is what you want to do with the model.
There is no single answer to whether you should standardize none, some, or all of the variables. It depends on what you want your model for.

Using the z-score of the predictors (what you call standardizing) puts all the predictors on the same scale, but makes interpretation a little more difficult. The interpretation of a coefficient is now “how much the output variable changes when the predictor changes by one standard deviation”.
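To make the change in interpretation concrete, here is a minimal sketch with made-up data (the values of `x` and `y` and the helper names `zscore` and `ols_slope` are illustrative only). For a single predictor and an unpenalized fit, the slope on the z-scored predictor is the raw slope multiplied by the predictor's standard deviation:

```python
from statistics import mean, pstdev

def zscore(values):
    # Center at 0 and scale to unit (population) standard deviation.
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    # Ordinary least-squares slope for one predictor with an intercept.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_raw = ols_slope(x, y)          # change in y per one unit of x
slope_std = ols_slope(zscore(x), y)  # change in y per one SD of x
sd_x = pstdev(x)

print(slope_raw, slope_std, slope_raw * sd_x)
```

The last two printed numbers agree, which is the "per standard deviation" reading described above. For a 0/1 dummy variable a "one standard deviation change" has no natural meaning, which is one reason some people leave dummies unscaled when they care about interpretation.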

Many times, penalized/regularized regressions are not suitable for interpretation, because you are introducing a bias in the coefficients. Usually when you use such models, you are interested in the predictions, not in doing a counterfactual analysis. Standardization is useful because it makes the problem numerically more stable. If that is your case, it doesn’t make a big difference whether you “standardize” your categorical predictors or not.
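As a rough illustration of why scaling a dummy matters little for prediction, here is a sketch of one-predictor ridge regression on a made-up 0/1 variable (the data and the helper `ridge_fit_1d` are hypothetical, and the intercept is assumed to be unpenalized). At the same penalty the raw and z-scored dummies give different fits, but rescaling the penalty by the dummy's variance recovers exactly the same predictions:

```python
from statistics import mean, pstdev

def ridge_fit_1d(x, y, lam):
    # Fitted values from one-predictor ridge with an unpenalized intercept:
    # beta = Sxy / (Sxx + lam), computed on the centered predictor.
    mx, my = mean(x), mean(y)
    xc = [a - mx for a in x]
    sxy = sum(a * (b - my) for a, b in zip(xc, y))
    sxx = sum(a * a for a in xc)
    beta = sxy / (sxx + lam)
    return [my + beta * a for a in xc]

# Made-up binary predictor and response, for illustration only.
dummy = [0, 0, 0, 1, 1, 0, 1, 0]
y = [1.0, 1.2, 0.9, 2.1, 1.8, 1.1, 2.3, 1.0]

sd = pstdev(dummy)
dummy_z = [(d - mean(dummy)) / sd for d in dummy]

lam = 2.0
fit_raw = ridge_fit_1d(dummy, y, lam)
fit_z_same = ridge_fit_1d(dummy_z, y, lam)              # same penalty: different fit
fit_z_rescaled = ridge_fit_1d(dummy_z, y, lam / sd**2)  # rescaled penalty: same fit

print(max(abs(a - b) for a, b in zip(fit_raw, fit_z_same)))
print(max(abs(a - b) for a, b in zip(fit_raw, fit_z_rescaled)))
```

With several predictors the equivalence is no longer exact, because standardizing changes how strongly each variable is penalized relative to the others. When the penalty is tuned by cross-validation, much of the difference can still be absorbed by the tuning.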

Try asking a more specific question, including what kind of analysis you want to do with your problem, and you can get a more specific answer 🙂

Source : Link , Question Author : julieth , Answer Author : Jose G
