Should one be concerned about multi-collinearity when using non-linear models?

Say we have a binary classification problem with mostly categorical features. We use some non-linear model (e.g. XGBoost or Random Forests) to learn it.

  • Should one still be concerned about multi-collinearity? Why?
  • If the answer is yes, how should one fight it, given that one is using these types of non-linear models?


Multi-collinearity will not be a problem for certain models, such as random forests or decision trees. For example, if we have two identical columns, a decision tree / random forest will automatically "drop" one of them at each split: since both columns carry the same information, the split criterion gains nothing from using the second one once the first has been chosen. The model's predictive performance is therefore unaffected.
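A quick sketch of this claim (the synthetic dataset and hyperparameters are my own illustrative choices, not from the answer): train a random forest on some data, duplicate one column verbatim to create perfect collinearity, retrain, and compare held-out accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative assumption).
X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=5, random_state=0)

# Append an exact copy of the first column: perfect multi-collinearity.
X_dup = np.hstack([X, X[:, [0]]])

accs = {}
for name, data in [("original", X), ("with duplicate", X_dup)]:
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    accs[name] = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: accuracy = {accs[name]:.3f}")
```

On a run like this the two accuracies come out essentially the same: the duplicated column changes which copy gets picked at a given split, not the quality of the splits.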

In addition, for models where multi-collinearity is a problem (e.g. linear or logistic regression), regularization is a way to "fix" it. My answer to "Regularization methods for logistic regression" gives details.
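To illustrate the regularization point (again a sketch with made-up data, not from the linked answer): under perfect collinearity an unregularized logistic regression has no unique coefficient solution, but an L2 penalty restores one, splitting the weight roughly evenly between the two identical copies of a feature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with a duplicated column (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=5,
                           n_informative=3, random_state=1)
X_dup = np.hstack([X, X[:, [0]]])  # column 5 is an exact copy of column 0

# L2-penalized ("ridge"-style) logistic regression; penalty="l2" is
# sklearn's default, shown here for emphasis.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X_dup, y)
w = clf.coef_[0]
print(w[0], w[5])  # the two identical columns get (near-)equal weights
```

The L2 penalty makes the objective strictly convex, so the symmetric solution, with the weight shared between the duplicates, is the unique minimizer.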

Source: Link, Question Author: Josh, Answer Author: Community