Say we have a binary classification problem with mostly categorical features. We use some non-linear model (e.g. XGBoost or Random Forests) to learn it.
- Should one still be concerned about multi-collinearity? Why?
- If the answer to the above is true, how should one fight it considering that one is using these types of non-linear models?
Answer
Multi-collinearity will not be a problem for certain models. Such as random forest or decision tree. For example, if we have two identical columns, decision tree / random forest will automatically “drop” one column at each split. And the model will still work well.
In addition, regularization is a way to “fix” Multi-collinearity problem. My answer Regularization methods for logistic regression gives details.
Attribution
Source : Link , Question Author : Josh , Answer Author : Community