Most of the advice on how to deal with multicollinear predictors tells you to eliminate them before fitting your model, using some criterion like VIF (Variance Inflation Factor). If I understand it right, this will eliminate predictors based on whatever minor differences there are in their variation.
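For concreteness, here is a minimal sketch of how VIF is usually defined: regress each predictor on all the others and take 1 / (1 - R²). The helper `vif` and the simulated variables `x1`, `x2`, `x3` are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n_samples x n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)             # unrelated to x1 and x2
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

Both `x1` and `x2` get a large VIF here, so which of the two a VIF-based filter drops depends on small differences between them.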

What I don’t understand is how this is any better than what the fitting algorithm does when it picks one of the predictors over the others, also based on minor differences between them.

In the end, the actual effects of the predictors on the response will remain indistinguishable, since their variation is too similar to tell them apart, regardless of whether you select them a priori or let the algorithm do it.

Why not skip the a priori predictor selection through VIF and go straight to model selection, which will tell you that models containing either of the collinear predictors have similar AIC? Then you can attribute the effect to whatever the predictors have in common, or simply state that it is not possible to tell which one is causing the response.
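A small simulation of what I mean, as a sketch under stated assumptions: the latent variable `z`, the noise levels, and the helper `gaussian_aic` are all made up for illustration, and the AIC is the Gaussian one computed up to an additive constant.

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit of y on X plus an intercept.

    Pass X=None for the intercept-only model.
    """
    n = len(y)
    design = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)               # what the two predictors have in common
x1 = z + 0.05 * rng.normal(size=n)
x2 = z + 0.05 * rng.normal(size=n)
y = 2.0 * z + rng.normal(size=n)

aic_null = gaussian_aic(None, y)
aic_x1 = gaussian_aic(x1, y)
aic_x2 = gaussian_aic(x2, y)
aic_both = gaussian_aic(np.column_stack([x1, x2]), y)
print(aic_null, aic_x1, aic_x2, aic_both)
```

The models with `x1` alone and `x2` alone both improve hugely on the intercept-only model, while the gap between the two of them is tiny by comparison.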

**Answer**

If the goal is prediction, then your proposed solution sounds reasonable, but inference is a whole other beast.

The fact is that the theory, for example the theory concerning sampling distributions of coefficients, is not capable of conditioning on such model selection procedures. Frank Harrell talks a little about this in *Regression Modeling Strategies*, in the section on stepwise regression.
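A rough simulation can illustrate the problem. This is a sketch under assumptions of my own choosing, not an example from Harrell's book: ten independent predictors, none related to the outcome, with the "selection" step simply keeping the single predictor that fits best. The helper `slope_t_stat` and all constants are hypothetical.

```python
import numpy as np

def slope_t_stat(x, y):
    """t-statistic for the slope in a simple regression of y on x (with intercept)."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    b = (xc @ yc) / sxx
    resid = yc - b * xc
    s2 = (resid @ resid) / (n - 2)
    return b / np.sqrt(s2 / sxx)

rng = np.random.default_rng(1)
n, k, n_sims = 100, 10, 2000
T_CRIT = 1.984  # approximate two-sided 5% critical value of t with 98 df

prespecified_hits = 0
selected_hits = 0
for _ in range(n_sims):
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)  # the outcome is unrelated to every column of X
    t = np.array([slope_t_stat(X[:, j], y) for j in range(k)])
    # Predictor chosen before seeing the data: the naive test behaves as advertised.
    prespecified_hits += abs(t[0]) > T_CRIT
    # Predictor chosen because it fit best: same naive test, applied after selection.
    selected_hits += np.max(np.abs(t)) > T_CRIT

prespecified_rate = prespecified_hits / n_sims
selected_rate = selected_hits / n_sims
print(prespecified_rate, selected_rate)
```

The test on the pre-specified predictor rejects at roughly the nominal 5% rate, while the same test applied to the selected predictor rejects far more often, because the usual sampling distribution ignores the selection step.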

Statistics is not an algorithmic truth-generating process. You need to be able to say “I think X is related to the outcome” in order to make the kind of inferences statistics affords us.

**Attribution**

*Source: Link, Question Author: Gabriel De Oliveira Caetano, Answer Author: Demetri Pananos*