Can Random Forest Methodology be Applied to Linear Regressions?

Random Forests work by creating an ensemble of decision trees, where each tree is built on a bootstrap sample of the original training data (a random sample of both the input variables and the observations).

Can a similar process be applied to linear regression?
Create k linear regression models, each fit to a random bootstrap sample of the data, and average their predictions (a sketch of this idea follows below).
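
For concreteness, here is a minimal sketch of that proposal in plain NumPy. The function names and the choice of k are illustrative only, not from any particular package:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via lstsq."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def fit_bagged_ols(X, y, k=100):
    """Fit k OLS models, each on a bootstrap sample of the rows."""
    n = len(X)
    boots = (rng.integers(0, n, size=n) for _ in range(k))
    return [fit_ols(X[idx], y[idx]) for idx in boots]

def predict_bagged(models, X):
    """Average the k models' predictions, as a random forest averages its trees."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.mean([Xb @ beta for beta in models], axis=0)
```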

What are the reasons NOT to create a “random regression” model of this kind?

Thanks. If there’s something I’m fundamentally misunderstanding, please let me know.

Answer

I partially disagree with the present answers, because the methodology random forests are built on introduces randomness (CARTs built on bootstrapped samples plus the random subspace method) to make the individual trees as independent as possible. Once the trees are close to orthogonal, the average of their predictions tends (in many cases) to be better than the prediction of the average tree (because of Jensen’s inequality). Although CARTs benefit especially from this treatment, the methodology applies to any model, and linear models are no exception. There is an R package that is exactly what you are looking for, with a nice tutorial on how to tune and interpret these models and a bibliography on the subject: Random Generalized Linear Models.
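
As a rough illustration of the same idea without that package, bagging plus random feature subsampling can be applied to linear models via scikit-learn’s BaggingRegressor. This is a generic bagging/random-subspace sketch, not the randomGLM algorithm itself; the synthetic dataset and the parameter values are made up for the example:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=20.0, random_state=0)

# Bootstrap the observations AND give each linear model a random half of
# the predictors (the random subspace method), mirroring a random forest.
ensemble = BaggingRegressor(
    LinearRegression(),
    n_estimators=100,
    bootstrap=True,     # resample rows with replacement
    max_features=0.5,   # each model sees a random 50% of the columns
    random_state=0,
)

print("single OLS:", cross_val_score(LinearRegression(), X, y, cv=5).mean())
print("bagged OLS:", cross_val_score(ensemble, X, y, cv=5).mean())
```

The feature subsampling is what decorrelates the individual linear models; with bootstrapping alone, the averaged coefficients tend to land close to the single OLS fit, so the ensemble gains little.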

Attribution
Source: Link, Question Author: Rick, Answer Author: JEquihua
