Is random forest for regression a ‘true’ regression?

Random forests are used for regression. However, from what I understand, they assign an average target value at each leaf. Since each tree has only a limited number of leaves, the regression model can only produce a specific set of target values. So is it not just a ‘discrete’ regression (like a step function), unlike linear regression, which is ‘continuous’?

Am I understanding this correctly? If yes, what advantage does random forest offer in regression?

Answer

This is correct – random forests effectively discretize continuous variables and give piecewise-constant predictions, since they are based on decision trees, which work through recursive binary partitioning of the predictor space. But with sufficient data and sufficient splits, a step function with many small steps can approximate a smooth function, so this need not be a problem. If you really want to capture a smooth response to a single predictor, you can calculate the partial effect of that variable and fit a smooth function to it (this does not affect the model itself, which retains its stepwise character).
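To make the "many small steps" point concrete, here is a minimal sketch (not a random forest, just a hand-built step function) that approximates sin(x) on [0, 2π] with equal-width bins. The helper names `step_approximation` and `max_error` are made up for this illustration, and using the bin midpoint as the level is an assumption standing in for the leaf average a tree would store.

```python
import math

def step_approximation(f, lo, hi, n_steps):
    """Return a piecewise-constant approximation of f on [lo, hi].

    Each of the n_steps equal-width bins predicts f at its midpoint
    (an assumed stand-in for the average a tree leaf would store).
    """
    width = (hi - lo) / n_steps
    levels = [f(lo + (i + 0.5) * width) for i in range(n_steps)]

    def predict(x):
        # Clamp so that x == hi falls into the last bin.
        i = min(int((x - lo) / width), n_steps - 1)
        return levels[i]

    return predict

def max_error(f, approx, lo, hi, n_grid=10_000):
    """Largest absolute gap between f and approx on an evenly spaced grid."""
    xs = [lo + (hi - lo) * k / n_grid for k in range(n_grid + 1)]
    return max(abs(f(x) - approx(x)) for x in xs)

lo, hi = 0.0, 2 * math.pi
errors = {
    n: max_error(math.sin, step_approximation(math.sin, lo, hi, n), lo, hi)
    for n in (4, 16, 64, 256)
}
for n, err in errors.items():
    print(f"{n:4d} steps -> max error {err:.4f}")
```

The worst-case error shrinks roughly in proportion to the step width, which is the sense in which a fine enough step function can stand in for a smooth curve.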

Random forests offer quite a few advantages over standard regression techniques for some applications. To mention just three:

  1. They allow the use of arbitrarily many predictors (more predictors than data points is possible)
  2. They can approximate complex nonlinear shapes without a priori specification
  3. They can capture complex interactions between predictors without a priori specification.
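Points 2 and 3 can be illustrated with a small sketch. This is a toy bagged-trees implementation in plain Python, not a full random forest (a real one would also subsample features at each split, and in practice you would use an established library). The data are synthetic and hypothetical: the target is y = x1 * x2, a pure interaction, and the model is never told about the product term. All function names here are made up for the example.

```python
import random

def best_split(X, y, min_leaf):
    """Return (sse, feature, threshold) of the SSE-minimizing split, or None."""
    n = len(y)
    total = sum(y)
    total_sq = sum(v * v for v in y)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = left_sq = 0.0
        for pos in range(n - 1):
            i, nxt = order[pos], order[pos + 1]
            left_sum += y[i]
            left_sq += y[i] * y[i]
            n_left, n_right = pos + 1, n - pos - 1
            if n_left < min_leaf or n_right < min_leaf or X[i][j] == X[nxt][j]:
                continue
            right_sum = total - left_sum
            # Within-node sum of squared errors: sum(y^2) - (sum y)^2 / n.
            sse = (left_sq - left_sum ** 2 / n_left) + (
                total_sq - left_sq - right_sum ** 2 / n_right)
            if best is None or sse < best[0]:
                best = (sse, j, (X[i][j] + X[nxt][j]) / 2)
    return best

def fit_tree(X, y, max_depth, min_leaf=3):
    """Grow a regression tree; a leaf is a float (mean of y), a node a tuple."""
    split = best_split(X, y, min_leaf) if max_depth > 0 else None
    if split is None:
        return sum(y) / len(y)
    _, j, thr = split
    left = [i for i in range(len(y)) if X[i][j] <= thr]
    right = [i for i in range(len(y)) if X[i][j] > thr]
    return (j, thr,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_leaf))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree

def fit_bagged_trees(X, y, n_trees, max_depth, rng):
    """Fit each tree on a bootstrap resample of the training data."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], max_depth))
    return trees

def predict_forest(trees, x):
    return sum(predict_tree(t, x) for t in trees) / len(trees)

rng = random.Random(0)

def sample(n):
    # Hypothetical data: two predictors on [-1, 1], target is their product.
    X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return X, [a * b for a, b in X]

X_train, y_train = sample(500)
X_test, y_test = sample(500)
trees = fit_bagged_trees(X_train, y_train, n_trees=15, max_depth=8, rng=rng)

mean_y = sum(y_train) / len(y_train)
baseline_mse = sum((v - mean_y) ** 2 for v in y_test) / len(y_test)
forest_mse = sum((predict_forest(trees, x) - v) ** 2
                 for x, v in zip(X_test, y_test)) / len(y_test)
print(f"baseline MSE: {baseline_mse:.4f}, bagged-tree MSE: {forest_mse:.4f}")
```

The baseline here is simply predicting the training mean. The trees pick up the interaction through successive splits on x1 and x2, without any product term being supplied.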

As for whether it is a ‘true’ regression, this is somewhat semantic. After all, piecewise regression is regression too, and it is not smooth either. The same is true of any regression with a categorical predictor, as pointed out in the comments.
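The categorical case is easy to see in code. For least-squares regression on a single categorical predictor (with one indicator per category), the fitted values are just the group means, so the model can only output as many distinct values as there are categories. A minimal sketch with made-up data and a hypothetical helper `fit_group_means`:

```python
from collections import defaultdict

def fit_group_means(categories, y):
    """Least-squares fit on one categorical predictor: one mean per category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, v in zip(categories, y):
        sums[c] += v
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical data: three groups, two observations each.
group = ["a", "a", "b", "b", "c", "c"]
response = [1.0, 3.0, 10.0, 12.0, 20.0, 24.0]

means = fit_group_means(group, response)
fitted = [means[c] for c in group]
print(means)
print(sorted(set(fitted)))
```

Nobody would hesitate to call this a regression, even though its output is a step function over the categories.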

Attribution
Source: Link, Question Author: user110565, Answer Author: mkt
