How should decision tree splits be implemented when predicting continuous variables?

I’m actually writing an implementation of Random Forests but I believe the question is specific to decision trees (independent of RFs).

So the context is that I’m creating a node in a decision tree and both the prediction and target variables are continuous. The node has a split threshold to partition data into two sets, and I create a new prediction for each subset based on the average target value in each set. Is this the correct approach?

The reason I ask is that when predicting binary variables I believe the typical (correct?) approach is to divide the data into 0 and 1 subsets without taking an average over the data rows in each subset. Subsequent splits will divide into finer grained subsets and taking an average at each split results subsequent splits (lower down the decision tree) operating on what are now continuous variables rather than binary variables (because we are operating on the residual error values instead of the original targets).

Side question: Is the distinction between the two approaches (binary vs continuous) significant – or will they actually give identical results for a complete decision tree?

Answer

One potential issue with trees is that they tend to fit poorly in the tails. Think of a terminal node that captures the low range of the training set. It will predict using the mean of those training set points, which will always under-predict the outcome (since it is the mean).

You might try model trees [1]. These will fit linear models in the terminal nodes and (I think) do a better job than regression trees. Better yet, use a more evolved version called Cubist that combines different approaches ([1] and [2] below).

These models also handle continuous and discrete predictors differently. They can do multi-way splits for categorical variables. The splitting criterion is very similar to CART trees.

Model trees can be found in R in the RWeka package (called ‘M5P’) and Cubist is in the Cubist package. Of course, you can use Weka too and Cubist has a C version available at the RuleQuest website.

[1] Quinlan, J. (1992). Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence, 343–348.

[2] Quinlan, J. (1993). Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning, 236–243.

Attribution
Source : Link , Question Author : redcalx , Answer Author : topepo

Leave a Comment