(Non-)linear regression at the leaves of a decision tree

Is it common to use a different regression technique at the leaves of a regression tree (for instance, linear regression)? I’ve been searching for the past hour, but all I can find are implementations that fit a constant value at the tree’s leaves. Is there a reason why this is or is not common?

Answer

There has been a good deal of research on this topic over the last few decades, starting with the pioneering efforts of Ciampi, followed by Loh’s GUIDE, and later Gama’s functional trees and our own model-based recursive partitioning approach. A nice overview is given in @Momo’s answer to this question: Advantage of GLMs in terminal nodes of a regression tree?

As you observe, the corresponding software is less widely used than simple constant-fit trees. Part of the reason is presumably that it is more difficult to write, but it is also more difficult to use: it requires more specifications than a simple CART model, because the user has to say which model is fitted in the nodes in addition to which variables are used for splitting. Software is available nonetheless (as previously pointed out here by @marqram and @Momo at: Regression tree algorithm with linear regression models in each leaf). Prominent software packages include:

  • In the Weka suite there are M5P (M5′) for continuous responses, LMT (logistic model trees) for binary responses, and FT (functional trees) for categorical responses. See http://www.cs.waikato.ac.nz/~ml/weka/ for more details. The first two (M5P and LMT) are also easily interfaced through the R package RWeka.

  • Loh’s GUIDE implementation is available in binary form at no cost (but without source code) from http://www.stat.wisc.edu/~loh/guide.html. A wide range of control options allows the details of the method to be modified.

  • Our MOB (MOdel-Based recursive partitioning) algorithm is available in the R package partykit (successor to the party implementation). The mob() function provides a general framework that lets you specify new models to be fitted in the nodes/leaves of the tree. The convenience interfaces lmtree() and glmtree(), which combine mob() with lm() and glm() respectively, are directly available and illustrated in vignette("mob", package = "partykit"). Other plugins can also be defined. For example, in https://stackoverflow.com/questions/37037445/using-mob-trees-partykit-package-with-nls-model mob() is combined with nls(). There are also “mobsters” for various psychometric models (in psychotree) and for beta regression (in betareg).
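
To make the basic idea concrete independently of any of the packages above, here is a minimal sketch in plain Python of a tree with a single split and a separate straight-line fit in each leaf. It is an illustration only and is not how M5′, GUIDE, or MOB work: it picks the split by brute-force search over cut points, minimizing the summed squared error of the two leaf regressions. All names (fit_line, fit_stump, predict, min_leaf) and the toy data are made up for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, sse)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # fall back to a constant fit if x does not vary
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_stump(xs, zs, ys, min_leaf=5):
    """Split once on z and fit a separate line y ~ x in each leaf.

    Every observed value of z (except the largest) is tried as a cut point;
    the cut with the smallest total squared error is kept.
    Returns None if no cut leaves at least min_leaf points on both sides.
    """
    best = None
    for cut in sorted(set(zs))[:-1]:
        left = [i for i, z in enumerate(zs) if z <= cut]
        right = [i for i, z in enumerate(zs) if z > cut]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        a_l, b_l, sse_l = fit_line([xs[i] for i in left], [ys[i] for i in left])
        a_r, b_r, sse_r = fit_line([xs[i] for i in right], [ys[i] for i in right])
        total = sse_l + sse_r
        if best is None or total < best["sse"]:
            best = {"cut": cut, "left": (a_l, b_l), "right": (a_r, b_r), "sse": total}
    return best


def predict(stump, x, z):
    """Route the observation to a leaf by z, then apply that leaf's line."""
    a, b = stump["left"] if z <= stump["cut"] else stump["right"]
    return a + b * x


# Toy data (hypothetical): the relation between y and x changes when z exceeds 9.
zs = list(range(20))
xs = [(i * 7) % 5 for i in range(20)]
ys = [1 + 2 * x if z <= 9 else 20 - x for x, z in zip(xs, zs)]

stump = fit_stump(xs, zs, ys)
print(stump["cut"], stump["left"], stump["right"])
```

On this noise-free toy data the search recovers the cut at z = 9, with intercept 1 and slope 2 in the left leaf and intercept 20 and slope -1 in the right leaf. A constant-fit tree would need many splits on x to approximate the same relationship, which is the main appeal of fitting models in the leaves.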

Attribution
Source : Link , Question Author : marqram , Answer Author : Community
