Is it common to use a different regression technique (for instance, linear regression) at the leaves of a regression tree? I’ve been searching for the past hour, but all I find are implementations that use a constant value at the trees’ leaves. Is there a reason why this is or is not common?
There has been quite a bit of research on this topic over the last decades, starting with the pioneering efforts of Ciampi, followed by Loh’s GUIDE, and then Gama’s functional trees and our own model-based recursive partitioning approach. A nice overview is given in @Momo’s answer to this question: Advantage of GLMs in terminal nodes of a regression tree?
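To make the basic idea concrete, here is a minimal sketch of a tree with a linear fit in each leaf instead of a constant. It is not GUIDE, functional trees, or MOB: it is a depth-one tree with an exhaustive, CART-like split search on total squared error, chosen only for illustration. All names (`fit_line`, `fit_stump`, `predict`) and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # fall back to a constant fit
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_stump(zs, xs, ys, min_leaf=3):
    """Depth-one model tree: split on z, regress y on x within each leaf.

    Tries every observed value of z as a cut point and keeps the one with
    the smallest total residual sum of squares. Returns None if no cut
    leaves at least min_leaf observations on both sides.
    """
    best = None
    for cut in sorted(set(zs))[:-1]:
        left = [i for i, z in enumerate(zs) if z <= cut]
        right = [i for i, z in enumerate(zs) if z > cut]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        fl = fit_line([xs[i] for i in left], [ys[i] for i in left])
        fr = fit_line([xs[i] for i in right], [ys[i] for i in right])
        total = fl[2] + fr[2]
        if best is None or total < best["sse"]:
            best = {"cut": cut, "left": fl[:2], "right": fr[:2], "sse": total}
    return best


def predict(tree, z, x):
    """Route by the partitioning variable z, then apply that leaf's line."""
    a, b = tree["left"] if z <= tree["cut"] else tree["right"]
    return a + b * x


# Toy data: the relationship between x and y changes when z exceeds 4.
zs, xs, ys = [], [], []
for z in range(10):
    for x in range(5):
        zs.append(z)
        xs.append(x)
        ys.append(1 + 2 * x if z <= 4 else 10 - x)

tree = fit_stump(zs, xs, ys)
print(tree["cut"], tree["left"], tree["right"])
```

On this noise-free toy data the search recovers the change point in `z` and a separate intercept and slope on each side. A constant-fit tree would need many more splits on `x` to approximate the same two lines, which is the main appeal of fitting models in the leaves.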
Corresponding software is less widely used than simple constant-fit trees, as you observe. Part of the reason is presumably that it is more difficult to write, but also more difficult to use: it requires more specifications than a simple CART model. But software is available (as previously pointed out here by @marqram or @Momo at: Regression tree algorithm with linear regression models in each leaf). Prominent software packages include:
In the Weka suite there are M5P (M5′) for continuous responses, LMT (logistic model trees) for binary responses, and FT (functional trees) for categorical responses. See http://www.cs.waikato.ac.nz/~ml/weka/ for more details. The former two functions are also easily interfaced through the R package
Loh’s GUIDE implementation is available in binary form at no cost (but without source code) from http://www.stat.wisc.edu/~loh/guide.html. It lets you modify the details of the method through a wide range of control options.
Our MOB (MOdel-Based recursive partitioning) algorithm is available in the R package partykit (successor to the
mob() function gives you a general framework, allowing you to specify new models that can be easily fitted in the nodes/leaves of the tree. Convenience interfaces
glm() are directly available and illustrated in vignette("mob", package = "partykit"). But other plugins can also be defined. For example, in https://stackoverflow.com/questions/37037445/using-mob-trees-partykit-package-with-nls-model mob() is combined with nls(). But there are also “mobsters” for various psychometric models (in psychotree) and for beta regression (in