Boosting algorithms, such as AdaBoost, combine multiple ‘weak’ classifiers to form a single stronger classifier. Although in theory boosting should be possible with any base classifier, in practice it seems that tree-based classifiers are the most common.
Why is this? What properties of tree classifiers make them best suited for this task? Are there any other base classifiers which also benefit a lot from boosting? I ask with classification problems in mind, but I would also be interested in answers concerning regression applications.
I’m pretty sure you’re correct: nothing about boosting strictly requires decision trees rather than other classifiers. That said, I think there are a few reasons they’re often used. Speed is one factor: boosting may require training a lot of classifiers, one after another. If each one is a giant, multi-layer neural network, the whole procedure is going to be very slow.
More importantly, I think decision trees are “good enough.” Since the whole idea behind boosting is pooling weak classifiers, there’s not a huge incentive to drop in a heavyweight solution that might require more tuning (e.g., fiddling with the hyperparameters and kernel for SVMs).
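To make the "pooling weak classifiers" idea concrete, here is a minimal sketch of AdaBoost with one-dimensional decision stumps (single-split trees) as the weak learner, in plain Python. The function names (`fit_stump`, `adaboost`, `predict`) and the toy data are made up for illustration; this is not how any particular library implements it.

```python
import math

def stump_predict(x, threshold, polarity):
    # A stump predicts `polarity` when x >= threshold, otherwise `-polarity`.
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    """Train `n_rounds` stumps; labels in `ys` must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Up-weight the points this stump got wrong, then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all stumps.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_stump = adaboost(xs, ys, 1)
three_stumps = adaboost(xs, ys, 3)
```

On this toy data a single stump classifies 7 of the 10 points correctly, while the three-stump ensemble gets all 10. Each stump is trivial to fit and has nothing to tune, which is roughly the "good enough" argument.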
Finally, boosting and decision trees are, at least in my head, somewhat conceptually similar: both build a model up incrementally (a tree adds a node; boosting adds a new classifier). A lot of the ensemble learning stuff seems to use trees, but I think you could have a “random forest” of Naive Bayes learners if you really wanted to.
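For what it's worth, a "random forest" of Naive Bayes learners could look something like the sketch below: each Gaussian Naive Bayes model is fit on a bootstrap sample and a random subset of features, and predictions are made by majority vote. All names here (`fit_nb_forest`, `predict_nb_forest`, and so on), the fixed variance floor, and the toy data are my own assumptions for illustration, not an established method or library API.

```python
import math
import random
from collections import Counter

def fit_gaussian_nb(rows, labels, features):
    """Fit per-class prior, means, and variances on the chosen feature indices."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for f in features:
            values = [r[f] for r in members]
            mean = sum(values) / len(values)
            # Variance floor (an arbitrary choice) so it never collapses to zero.
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-3
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model

def predict_gaussian_nb(model, features, row):
    def log_posterior(cls):
        prior, stats = model[cls]
        total = math.log(prior)
        for f, (mean, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (row[f] - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

def fit_nb_forest(rows, labels, n_learners, n_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(rows)) for _ in rows]           # bootstrap sample
        features = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        model = fit_gaussian_nb([rows[i] for i in idx],
                                [labels[i] for i in idx], features)
        forest.append((model, features))
    return forest

def predict_nb_forest(forest, row):
    # Majority vote over the individual Naive Bayes learners.
    votes = Counter(predict_gaussian_nb(m, f, row) for m, f in forest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters in three dimensions.
rows = [[i * 0.1, (i % 3) * 0.1, (i % 2) * 0.1] for i in range(10)]
rows += [[5 + i * 0.1, 5 + (i % 3) * 0.1, 5 + (i % 2) * 0.1] for i in range(10)]
labels = ["a"] * 10 + ["b"] * 10

forest = fit_nb_forest(rows, labels, n_learners=25, n_features=2)
near_a = predict_nb_forest(forest, [0.4, 0.1, 0.0])
near_b = predict_nb_forest(forest, [5.4, 5.1, 5.0])
```

Whether this buys you much is another matter. Naive Bayes is a fairly stable learner, so resampling the data tends to change it less than it changes a tree, and I would expect the gain from averaging to be smaller.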