I’ve been reading a bit on boosting algorithms for classification tasks and Adaboost in particular. I understand that the purpose of Adaboost is to take several “weak learners” and, through a series of iterations on the training data, push successive classifiers to focus on the examples that earlier models repeatedly misclassify. However, I was wondering why so many of the readings I’ve done have used decision trees as the weak classifier. Is there a particular reason for this? Are there certain classifiers that make particularly good or bad candidates for Adaboost?
I talked about this in an answer to a related SO question. Decision trees are just generally a very good fit for boosting, much more so than other algorithms. The bullet-point summary is this:
- Decision trees are non-linear. Boosting with linear models simply doesn’t work well: boosting builds an additive combination of its weak learners, and an additive combination of linear models is still essentially linear, so you do a lot of work for something you could have fit directly.
- The weak learner needs to be consistently better than random guessing. You don’t normally need any parameter tuning on a decision tree to get that behavior. Training an SVM, by contrast, really does need a parameter search. Since the data are re-weighted on each iteration, you would likely need to repeat that parameter search on every iteration, so you are increasing the amount of work you have to do by a large margin.
- Decision trees are reasonably fast to train. Since we are going to be building 100s or 1000s of them, that’s a good property. They are also fast to classify, which again matters when 100s or 1000s of them have to run before you can output your decision.
- By changing the depth you have a simple, easy control over the bias/variance trade-off, keeping in mind that boosting can reduce bias but also significantly reduces variance. Boosting is known to overfit, so an easy knob to tune is helpful in that regard (see the sketch after this list).
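To make that last knob concrete, here is a minimal sketch of boosting depth-1 trees (“stumps”) with AdaBoost. It assumes scikit-learn and uses a synthetic placeholder dataset; neither is part of the discussion above, it just shows where the depth and number-of-rounds parameters live.

```python
# Minimal sketch: AdaBoost over shallow decision trees with scikit-learn.
# The dataset is a synthetic placeholder; the two knobs discussed above are
# the tree depth (bias/variance) and the number of boosting rounds.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A depth-1 stump needs no per-iteration tuning: it is already better than
# random guessing on most re-weighted versions of the data.
stump = DecisionTreeClassifier(max_depth=1)

# Note: on scikit-learn < 1.2 the keyword is `base_estimator` rather than `estimator`.
model = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
```

Raising `max_depth` (or the number of rounds) moves you toward lower bias and higher variance, which is exactly the trade-off the last bullet is about.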