Is ERF just a more efficient implementation (somewhat like what Extreme Gradient Boosting is to gradient boosting)? Is the difference important from a practical point of view? There is an R package which implements it. Is it a new algorithm which beats the “generic” implementation (the randomForest package from R) only in terms of efficiency, or also in some other areas?
Extreme Random Forest http://link.springer.com/article/10.1007%2Fs10994-006-6226-1
This is pretty simple: RF optimises the splits in its trees (i.e. it selects those which give the best information gain with respect to the decision), while ERF makes them at random. Now,
- optimisation costs time (not much, but still), so ERF is usually faster.
- optimisation may contribute to correlation between the trees in the ensemble or to overall overfitting, so ERFs are probably more robust, especially if the signal is weak.
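To make the difference concrete, here is a minimal sketch of the two ways of choosing a split on a single numeric feature. It is only an illustration, not code from either R package: the function names (`optimised_split`, `random_split`) and the toy data are made up, and Gini impurity stands in for information gain to keep it short.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(ys)

def optimised_split(xs, ys):
    """RF-style: score every midpoint between sorted distinct values, keep the best."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))

def random_split(xs, rng):
    """ERF-style: draw one threshold uniformly between min and max, no scoring."""
    return rng.uniform(min(xs), max(xs))

# Hypothetical toy data: one feature, noise-free signal at x > 0.6.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [1 if x > 0.6 else 0 for x in xs]

t_opt = optimised_split(xs, ys)
t_rnd = random_split(xs, rng)
print("optimised:", t_opt, split_impurity(xs, ys, t_opt))
print("random:   ", t_rnd, split_impurity(xs, ys, t_rnd))
```

The optimised version has to score every candidate threshold, while the random one only needs the range of the feature; that is where the speed difference comes from. A single random split is of course worse than the optimised one, and the ensemble is what makes up for it.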
Going even further in this direction, you can gain extra speed by using the same split at every node of a given tree level, which converts the trees into ferns. These are also pretty interesting; there is my R implementation of them.
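A rough sketch of what "the same split on each level" means in practice, assuming features scaled to [0, 1) and purely random tests. The names `make_fern` and `leaf_index` are hypothetical, and this is not how the R implementation is written. Since every node on a level asks the same question, a fern of depth D is just D tests, and the leaf is the D-bit number formed by their answers.

```python
import random

def make_fern(n_features, depth, rng):
    """One random (feature, threshold) test per level, shared by all nodes on that level."""
    return [(rng.randrange(n_features), rng.random()) for _ in range(depth)]

def leaf_index(fern, x):
    """Each level contributes one bit, so a depth-D fern has 2**D leaves."""
    index = 0
    for feature, threshold in fern:
        index = (index << 1) | (1 if x[feature] > threshold else 0)
    return index

# Hand-made fern of depth 2: test feature 0, then feature 1, both against 0.5.
fern = [(0, 0.5), (1, 0.5)]
print(leaf_index(fern, [0.9, 0.1]))  # → 2 (bits 1, 0)

# Random fern of depth 5 over 10 hypothetical features.
rng = random.Random(1)
random_fern = make_fern(10, 5, rng)
x = [rng.random() for _ in range(10)]
print(leaf_index(random_fern, x))
```

There is no tree to walk and nothing to store beyond the D tests plus a table of class counts per leaf, which is why this is cheap.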