I am working on a multiclass problem with 9 possible labels, for which I have a dataset of ~50,000 examples with ~200 features each. Each example can belong to only one class, and the data is fairly balanced across the different labels.
Given its robustness and scalability, I decided to use a Random Forest (an ensemble of 1000 trees) as the learning method. To assess the model's performance on this dataset, I used stratified 5-fold cross-validation (I am using scikit-learn 0.18).
Since Random Forest can inherently handle multiclass datasets, I used it directly on the data and obtained an accuracy of 79.5 ± 0.3. I was also interested in knowing which features matter most, something that can easily be extracted from the feature_importances_ attribute of scikit-learn's RandomForestClassifier. However, given that the dataset is well balanced and that, as expected, roughly equal numbers of the 200 features contribute to the different classes, I was not able to isolate which features contribute the most to each class.
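For reference, the multiclass setup above can be sketched as follows. This is a minimal, self-contained sketch: the data is a synthetic stand-in from make_classification (the real dataset is not available here), and I use 100 trees rather than 1000 to keep it fast; the shapes and the CV scheme mirror the description.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the ~50,000 x 200, 9-class, balanced dataset
X, y = make_classification(n_samples=2000, n_features=200,
                           n_informative=50, n_classes=9,
                           n_clusters_per_class=1, random_state=0)

# 100 trees here to keep the sketch quick; the question uses 1000
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)

# Stratified 5-fold cross-validation, as in the question
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Global (class-agnostic) importances: one value per feature,
# with no indication of which class each feature helps separate
clf.fit(X, y)
top10 = np.argsort(clf.feature_importances_)[::-1][:10]
print("top-10 features:", top10)
```

Note that feature_importances_ gives a single ranking over all classes combined, which is exactly why it cannot answer the per-class question.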
As a consequence, I adopted a one-versus-all strategy using the same Random Forest setup (cost-sensitive, by the way, to account for the class imbalance that each binary subproblem sees under the one-vs-all binarization), which allowed me to see, for each class versus the rest, which features are more important. The results I obtained this way are reasonable. What's more, when looking at the performance of the model under this strategy, I got an accuracy of 88.7 ± 0.2, which came as a surprise, as I was expecting the multiclass Random Forest to classify better given its inherently multiclass nature.
Am I right about this? Is such a difference in accuracy plausible? Furthermore, is the strategy I adopted fair, given that Random Forest can tackle multiclass problems by itself, without any "hacking" such as the one-vs-all strategy?
I had exactly the same question as you, and was a bit sad to find that no answers had been posted on your topic…
That said, I found this paper: One-Vs-All Binarization Technique in the Context of Random Forest (https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-5.pdf), published in 2015.
The authors report better classification performance with one-versus-rest Random Forest classifiers than with standard multiclass Random Forest ones.
They do not give many clues as to why it works so well, except that the trees generated in the one-versus-rest context are simpler.
I am wondering whether you have found any answers yourself since you posted your question?