What is a good oob score for random forests with sklearn, three-class classification? [duplicate]

I have training data consisting of ~45k samples, each with 21 features. I am trying to train a random forest classifier on this data, which is labelled with 3 classes (-1, 0 and 1). The classes are more or less equal in size.

My random forest classifier model uses gini as its split quality criterion, the number of trees is 10, and I have not limited the depth of the trees.
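
In sklearn terms, the setup described is roughly the following sketch (X and y here are placeholders for the ~45k × 21 training data and its labels):

    from sklearn.ensemble import RandomForestClassifier

    # Sketch of the setup described above; X, y stand in for the
    # ~45k x 21 training data and its 3-class labels (-1, 0, 1).
    model = RandomForestClassifier(
        n_estimators=10,    # 10 trees
        criterion="gini",   # split quality criterion
        max_depth=None,     # tree depth not limited
        oob_score=True,     # compute the out-of-bag estimate discussed below
    )
    model.fit(X, y)
    print(model.oob_score_)            # the oob score in question
    print(model.feature_importances_)  # per-feature importances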

Most of the features have shown negligible importance – the mean importance is about 5%, a third of the features have importance 0, and a third have importance above the mean.

However, perhaps the most striking fact is the oob (out-of-bag) score: a bit less than 1%. It made me think the model fails, and indeed, testing the model on a new independent set of size ~40k, I got a score of 63% (sounds good so far), but a deeper inspection of the confusion matrix showed me that the model only succeeds for class 0, and fails in about 50% of the cases when it comes to deciding between 1 and -1.

Python’s output attached:

array([[ 7732,   185,  6259],
       [  390, 11506,   256],
       [ 7442,   161,  6378]])
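
(For reference, a matrix like this can be produced with sklearn.metrics.confusion_matrix; model, X_test and y_test here are placeholders for the fitted classifier and the independent test set:)

    from sklearn.metrics import confusion_matrix

    # model is the fitted forest; X_test, y_test the ~40k independent set.
    y_pred = model.predict(X_test)
    print(confusion_matrix(y_test, y_pred, labels=[-1, 0, 1]))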

This is naturally because the 0 class has special properties which make it much easier to predict. However, is it true that the oob score I’ve found is already a sign that the model is not good? What is a good oob score for random forests? Is there some rule of thumb which helps determine whether a model is “good”, using the oob score alone, or in combination with some other results of the model?


Edit: after removing bad data (about a third of the data), the labels were more or less 2% for 0 and 49% for each of -1/+1. The oob score was 0.011 and the score on the test data was 0.49, with the confusion matrix heavily biased towards class 1 (about 3/4 of the predictions).

Answer

sklearn’s RF oob_score_ (note the trailing underscore) isn’t very intelligible compared to R’s, even after reading the sklearn doc and source code.
My advice on how to improve your model is as follows:

  1. sklearn’s RF used to use the terrible default of max_features=1 (as in “try every feature on every node”). Then it’s no longer doing the random column(/feature) selection that makes it a random forest. Change this to e.g. max_features=0.33 (like R’s mtry) and rerun – a sketch follows after this list. Tell us the new scores.

  2. “Most of the features have shown negligible importance”. Then you need to do Feature Selection, as per the doc – for classification. See the doc and other articles here on CrossValidated.SE. Do the FS on a different (say 20–30%) holdout set from the rest of the training data, using e.g. sklearn.model_selection.train_test_split() (formerly sklearn.cross_validation; yes, the name is a bit misleading) – a sketch follows after this list. Then tell us the scores you get after FS.

  3. You said “after removing bad data (about a third of the data), the labels were more or less 2% for 0 and 49% for each of -1/+1”; then you have a severe class imbalance. Also: “the confusion matrix shows the model only succeeds for class 0, and fails in about 50% of the cases between +1 and -1”. This is a symptom of the class imbalance. Either use stratified sampling, or train a separate classifier on just the +1 and -1 examples. You can do either an OAA (One-Against-All) or an OAO (One-Against-One) classifier. Try three OAA classifiers, one for each class – see the sketch after this list. Finally, tell us those scores.
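
On point 1, a minimal sketch of the suggested change (max_features=0.33 tells sklearn to consider a third of the features at each split; X and y are placeholders for the training data):

    from sklearn.ensemble import RandomForestClassifier

    # Re-fit with per-node random feature selection, as suggested in point 1.
    model = RandomForestClassifier(
        n_estimators=10,
        max_features=0.33,  # like R's mtry: a third of the features per split
        oob_score=True,
    )
    model.fit(X, y)
    print(model.oob_score_)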
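
On point 2, a sketch of FS on a separate holdout. The modern module path sklearn.model_selection is used, and SelectFromModel with a median threshold is one possible selector – an illustrative assumption, not something prescribed above; the larger n_estimators is just for stabler importances:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import train_test_split

    # Reserve ~25% of the data purely for feature selection.
    X_fs, X_train, y_fs, y_train = train_test_split(
        X, y, train_size=0.25, stratify=y)

    # Rank features on the FS holdout; keep those above median importance.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100), threshold="median")
    selector.fit(X_fs, y_fs)

    # Train the final forest on the remaining data, selected features only.
    model = RandomForestClassifier(
        n_estimators=100, max_features=0.33, oob_score=True)
    model.fit(selector.transform(X_train), y_train)
    print(model.oob_score_)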
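
On point 3, one way to set up the OAA scheme is sklearn’s OneVsRestClassifier, which fits one binary forest per class; the class_weight option shown in the last comment is an extra possibility for the imbalance, not part of the advice above:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multiclass import OneVsRestClassifier

    # One-against-all: three binary forests, one per class (-1, 0, +1).
    oaa = OneVsRestClassifier(
        RandomForestClassifier(n_estimators=100, max_features=0.33))
    oaa.fit(X, y)
    print(oaa.score(X_test, y_test))

    # Optional (not prescribed above): reweight classes within one forest.
    # model = RandomForestClassifier(n_estimators=100, class_weight="balanced")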

Attribution
Source : Link , Question Author : Bach , Answer Author : smci
