I have a k-class classification problem with on the order of 100 real-valued predictors, one of which appears to have much more explanatory power than any of the others. I’d like to dig deeper into the effects of the other variables. However, standard machine learning techniques (random forests, SVMs, etc.) seem to get swamped by the one strong predictor and don’t give me much interesting information about the others.
If this were a regression problem, I would simply regress against the strong predictor and then use the residuals as inputs for other algorithms. I don’t really see how this approach can be translated to a classification context though.
My instinct is that this problem must be reasonably common: is there a standard technique for dealing with it?
For 2-class problems, you can use the gbm package in R, which iteratively fits classification trees to the residuals (negative gradients) of the loss function. Unfortunately it does not yet support multi-class problems.
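For the 2-class case with a logistic (bernoulli) loss, those residuals have a very simple form: the negative gradient for observation $i$ is just $y_i - p_i$, the difference between the 0/1 label and the current predicted probability. A minimal sketch (names hypothetical):

```python
import math

def pseudo_residuals(y, F):
    """Negative gradient of the binomial deviance at the current
    model scores F (log-odds): r_i = y_i - p_i, where p_i = sigmoid(F_i).
    Each new boosting tree is fit to these residuals."""
    return [yi - 1.0 / (1.0 + math.exp(-Fi)) for yi, Fi in zip(y, F)]

# at F = 0 the model predicts p = 0.5 for everyone, so the residuals
# are +0.5 for the positive class and -0.5 for the negative class
print(pseudo_residuals([1, 0], [0.0, 0.0]))  # → [0.5, -0.5]
```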
This seems like a problem that’s well suited to boosting, but I don’t know of any boosting packages that support k-class problems. I think the difficulty is writing an appropriate loss function for multiple classes. The glmnet package has a multinomial loss function; perhaps you can look through its source code for some pointers.
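For reference, the multinomial loss that glmnet minimises (before adding its elastic-net penalty) is the negative multinomial log-likelihood: with a coefficient vector $\beta_k$ for each class $k$,

$$\ell(\beta) \;=\; -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{x_i^\top \beta_{y_i}}}{\sum_{k=1}^{K} e^{x_i^\top \beta_k}},$$

i.e. the log-loss of a softmax over the $K$ per-class linear scores. Writing a boosting loss for $K$ classes amounts to supplying this function's gradient with respect to the $K$ score functions.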
You could try writing your own boosting algorithm, or you could turn your problem into k binary classification problems (one class vs. all other classes), fit a gbm model to each problem, and average the class probabilities from each model.
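The one-vs-all scheme in the last suggestion can be sketched as follows. This is only an illustration of the decomposition, not a real gbm fit: the base "classifier" here is a deliberately simple centroid-distance scorer standing in for gbm, and all names are hypothetical.

```python
import math

def fit_centroid(X, y_binary):
    """Stand-in for fitting a binary model: the mean of the positive rows."""
    pos = [x for x, t in zip(X, y_binary) if t == 1]
    return [sum(col) / len(pos) for col in zip(*pos)]

def score(centroid, x):
    """Higher score = closer to the class centroid (a crude 'probability')."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(centroid, x)))
    return 1.0 / (1.0 + d)

def one_vs_rest_predict(X_train, y_train, x_new, classes):
    # fit one binary model per class (this class vs. all others) ...
    models = {c: fit_centroid(X_train, [1 if t == c else 0 for t in y_train])
              for c in classes}
    # ... then combine the k per-class scores into normalised probabilities
    raw = {c: score(m, x_new) for c, m in models.items()}
    total = sum(raw.values())
    return {c: s / total for c, s in raw.items()}

# toy data: two features, three classes
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["a", "a", "b", "b", "c", "c"]
probs = one_vs_rest_predict(X, y, [5, 5.5], ["a", "b", "c"])
print(max(probs, key=probs.get))  # → b (the nearest class)
```

With gbm in place of the centroid scorer, each binary fit would also report variable importances, which may help with the original goal of seeing past the one dominant predictor.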