Caret package in R – get top Variable of Importance [closed]

I am facing two problems while using the caret package in R. A reproducible example is below:

library(mlbench)
library(caret)
set.seed(998)

data(Sonar)            # Sample data, for illustration only
Sonar <- Sonar[, 1:6]  # Keep the first 6 columns only; V6 is treated as the response
head(Sonar)
inTraining <- createDataPartition(Sonar$V6, p = 0.75, list = FALSE)
training <- Sonar[inTraining, ]
testing <- Sonar[-inTraining, ]
modelFit <- train(V6 ~ ., data = training, method = "rpart")
varImp(modelFit)

a. How do I extract the top three (3) variables from the varImp output? I tried to order the variables, but for some reason it is not working for me.

b. Also, why doesn't the following code work for "randomForest"?

modelFit <- train(V6 ~ ., data = training, method = "rf")
varImp(modelFit)


> varImp(modelFit)

 Error in varImp[, "%IncMSE"] : subscript out of bounds 

Answer

What is the issue with #1? It runs fine for me, and the call to varImp() produces the following, ordered from most to least important:

> varImp(modelFit)
rpart variable importance

   Overall
V5 100.000
V4  38.390
V3  38.362
V2   5.581
V1   0.000

EDIT Based on Question clarification:

I am sure there are better ways, but here is how I might do it:

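# Copy the importance table and keep the variable names as a column
ImpMeasure <- data.frame(varImp(modelFit)$importance)
ImpMeasure$Vars <- row.names(ImpMeasure)
# Sort by importance, descending, and keep the top three rows
ImpMeasure[order(-ImpMeasure$Overall), ][1:3, ]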
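
The same idea can be wrapped in a small function so it works for any n. This is only a sketch: top_n_vars is a hypothetical helper name (not part of caret), and the imp data frame below is a hand-built stand-in for varImp(modelFit)$importance, filled with the rpart values printed above so the snippet runs without fitting a model.

```r
# Hypothetical helper (not part of caret): return the n most important
# variables from a data frame shaped like varImp(fit)$importance,
# i.e. row names are variable names and there is one "Overall" column.
top_n_vars <- function(imp, n = 3) {
  ord <- order(imp$Overall, decreasing = TRUE)  # row indices, most important first
  imp[head(ord, n), , drop = FALSE]             # drop = FALSE keeps the row names
}

# Stand-in for varImp(modelFit)$importance, using the rpart output shown above
imp <- data.frame(
  Overall   = c(0.000, 5.581, 38.362, 38.390, 100.000),
  row.names = c("V1", "V2", "V3", "V4", "V5")
)

top3 <- top_n_vars(imp, 3)
print(top3)
print(rownames(top3))  # variable names only
```

With a fitted model, the call would presumably be top_n_vars(varImp(modelFit)$importance, 3).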

Regarding #2, you need to add importance = TRUE to tell randomForest to calculate the importance measures.

> modelFit <- train( V6~.,data=training, method="rf" ,importance = TRUE)
> varImp(modelFit)
rf variable importance

   Overall
V5 100.000
V3  22.746
V2  21.136
V4   3.797
V1   0.000
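
To see why the flag matters, it may help to call randomForest directly and compare the columns returned by importance() with and without importance = TRUE. This is a sketch under the assumption that the error in the question came from a missing "%IncMSE" column, which is what the error message suggests. The object names dat, fit_default and fit_imp are made up for this illustration.

```r
library(randomForest)
library(mlbench)

data(Sonar)
dat <- Sonar[, 1:6]  # same subset as in the question; V6 is a numeric response

set.seed(998)
# Default fit: permutation-based importance is not computed
fit_default <- randomForest(V6 ~ ., data = dat)
# With importance = TRUE the permutation measure is computed as well
fit_imp <- randomForest(V6 ~ ., data = dat, importance = TRUE)

cols_default <- colnames(importance(fit_default))
cols_imp     <- colnames(importance(fit_imp))

print(cols_default)  # node purity measure only
print(cols_imp)      # includes "%IncMSE" for a regression forest
```

Because V6 is numeric, the forest is a regression forest, so the permutation measure is reported as "%IncMSE". That column is present only in the second fit.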

Attribution
Source: Link, Question Author: learner, Answer Author: B_Miner
