Variable importance from GLMNET

I am looking at using the lasso as a method for selecting features and fitting a predictive model with a binary target. Below is some code I was playing with to try out the method with regularized logistic regression.

My question: I get a group of “significant” variables, but can I rank-order them to estimate the relative importance of each? Can the coefficients be standardized so that they can be ranked by absolute value (I understand that the coef function reports them on the original variable scale)? If so, how should this be done (using the standard deviations of x and y, as with standardized regression coefficients)?

SAMPLE CODE:

    library(glmnet)

    # data comes from
    # http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

    datasetTest <- read.csv('C:/Documents and Settings/E997608/Desktop/wdbc.data.txt', header=FALSE)


    # for a factor response, glmnet treats the last level as the target class
    # (here "1", i.e. every diagnosis that is not "M")
    datasetTest$V2 <- as.factor(ifelse(as.character(datasetTest$V2)=="M","0","1"))


    # cross-validation to find the optimal lambda
    # alpha=1 selects the lasso penalty

    cv.result <- cv.glmnet(
                  x=as.matrix(datasetTest[,3:ncol(datasetTest)]),
                  y=datasetTest[,2],
                  family="binomial",
                  nfolds=10,
                  type.measure="deviance",
                  alpha=1
                  )

    # values of lambda used

    hist(cv.result$lambda)

    # plot of the error measure (here deviance), with an error bar
    # computed from the 10 folds, for each value of log(lambda)

    plot(cv.result)

    # the mean cross-validation error (one for each of the
    # 100 values of lambda)

    cv.result$cvm

    # the value of lambda that minimizes the error measure
    # result: 0.001909601

    cv.result$lambda.min
    log(cv.result$lambda.min)

    # the largest value of lambda whose error is within
    # 1 SE of the minimum
    # result: 0.007024236

    cv.result$lambda.1se

    # the full lambda sequence was fit and stored in cv.result$glmnet.fit;
    # this is the same as calling glmnet directly.
    # here are the coefficients at lambda.1se

    coef(cv.result$glmnet.fit, s=cv.result$lambda.1se)

Answer

As far as I know, glmnet does not calculate standard errors for the regression coefficients (it fits the model parameters using cyclic coordinate descent). So if you need standard errors for the (standardized) coefficients, you will need to use some other method (e.g. glm).

Having said that, if the explanatory variables are standardized before the fit and glmnet is called with standardize=FALSE, then the coefficients are all on a common scale and the less important ones will be smaller than the more important ones, so you can rank them simply by their absolute magnitude. This becomes even more pronounced with a non-trivial amount of shrinkage (i.e. non-zero lambda).
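As a rough illustration of the ranking idea, here is a minimal sketch. It uses simulated data rather than the wdbc file, and the predictor names (a to e), their scales, the true effects, and the object names (x_std, cv_fit, beta, ranked) are all assumptions made up for the example. The predictors are standardized with scale(), the lasso is fit with standardize=FALSE, and the non-zero coefficients are sorted by absolute value.

```r
library(glmnet)

set.seed(1)

# Hypothetical simulated data: 200 rows, 5 predictors on very different scales
n <- 200
x <- cbind(
  a = rnorm(n, sd = 1),
  b = rnorm(n, sd = 100),
  c = rnorm(n, sd = 0.01),
  d = rnorm(n, sd = 10),
  e = rnorm(n, sd = 1)
)

# Assumed true model: a has the largest effect per SD, b a smaller one,
# and c, d, e have no effect
eta <- 2 * x[, "a"] + 0.01 * x[, "b"]
y <- factor(rbinom(n, 1, plogis(eta)))

# Standardize every column to mean 0 and SD 1 before the fit
x_std <- scale(x)

# Lasso-penalized logistic regression on the pre-standardized predictors
cv_fit <- cv.glmnet(x_std, y, family = "binomial", alpha = 1,
                    standardize = FALSE)

# Coefficients at lambda.1se as a named vector, intercept dropped
beta <- as.matrix(coef(cv_fit, s = "lambda.1se"))[-1, 1]

# Rank the selected (non-zero) variables by absolute coefficient size
ranked <- sort(abs(beta[beta != 0]), decreasing = TRUE)
print(ranked)
```

Because every column of x_std has unit standard deviation, each coefficient is the change in log-odds per one-SD change in that predictor, which is what makes the magnitudes comparable. The ranking still depends on the chosen lambda and can be unstable when predictors are strongly correlated.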

Hope this helps..

Attribution
Source : Link , Question Author : B_Miner , Answer Author : Yevgeny
