Can glmnet logistic regression directly handle factor (categorical) variables without needing dummy variables? [closed]

I’m building a logistic regression in R with the LASSO method, using cv.glmnet to select lambda and glmnet to fit the final model.

I am already aware of the disadvantages of automatic model selection, but I need to do it anyway.

My problem is that I need to include factor (categorical) variables in the model. Is there any way to do it without creating a lot of dummy variables? These variables are almost all strings, not numbers.

Answer

glmnet cannot take factors directly; you need to convert factor variables to dummy variables first. This is a single step with model.matrix, for instance:

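library(glmnet)
# Expand factors into dummy columns; "- 1" drops the intercept column
x_train <- model.matrix(~ . - 1, train[, features])
# cv_fit rather than lm, to avoid shadowing the base R function lm()
cv_fit <- cv.glmnet(x = x_train, y = as.factor(train$y), intercept = FALSE, family = "binomial", alpha = 1, nfolds = 7)
best_lambda <- cv_fit$lambda.min  # lambda with the lowest mean cross-validated error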

Setting alpha=1 fits the LASSO (alpha=0 would give ridge regression instead).
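To see what model.matrix actually produces, here is a minimal sketch on a made-up data frame. The names toy, colour, shape, size and x_toy are hypothetical and only serve the illustration; it uses base R only, so it runs without glmnet installed.

```r
# Hypothetical toy data: two factors built from strings, one numeric column
toy <- data.frame(
  colour = factor(c("red", "green", "blue", "green")),
  shape  = factor(c("round", "square", "round", "square")),
  size   = c(1.5, 2.0, 0.7, 3.1)
)

# Same formula as in the answer: all columns, no intercept column
x_toy <- model.matrix(~ . - 1, toy)

# One numeric 0/1 column per level of the first factor; later factors
# drop their first level (default treatment contrasts)
colnames(x_toy)
# "colourblue" "colourgreen" "colourred" "shapesquare" "size"
```

The result is a plain numeric matrix, which is the form glmnet expects for x. Note that with "- 1" only the first factor keeps all of its levels; whether that asymmetry matters depends on how you want the penalised coefficients to be interpreted.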

Attribution
Source: Link, Question Author: Dan, Answer Author: Romain
