There seems to be a lot of confusion when comparing the use of `glmnet` within `caret` to search for an optimal lambda against using `cv.glmnet` for the same task. Many questions have been posed, e.g.:

- Classification model train.glmnet vs. cv.glmnet?
- What is the proper way to use glmnet with caret?
- Cross-validating `glmnet` using `caret`

but no answer has been given, which might be due to the reproducibility of the questions.

Following the first question, I give a quite similar example but have the same question: why are the estimated lambdas so different?

```
library(caret)
library(glmnet)

set.seed(849)
training <- twoClassSim(50, linearVars = 2)
set.seed(849)
testing <- twoClassSim(500, linearVars = 2)
trainX <- training[, -ncol(training)]
testX <- testing[, -ncol(testing)]
trainY <- training$Class

# Using glmnet to directly perform CV
set.seed(849)
cvob1 <- cv.glmnet(x = as.matrix(trainX), y = trainY, family = "binomial",
                   alpha = 1, type.measure = "auc", nfolds = 3,
                   lambda = seq(0.001, 0.1, by = 0.001), standardize = FALSE)
cbind(cvob1$lambda, cvob1$cvm)
# best parameter
cvob1$lambda.min
# best coefficient
coef(cvob1, s = "lambda.min")

# Using caret to perform CV
cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE, summaryFunction = twoClassSummary)
set.seed(849)
test_class_cv_model <- train(trainX, trainY, method = "glmnet",
                             trControl = cctrl1, metric = "ROC",
                             tuneGrid = expand.grid(alpha = 1,
                                                    lambda = seq(0.001, 0.1, by = 0.001)))
test_class_cv_model
# best parameter
test_class_cv_model$bestTune
# best coefficient
coef(test_class_cv_model$finalModel, test_class_cv_model$bestTune$lambda)
```

To summarise, the optimal lambdas are:

- 0.055 using `cv.glmnet()`
- 0.001 using `train()`

I know that using `standardize=FALSE` in `cv.glmnet()` is not advisable, but I really want to compare both methods under the same prerequisites. My main suspicion is that the sampling approach for each fold might be the issue, but I use the same seeds and the results are still quite different. So I'm really stuck on why the two approaches give such different results when they should be quite similar. I hope the community has some idea what the issue is here.
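One way to check the fold-sampling suspicion directly is to compare the fold assignments the two packages actually produce under the same seed. The sketch below is my own illustration: `keep = TRUE` is a real `cv.glmnet` argument that exposes its internal fold ids, and `createFolds()` is the helper `caret` uses to build CV folds. Because the two packages consume the random number stream differently, the same seed does not give the same folds.

```r
library(caret)
library(glmnet)

set.seed(849)
training <- twoClassSim(50, linearVars = 2)
x <- as.matrix(training[, -ncol(training)])
y <- training$Class

# Folds as cv.glmnet builds them internally (keep = TRUE exposes them)
set.seed(849)
cv_fit <- cv.glmnet(x, y, family = "binomial", nfolds = 3, keep = TRUE)
glmnet_folds <- cv_fit$foldid

# Folds as caret builds them
set.seed(849)
caret_folds <- createFolds(y, k = 3, list = FALSE)

# If the folds matched, this table would be diagonal; in general it is not
table(glmnet_folds, caret_folds)
```

If identical folds are required, one can pass `foldid = caret_folds` to `cv.glmnet()` so that both methods cross-validate over the same partitions.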

**Answer**

I see two issues here. First, your training set is too small relative to your testing set. Normally, we would want a training set that is at least comparable in size to the testing set. Note also that cross-validation never touches the testing set at all: the algorithm carves validation folds out of the "training set" itself. So you'd be better off putting more of the data into the initial training set.
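A minimal sketch of a more conventional split, assuming the same simulated data (the 80/20 ratio is my own choice, not something prescribed by either package); `createDataPartition()` keeps the class proportions balanced between the two pieces, and cross-validation then happens entirely inside the training portion:

```r
library(caret)

set.seed(849)
all_data <- twoClassSim(550, linearVars = 2)

# Hold out ~20% for final testing; CV folds are drawn only from the ~80%
idx <- createDataPartition(all_data$Class, p = 0.8, list = FALSE)
training <- all_data[idx, ]
testing  <- all_data[-idx, ]

nrow(training)  # roughly 80% of the rows
nrow(testing)   # the held-out remainder
```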

Second, 3 folds is too few for your CV estimates to be reliable. Typically, 5-10 folds are recommended (`nfolds = 5` for `cv.glmnet` and `number = 5` for `caret`). With these changes, I got the same lambda values across the two methods and almost identical coefficient estimates:

```
set.seed(849)
training <- twoClassSim(500, linearVars = 2)
set.seed(849)
testing <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
testX <- testing[, -ncol(testing)]
trainY <- training$Class

# Using glmnet to directly perform CV
set.seed(849)
cvob1 <- cv.glmnet(x = as.matrix(trainX), y = trainY, family = "binomial",
                   alpha = 1, type.measure = "auc", nfolds = 5,
                   lambda = seq(0.001, 0.1, by = 0.001), standardize = FALSE)
cbind(cvob1$lambda, cvob1$cvm)
# best parameter
cvob1$lambda.min
# best coefficient
coef(cvob1, s = "lambda.min")

# Using caret to perform CV
cctrl1 <- trainControl(method = "cv", number = 5, returnResamp = "all",
                       classProbs = TRUE, summaryFunction = twoClassSummary)
set.seed(849)
test_class_cv_model <- train(trainX, trainY, method = "glmnet",
                             trControl = cctrl1, metric = "ROC",
                             tuneGrid = expand.grid(alpha = 1,
                                                    lambda = seq(0.001, 0.1, by = 0.001)))
test_class_cv_model
# best parameter
test_class_cv_model$bestTune
# best coefficient
coef(test_class_cv_model$finalModel, test_class_cv_model$bestTune$lambda)
```

Result:

```
> cvob1$lambda.min
[1] 0.001
> coef(cvob1, s = "lambda.min")
8 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -0.781015706
TwoFactor1 -1.793387005
TwoFactor2 1.850588656
Linear1 0.009341356
Linear2 -1.213777391
Nonlinear1 1.158009360
Nonlinear2 0.609911748
Nonlinear3 0.246029667
> test_class_cv_model$bestTune
alpha lambda
1 1 0.001
> coef(test_class_cv_model$finalModel, test_class_cv_model$bestTune$lambda)
8 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -0.845792624
TwoFactor1 -1.786976586
TwoFactor2 1.844767690
Linear1 0.008308165
Linear2 -1.212285068
Nonlinear1 1.159933335
Nonlinear2 0.676803555
Nonlinear3 0.309947442
```

**Attribution**
*Source : Link , Question Author : Jogi , Answer Author : acylam*