I want to optimize the hyperparameters of XGBoost using cross-validation. However, it is not clear how to obtain the model from `xgb.cv`.

For instance, I call `objective(params)` from `fmin`. The model is then fitted on `dtrain` and validated on `dvalid`. What if I want to use KFold cross-validation instead of training on `dtrain`?

```
from hyperopt import fmin, tpe, hp
import xgboost as xgb

params = {
    'n_estimators': hp.quniform('n_estimators', 100, 1000, 1),
    'eta': hp.quniform('eta', 0.025, 0.5, 0.025),
    'max_depth': hp.quniform('max_depth', 1, 13, 1)
    # ...
}

def objective(params):
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)
    watchlist = [(dtrain, 'train'), (dvalid, 'eval')]
    model = xgb.train(params, dtrain, num_boost_round,
                      evals=watchlist, feval=myFunc)
    # xgb.cv(params, dtrain, num_boost_round, nfold=5, seed=0,
    #        feval=myFunc)

best = fmin(objective, space=params, algo=tpe.suggest)
```

**Answer**

This is how I have trained an `xgboost` classifier with 5-fold cross-validation to optimize the F1 score, using randomized search for hyperparameter optimization. Note that `X` and `y` here should be pandas DataFrames.

```
import numpy as np
from scipy import stats
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.metrics import f1_score

clf_xgb = XGBClassifier(objective='binary:logistic')
# stats.uniform(loc, scale) samples uniformly from [loc, loc + scale]
param_dist = {
    'n_estimators': stats.randint(150, 500),
    'learning_rate': stats.uniform(0.01, 0.07),
    'subsample': stats.uniform(0.3, 0.7),
    'max_depth': [3, 4, 5, 6, 7, 8, 9],
    'colsample_bytree': stats.uniform(0.5, 0.45),
    'min_child_weight': [1, 2, 3]
}
clf = RandomizedSearchCV(clf_xgb,
                         param_distributions=param_dist,
                         n_iter=25,
                         scoring='f1',
                         error_score=0,
                         verbose=3,
                         n_jobs=-1)

numFolds = 5
folds = KFold(n_splits=numFolds, shuffle=True)

estimators = []              # best estimator found in each outer fold
results = np.zeros(len(X))   # out-of-fold predictions for every row of X
score = 0.0
for train_index, test_index in folds.split(X):
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y.iloc[train_index].values.ravel(), y.iloc[test_index].values.ravel()
    clf.fit(X_train, y_train)  # inner randomized search with its own CV
    estimators.append(clf.best_estimator_)
    results[test_index] = clf.predict(X_test)
    score += f1_score(y_test, results[test_index])
score /= numFolds
```

At the end, you get a list of trained classifiers in `estimators`, a prediction for the entire dataset in `results` constructed from the out-of-fold predictions, and an estimate of the $F_1$ score in `score`.
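The manual outer loop above is, in effect, nested cross-validation, and scikit-learn can express the same scheme more compactly by wrapping the search object in `cross_val_score`. A minimal runnable sketch using a stand-in scikit-learn classifier and synthetic data (an `XGBClassifier` and your real `X`, `y` drop in identically):

```
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score

# Synthetic stand-in data; replace with your own X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: randomized search (with its own internal CV) over a toy space.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={'C': stats.uniform(0.1, 10.0)},
    n_iter=5, scoring='f1', random_state=0)

# Outer loop: 5-fold CV scores the tuned model on data it never searched on.
outer = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(search, X, y, cv=outer, scoring='f1')
print(round(scores.mean(), 3))
```

The one thing this compact form does not give you is the per-fold `best_estimator_` list and out-of-fold predictions, which is why the explicit loop above can still be preferable.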

**Attribution**
*Source : Link , Question Author : Klausos , Answer Author : Matt Wenham*