How to evaluate the final model after k-fold cross-validation

As this question and its answer pointed out, k-fold cross-validation (CV) is used for model selection, e.g. choosing between a linear regression and a neural network. It is also suggested that, after deciding which kind of model to use, the final predictor should be trained on the entire data set. My question is: how can we evaluate this final predictor? Is it sufficient to use the average of the k accuracies obtained during k-fold CV?


With 10-fold CV, you train on each fold (90%) of the data and then predict on the remaining 10%. On each held-out 10% you compute an error metric (RMSE, for example). This leaves you with 10 values for RMSE and 10 sets of corresponding predictions. There are 2 things to do with these results:
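The loop above can be sketched as follows. This is a minimal illustration assuming scikit-learn and synthetic data in place of a real training set; the model and data are stand-ins, not part of the original answer:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy regression data standing in for the real 1,000-point training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=1000)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_rmses = []                       # one RMSE per fold (step 1)
oof_predictions = np.empty_like(y)    # one prediction per observation (step 2)

for train_idx, val_idx in kf.split(X):
    # Train on 90% of the data, predict on the held-out 10%.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[val_idx])
    oof_predictions[val_idx] = preds
    fold_rmses.append(mean_squared_error(y[val_idx], preds) ** 0.5)
```

After the loop, `fold_rmses` holds the 10 per-fold errors and `oof_predictions` holds the stacked predictions used in the two steps below.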

  1. Inspect the mean and standard deviation of your 10 RMSE values. k-fold CV takes random partitions of your data, so the error should not vary too greatly from fold to fold. If it does, your model (and its features, hyper-parameters, etc.) cannot be expected to yield stable predictions on a test set.

  2. Aggregate your 10 sets of predictions into 1 set of predictions. For example, if your training set contains 1,000 data points, you will have 10 sets of 100 predictions (10*100 = 1000). When you stack these into 1 vector, you are left with 1000 predictions: 1 for every observation in your original training set. These are called out-of-fold predictions. With these, you can compute the RMSE for your whole training set in one go, as rmse = compute_rmse(oof_predictions, y_train). This is likely the cleanest way to evaluate the final predictor.
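Step 2 can be sketched as follows. `compute_rmse` is the hypothetical helper named in the text; the stand-in out-of-fold predictions below take the place of the stacked fold-wise prediction vectors from a real CV run:

```python
import numpy as np

def compute_rmse(preds, targets):
    """RMSE over a full set of out-of-fold predictions."""
    return float(np.sqrt(np.mean((np.asarray(preds) - np.asarray(targets)) ** 2)))

# Stand-in out-of-fold predictions for a 1,000-point training set;
# in practice these come from stacking the 10 fold-wise prediction vectors.
rng = np.random.default_rng(0)
y_train = rng.normal(size=1000)
oof_predictions = y_train + rng.normal(scale=0.5, size=1000)

# One RMSE for the whole training set, computed in one go.
rmse = compute_rmse(oof_predictions, y_train)
```

Because every observation is predicted exactly once, by a model that never saw it during training, this single number summarizes out-of-sample performance on the full training set.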

Source: Link, Question Author: qweruiop, Answer Author: cavaunpeu