# Why does the scikit-learn bootstrap function resample the test set?

When using bootstrapping for model evaluation, I always thought the out-of-bag samples were directly used as a test set. However, this appears not to be the case for the deprecated scikit-learn Bootstrap approach, which seems to build the test set by drawing with replacement from the out-of-bag data subset. What is the statistical reasoning behind this? Are there specific scenarios where this technique is better than just evaluating on the out-of-bag sample, or vice versa?

Bootstrap samples are used to evaluate the performance of the algorithm over many iterations, each of which measures performance on a randomly redrawn data set.

In contrast, when doing for example 10-fold cross-validation, you perform only 10 iterations on different train and test data sets.
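To make the contrast concrete, here is a minimal sketch of what 10-fold CV index generation amounts to (what e.g. `sklearn.model_selection.KFold` does), written in plain NumPy: the data is partitioned once, so there are exactly 10 fixed train/test splits and every sample lands in the test set exactly once.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 10

# Partition a random permutation of the indices into k disjoint folds.
idx = rng.permutation(n)
folds = np.array_split(idx, k)

# Each split uses one fold as test and the rest as train.
splits = [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
          for i in range(k)]

print(len(splits))  # only k = 10 distinct splits exist
tested = np.concatenate([test for _, test in splits])
print(len(set(tested.tolist())))  # every sample is tested exactly once
```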

Now when your sample size is small, say $n=20$, and the number of bootstrap iterations is high, say $i=10{,}000$, and you do not resample your test data as you do with your training data, you will have situations where your algorithm sees the same or a very similar test set more than once. That is a situation you originally wanted to avoid by using the bootstrap.
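The repetition effect is easy to demonstrate with a small simulation (an illustration I wrote, not sklearn code): with $n=20$ and many iterations, if the test set is simply the out-of-bag (OOB) complement of the bootstrap training draw, exactly identical OOB test sets recur many times.

```python
import numpy as np

rng = np.random.default_rng(0)
n, iters = 20, 10_000

oob_sets = []
for _ in range(iters):
    # Bootstrap training set: n indices drawn with replacement.
    train = rng.integers(0, n, size=n)
    # OOB test set: everything not drawn into the training set.
    oob = frozenset(range(n)) - set(train.tolist())
    oob_sets.append(oob)

distinct = len(set(oob_sets))
print(distinct, "distinct OOB test sets out of", iters, "iterations")
# Far fewer distinct sets than iterations: many exact repeats.
```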

The link you posted is down, so I have added the description of the function from the current (0.14) version of sklearn.

Description of the method:

> Random sampling with replacement cross-validation iterator
>
> Provides train/test indices to split data in train/test sets while resampling the input `n_iter` times: each time a new random split of the data is performed and then samples are drawn (with replacement) on each side of the split to build the training and test sets.
>
> Note: contrary to other cross-validation strategies, bootstrapping will allow some samples to occur several times in each split. However, a sample that occurs in the train split will never occur in the test split and vice versa.
>
> If you want each sample to occur at most once you should probably use `ShuffleSplit` cross-validation instead.
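The behaviour described above can be sketched in plain NumPy (a hypothetical reimplementation for illustration, not the actual sklearn source; the function name `bootstrap_splits` is my own): first a disjoint random split of the indices, then sampling with replacement within each side, so train and test never share a sample but both are resampled on every iteration.

```python
import numpy as np

def bootstrap_splits(n, n_iter=3, train_size=0.5, seed=0):
    """Yield (train, test) index arrays per the described scheme."""
    rng = np.random.default_rng(seed)
    n_train = int(train_size * n)
    for _ in range(n_iter):
        # Step 1: a new disjoint random split of the indices.
        perm = rng.permutation(n)
        train_part, test_part = perm[:n_train], perm[n_train:]
        # Step 2: draw with replacement on each side of the split.
        train = rng.choice(train_part, size=n_train, replace=True)
        test = rng.choice(test_part, size=n - n_train, replace=True)
        yield train, test

for train, test in bootstrap_splits(10):
    # A sample in the train split never occurs in the test split.
    assert not set(train.tolist()) & set(test.tolist())
    print(train, test)
```

Note how each iteration redraws both sides, which is exactly what avoids the repeated-test-set problem described above.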