Bootstrap methodology. Why resample “with replacement” instead of random subsampling?

The bootstrap method has become very widespread in recent years, and I use it a lot myself, especially because the reasoning behind it is quite intuitive.

But there is one thing I don’t understand. Why did Efron choose to resample with replacement instead of simply subsampling by randomly including or excluding single observations?

I think random subsampling has one very good quality: it ideally represents the real-life situation in which the observations in our study are a subset of a hypothetical population. I don’t see the advantage of having duplicated observations during resampling. In a real context no observation is identical to another, especially in complex multivariate situations.


One way to understand this choice is to think of the sample at hand as the best representation you have of the underlying population. You may no longer have the whole population to sample from, but you do have this particular representation of it. A truly random resample from this representation means that you must sample with replacement; otherwise each later draw would depend on the results of the earlier draws. A repeated case in a particular bootstrap sample stands in for members of the underlying population whose characteristics are close to those of that repeated case. Leave-one-out or leave-several-out approaches, as you suggest, can also be used, but that’s cross-validation rather than bootstrapping.
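To make the point about replacement concrete, here is a minimal sketch using only the Python standard library. The data, the helper names (`bootstrap_means`, `full_size_means_without_replacement`), the seeds, and the number of resamples are all assumptions chosen for illustration, not anything from Efron's work. It draws resamples of the same size as the original sample, once with replacement and once without, and compares the spread of the resulting means.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Means of resamples drawn WITH replacement, each as large as the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def full_size_means_without_replacement(sample, n_resamples=2000, seed=0):
    """Means of resamples drawn WITHOUT replacement, each as large as the sample.

    Every such resample is just a reshuffling of the original data.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.sample(sample, n)) for _ in range(n_resamples)]


# Hypothetical data: 50 draws from a standard normal "population".
data_rng = random.Random(42)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(50)]

boot = bootstrap_means(sample)
full = full_size_means_without_replacement(sample)

# Textbook standard error of the mean, for comparison.
se_formula = statistics.stdev(sample) / len(sample) ** 0.5

# One bootstrap resample typically contains repeats, so fewer distinct cases.
one_resample = random.Random(1).choices(range(len(sample)), k=len(sample))

print("textbook standard error:      ", se_formula)
print("spread of bootstrap means:    ", statistics.stdev(boot))
print("spread without replacement:   ", statistics.stdev(full))
print("distinct cases in one resample:", len(set(one_resample)), "of", len(sample))
```

Drawing a full-size resample without replacement only reorders the original data, so the mean never changes and the spread is zero. The draws with replacement are independent of one another, and the spread of their means should land close to the textbook standard error.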

I think this pretty much just puts into other words the comment from @kjetil_b_halvorsen.

Source: Link, Question Author: Bakaburg, Answer Author: EdM
