Why is bootstrapping useful?

If all you are doing is re-sampling from the empirical distribution, why not just study the empirical distribution? For example, instead of studying the variability by repeated sampling, why not just quantify the variability from the empirical distribution directly?


Bootstrapping (or other resampling) is an experimental method to estimate the distribution of a statistic.

It is a straightforward method: you recompute the statistic on many random variants of the sample data in order to obtain an estimate of the desired distribution of that statistic.
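As a rough illustration of that recipe, here is a minimal sketch in Python with NumPy. The function name bootstrap_statistic, the choice of the median as the statistic, and the exponential sample are all assumptions made up for this example; they are not from the question.

```python
import numpy as np

def bootstrap_statistic(sample, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `sample`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.shape[0]
    # Each row of idx selects one resample of size n, drawn with replacement
    idx = rng.integers(0, n, size=(n_resamples, n))
    return np.array([statistic(sample[row]) for row in idx])

# Made-up sample, for illustration only
rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=100)

# Approximate sampling distribution of the median
medians = bootstrap_statistic(data, np.median)
se_median = medians.std(ddof=1)                         # bootstrap standard error
ci_low, ci_high = np.percentile(medians, [2.5, 97.5])   # percentile interval
```

The median is used here because its standard error has no simple closed form, which is the kind of situation where the experimental route pays off.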

You would most likely use it when the ‘theoretical/analytical’ expression is too difficult to obtain or calculate (or, as aksakal says, when it is simply unknown).

  • Example 1: Suppose you do a PCA and wish to compare the results with ‘estimates of the deviation of the eigenvalues’ under the hypothesis that there is no correlation between the variables.

You could scramble the data many times and recompute the PCA eigenvalues each time, so that you get a distribution (based on random tests with the sample data) for the eigenvalues.

Note that the current practice is to gaze at a scree plot and apply rules of thumb in order to ‘decide’ whether a certain eigenvalue is significant/important or not.
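A minimal sketch of the scrambling idea from Example 1, in Python with NumPy. It assumes that ‘scrambling’ means shuffling each column independently (which keeps the marginal distributions but destroys the correlations) and that the PCA is done on the correlation matrix. The function names, the simulated data and the 95% cut-off are hypothetical choices for illustration.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, largest first."""
    corr = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]

def permuted_eigenvalues(X, n_permutations=500, seed=0):
    """Eigenvalues recomputed after scrambling each column independently."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    out = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        # Shuffle every column on its own: marginals kept, correlations destroyed
        scrambled = np.column_stack([rng.permutation(col) for col in X.T])
        out[i] = pca_eigenvalues(scrambled)
    return out

# Made-up data: two variables share a latent factor, two are pure noise
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent + 0.5 * rng.normal(size=(200, 2)),
               rng.normal(size=(200, 2))])

observed = pca_eigenvalues(X)
null = permuted_eigenvalues(X)
threshold = np.percentile(null, 95, axis=0)  # per-component 95% null quantile
significant = observed > threshold
```

Comparing each observed eigenvalue with the corresponding quantile of the scrambled ones gives a data-driven alternative to eyeballing the scree plot.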

  • Example 2: You did a non-linear regression y ~ f(x), which gives you estimates of a bunch of parameters for the function f. Now you wish to know the standard errors of those parameters.

A simple look at the residuals plus some linear algebra, as in OLS, is not possible here. However, an easy way is to compute the same regression many times with the residuals/errors re-scrambled, in order to get an idea of how the parameters would vary (assuming the distribution of the error term can be modeled by the observed residuals).
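A minimal sketch of Example 2 in Python, using NumPy and scipy.optimize.curve_fit. The exponential-decay model f, the simulated data and the function name residual_bootstrap are assumptions for illustration. The sketch also assumes that ‘re-scrambled’ means drawing the residuals with replacement; passing replace=False would give a plain permutation instead.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    """Hypothetical model for illustration: exponential decay."""
    return a * np.exp(-b * x)

def residual_bootstrap(x, y, p0, n_resamples=500, seed=0):
    """Refit the model on fitted values plus resampled residuals."""
    rng = np.random.default_rng(seed)
    p_hat, _ = curve_fit(f, x, y, p0=p0)
    fitted = f(x, *p_hat)
    residuals = y - fitted
    boot = np.empty((n_resamples, len(p_hat)))
    for i in range(n_resamples):
        # New synthetic response: same fitted curve, residuals drawn with replacement
        y_star = fitted + rng.choice(residuals, size=residuals.size, replace=True)
        boot[i], _ = curve_fit(f, x, y_star, p0=p_hat)
    # Spread of the refitted parameters approximates their standard errors
    return p_hat, boot.std(axis=0, ddof=1)

# Made-up data: true parameters a=3.0, b=0.7 with Gaussian noise
rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 60)
y = f(x, 3.0, 0.7) + rng.normal(scale=0.1, size=x.size)

p_hat, se = residual_bootstrap(x, y, p0=(1.0, 1.0))
```

This treats the observed residuals as exchangeable draws from the error distribution, so it would be less appropriate if the errors were clearly heteroscedastic.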

Source : Link , Question Author : ztyh , Answer Author : Sextus Empiricus
