# Use of standard error of bootstrap distribution

(ignore the R code if needed, as my main question is language-independent)

If I want to look at the variability of a simple statistic (ex: mean), I know I can do it via theory like:

x = rnorm(50)

# Estimate standard error from theory
summary(lm(x~1))
# same as...
sd(x) / sqrt(length(x))


or with the bootstrap like:

library(boot)

# Estimate standard error from bootstrap
(x.bs = boot(x, function(x, inds) mean(x[inds]), 1000))
# which is simply the standard *deviation* of the bootstrap distribution...
sd(x.bs$t)  However, what I’m wondering is, can it be useful/valid(?) to look to the standard error of a bootstrap distribution in certain situations? The situation I’m dealing with is a relatively noisy nonlinear function, such as: # Simulate dataset set.seed(12345) n = 100 x = runif(n, 0, 20) y = SSasymp(x, 5, 1, -1) + rnorm(n, sd=2) dat = data.frame(x, y)  Here the model doesn’t even converge using the original data set, > (fit = nls(y ~ SSasymp(x, Asym, R0, lrc), dat)) Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model  so the statistics I’m interested in instead are more stabilized estimates of these nls parameters – perhaps their means across a number of bootstrap replications. # Obtain mean bootstrap nls parameter estimates fit.bs = boot(dat, function(dat, inds) tryCatch(coef(nls(y ~ SSasymp(x, Asym, R0, lrc), dat[inds, ])), error=function(e) c(NA, NA, NA)), 100) pars = colMeans(fit.bs$t, na.rm=T)


Here these are, indeed, in the ball park of what I used to simulate the original data:

> pars
[1]  5.606190  1.859591 -1.390816


A plotted version looks like:

# Plot
with(dat, plot(x, y))

newx = seq(min(x), max(x), len=100)
lines(newx, SSasymp(newx, pars[1], pars[2], pars[3]))

lines(newx, SSasymp(newx, 5, 1, -1), col='red')
legend('bottomright', c('Actual', 'Predicted'), bty='n', lty=1, col=2:1)


Now, if I want the variability of these stabilized parameter estimates, I think I can, assuming normality of this bootstrap distribution, just calculate their standard errors:

> apply(fit.bs\$t, 2, function(x) sd(x, na.rm=T) / sqrt(length(na.omit(x))))
[1] 0.08369921 0.17230957 0.08386824


Is this a sensible approach? Is there a better general approach to inference on the parameters of unstable nonlinear models like this? (I suppose I could instead do a second layer of resampling here, instead of relying on theory for the last bit, but that might take a lot of time depending on the model. Even still, I’m not sure if these standard errors would be useful for anything, since they would approach 0 if I just increase the number of bootstrap replications.)

Many thanks, and, by the way, I’m an engineer so please forgive me being a relative novice around here.

There are several problems in this question. First, there is the question of whether bootstrapped averages will be sensible estimators even when some of the individual bootstrapped estimators are not computable (lack of convergence, non-existence of solutions). Second, given that the bootstrapped estimators are sensible, there is a question of how to obtain confidence intervals or perhaps just standard errors for these estimates.

The idea of averaging bootstrapped estimates is closely related to, if not actually the same as, bootstrap aggregation, or bagging, used in machine learning to improve prediction performance of weak predictors. See ESL, Section 8.7. In certain cases $-$ also for estimating parameters $-$ the averaging of bootstrap estimates may reduce the variance of the resulting estimator compared to just using the estimator on the original data set.

The purpose in the question is, however, to produce estimates even in cases where the algorithm for computing the estimates may fail occasionally or where the estimator is occasionally undefined. As a general approach there is a problem:

• Averaging bootstrapped estimates while blindly throwing away the bootstrapped samples for which the estimates are not computable will in general give biased results.

How severe the general problem is depends on several things. For instance, how frequently the estimate is not computable and whether the conditional distribution of the sample given that the estimate is not computable differs from the conditional distribution of the sample given that the estimate is computable. I would not recommend to use the method.

For the second part of the question we need a little notation. If $X$ denotes our original data set, $\hat{\theta}$ our estimator (assume for simplicity it is real valued and allowed to take the value NA) such that $\hat{\theta}(X)$ is the estimate for the original data set, and $Y$ denotes a single bootstrapped sample then the bootstrap averaging is effectively computing the estimator

where $A(X)$ denotes the event, depending on $X$, upon which $\hat{\theta}(Y) \neq \text{NA}$. That is, we compute the conditional expectation of the estimator on a bootstrapped sample $-$ conditioning on the original sample $X$ and the event, $A(X)$, that the estimator is computable for the bootstrapped sample. The actual bootstrap computation is a sampling based approximation of $\tilde{\theta}(X)$.

The suggestion in the question is to compute the empirical standard deviation of the bootstrapped estimators, which is an estimate of the standard deviation of $\hat{\theta}(Y)$ conditionally on $X$ and $A(X)$. The desired standard deviation, the standard error, is the standard deviation of $\tilde{\theta}(X)$. You can’t get the latter from the former. I see no other obvious and general way than to use a second layer of bootstrapping for obtaining a reliable estimate of the standard error.

The discussion on the estimation of the standard error is independent of how the conditioning on $A(X)$ affects the bias of the estimator $\tilde{\theta}(X)$. If the effect is severe then even with correct estimates of the standard error, a confidence interval will be misleading.

Edit:

The very nice paper Estimation and Accuracy After Model Selection by Efron gives a general method for estimating the standard error of a bagged estimator without using a second layer of bootstrapping. The paper does not deal explicitly with estimators that are occasionally not computable.