# Restricting minimum subgroup size in a bootstrap resampling study – why is this approach wrong?

I’m currently doing a simple resampling study in which I compare different methods for generating confidence intervals for linear regression models. I’m trying to follow the recommendations of Burton et al. (2006), and I’m mostly interested in coverage: how often each confidence interval covers the true value, which I take from my original sample.

Since a few of the confidence intervals are based on the bootstrap procedure, I figured, based on this CV question, that it would not be fair to use bootstrap resamples in which one of the categorical variables has a category with fewer than 8 individuals. I therefore redraw whenever I get a sample that fails that criterion. One of the statisticians I collaborate with tells me that this procedure violates the distributional assumption and that it is bad practice. Unfortunately his arguments have not convinced me, so I hope that someone here can help explain the flaw in my logic.

In my mind I would never use bootstrapping on a sample that, for instance, has only 2 women and 23 men in it, as the two women couldn’t possibly contribute enough to generate reasonable randomness from the bootstrap procedure. I’m aware that I can’t say anything about the samples that I’ve discarded, but why is it wrong to say something about the samples that satisfy the criterion?
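To make the redraw rule concrete, here is a minimal base-R sketch of it. The helper name `draw_restricted_indices`, the `max_tries` guard and the toy `sex` vector are made up for illustration and are not my actual study code.

```r
# Hypothetical helper: draw bootstrap indices, redrawing until every
# level of `group` appears at least `min_n` times in the resample.
draw_restricted_indices <- function(group, min_n = 8, max_tries = 1000) {
  group <- factor(group)
  n <- length(group)
  for (i in seq_len(max_tries)) {
    idx <- sample.int(n, n, replace = TRUE)
    # Subsetting a factor keeps all levels, so absent levels count as 0
    counts <- table(group[idx])
    if (all(counts >= min_n)) return(idx)
  }
  stop("no acceptable resample found within max_tries")
}

# Toy data (assumed): 12 women and 28 men
set.seed(1)
sex <- rep(c("F", "M"), times = c(12, 28))
idx <- draw_restricted_indices(sex, min_n = 8)
table(sex[idx])
```

Every resample returned this way meets the criterion by construction, which is exactly the conditioning my colleague objects to.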

### Minor update

After Bill’s interesting answer I just want to clarify a little. The original sample has “enough” in each category to allow drawn samples to contain at least 8 individuals in each category. The bootstrap statistic function is always the same (I use the R `boot` package):

``````
#' Gets the coefficients for the bootstrap function
#'
#' This function also catches errors that may
#' occur in the lm() call and then returns an NA vector.
#' As I lack control over the data this is necessary
#' since there can be unexpected problems with the
#' lm() call.
#'
#' @param data The data set that is to be bootstrapped
#' @param indices The indices that the bootstrap
#'  provides to the function; see the boot() function
#'  in the boot package for details. If the function is
#'  used for the asymptotic confidence interval then the
#'  indices should be provided as a simple sequence,
#'  1:nrow(data)
#' @param formula The lm() formula
#' @param true_vals The true values. Only used so that
#'  the function can return a result that contains the
#'  same number of estimates even if some coefficients
#'  were missing for this bootstrapped data set
#' @return vector Returns the coefficients as a
#'  vector; missing coefficients are marked as NA
#'  at the appropriate positions.
bootstrap_lm <- function(data, indices, formula, true_vals) {
  d <- data[indices, ]

  # Catch any failure in lm() and return an all-NA vector instead
  fit <- try(lm(formula, data = d), silent = TRUE)
  if (inherits(fit, "try-error"))
    return(rep(NA, times = length(true_vals)))

  ret <- coef(fit)

  # Sometimes not all options get an estimate
  # as the bootstrap might be without those variables
  # therefore we need to generate empty values
  if (length(true_vals) > length(ret)) {
    empty <- rep(NA, times = length(true_vals))
    names(empty) <- names(true_vals)
    # Fill by name so each estimate lands in its own position
    common <- intersect(names(ret), names(empty))
    empty[common] <- ret[common]
    ret <- empty
  }

  return(ret)
}
``````

As @jbowman commented in his answer, it is not of interest to compare a situation where one of the confidence intervals will by definition not work. I’m comparing regular, robust and bootstrapped confidence intervals, and I want the comparison to be as fair as possible. Comparing on samples with fewer than 8 in one category should, in my mind, lead to a bias in favor of the regular and robust confidence intervals.

I guess I’ll have to go by the conventional methodology and skip this criterion or I won’t get published. It seems, though, that if that is the case then the statistical community will have articles that are biased against bootstrapping (in comparative studies like mine). If someone out there has a good paper that can support my approach I would greatly appreciate it.

This is an interesting question (+1). It’s strange that you have gotten no attention.

I’m no bootstrapping expert, but I think the answer is to go back to the principles of bootstrapping. What you are supposed to do on each bootstrap replication is 1) draw a bootstrap sample in a way that imitates how you drew the original sample (including preserving independence), then 2) do to that bootstrap sample whatever your estimation technique calls for, then 3) record the outcome of the estimation.
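As a rough illustration of those three steps, here is a base-R sketch for a simple linear regression. The simulated data, the helper name `one_replicate`, the model `y ~ x` and the percentile interval at the end are all assumptions chosen for the example, not anything taken from your study.

```r
# Toy "original sample" (assumed): 50 independent observations
set.seed(42)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

one_replicate <- function(data) {
  # Step 1: resample rows with replacement, imitating i.i.d. sampling
  idx <- sample.int(nrow(data), replace = TRUE)
  # Step 2: apply the same estimation technique to the resample
  fit <- lm(y ~ x, data = data[idx, ])
  # Step 3: record the outcome of the estimation
  coef(fit)
}

# 500 replications; rows are replications, columns are coefficients
boot_coefs <- t(replicate(500, one_replicate(dat)))

# Percentile intervals from the recorded outcomes
ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
ci
```

The point of writing it out is that steps 1 and 2 are the only places where your actual sampling and analysis behaviour can enter.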

To answer the question, I think you need to think carefully about how the original sample was collected (so that your bootstrapping properly imitates it). Also, you need to think about what your estimation technique really was/is.

Suppose you were collecting your original data. Suppose you came to the end of the data collection. Suppose you noticed that you had only two females. What would you have done? If the answer to this question is “I would have thrown away my entire dataset and done the whole data collection process again,” then your bootstrapping procedure is exactly right.

I doubt that this is what you would have done, however. And this is the answer to your question. This is why what you are doing is wrong.

Maybe you would have continued collecting more data until you had eight females. If so, then imitate that in the bootstrap sampling step (1). Maybe you would have decided that two females is too few and you would have dropped all females from the analysis. If so, then imitate that in the bootstrap estimation step (2).
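The second option could look roughly like the statistic function below, which follows the `(data, indices)` convention your `bootstrap_lm` already uses. This is only a sketch under assumptions: the function name `fit_like_analyst`, the columns `y`, `x` and `sex`, and the rule "drop any sex with fewer than `min_n` rows" are hypothetical stand-ins for whatever you would really have done.

```r
# Hypothetical statistic: apply the analyst's rule inside step (2).
# Assumes two sexes coded "F"/"M" and that at least one of them has
# min_n or more rows in every resample.
fit_like_analyst <- function(data, indices, min_n = 8) {
  d <- data[indices, ]
  counts <- table(d$sex)
  keep <- names(counts)[counts >= min_n]
  d <- d[d$sex %in% keep, ]
  d$sex <- droplevels(factor(d$sex))
  if (nlevels(d$sex) < 2) {
    # Only one sex left: the sex effect is not estimable, record NA
    fit <- lm(y ~ x, data = d)
    return(c(coef(fit), sexM = NA))
  }
  coef(lm(y ~ x + sex, data = d))
}

# Toy data (assumed): 2 females and 23 males
set.seed(7)
dat <- data.frame(sex = rep(c("F", "M"), times = c(2, 23)), x = rnorm(25))
dat$y <- 1 + 0.5 * dat$x + (dat$sex == "M") + rnorm(25)

# On the original sample the two females are dropped, so sexM is NA
est <- fit_like_analyst(dat, seq_len(nrow(dat)))
est
```

No resample is thrown away here; the decision rule is simply part of the estimator being bootstrapped.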

Another way to say this is that you should think about what question you want the bootstrapping procedure to answer. If you want to answer the question “How often would the confidence intervals cover the true parameter value if I did the experiment over and over, mindlessly running the exact same regression each time without any attention to what the sample looked like,” then just mindlessly bootstrap like your colleague is telling you to. If you want to answer the question “How often would the confidence intervals cover the true parameter value if I did the experiment over and over, analyzing the data the way Max Gordon would analyze it,” then do what I suggested.
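The first of those two questions can be put into code as a small simulation. The sketch below is illustrative only: the data-generating model, the sample size of 30, the 1000 repetitions and the use of the ordinary `confint()` interval are all assumptions, not your study design.

```r
# Repeat the "experiment" many times and run the same regression each
# time, without looking at what the sample happened to contain.
set.seed(123)
true_beta <- 2
covers <- replicate(1000, {
  x <- rnorm(30)
  y <- 1 + true_beta * x + rnorm(30)
  ci <- confint(lm(y ~ x))["x", ]   # 95% interval for the slope
  ci[1] <= true_beta && true_beta <= ci[2]
})

# Proportion of experiments whose interval covered the true slope
coverage <- mean(covers)
coverage
```

Answering the second question would mean replacing the body of the loop with whatever the analyst would really do on seeing each sample, and that changes what "coverage" is measuring.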

If you want to get this work published, do the conventional thing: the thing your colleague is suggesting that you do. Well, unless you can find a paper in Biometrika which agrees with what I say above. I don’t know the relevant literature, unfortunately, so I can’t help you with that.