# Bootstrapping clusters in R

I am running a negative binomial regression of clinic counts in each county in the entire country (~3k counties). I’d like to at least partially account for the non-independence of neighboring counties by bootstrapping the confidence intervals in a “clustered” fashion, e.g. drawing an entire state’s worth of data (50 states total) at once. This has become standard practice, for better or for worse, in the econometric literature.

I could write the code to do this myself, but the `boot` package seems like it should have the ability to do this somehow, and in general I prefer tested, general solutions to one-off hacks. Is there a way to coerce the `boot` package to do a clustered bootstrap?

I tried the `strata` argument, but that randomizes within strata rather than randomizing which cluster gets taken, as the following code confirms:

```r
library(boot)

dat <- data.frame( cluster=rep(letters[1:5],each=10), x=runif(5*10), stringsAsFactors=TRUE )

# Print which clusters were drawn, then return the mean of x
boot.stat <- function(dat,idx) {
  print(dat[idx,]$cluster)
  print(table(dat[idx,]$cluster))  # always 10 rows per cluster with strata
  mean(dat[idx,]$x)
}

boot(
  data=dat,
  statistic=boot.stat,
  strata=dat$cluster,
  stype="i",
  R=5
)
```
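The same check can be done without printing inside the statistic. `boot.array()` regenerates the indices that `boot()` drew, so counting rows per cluster in each resample shows that `strata` keeps every cluster's count fixed. A minimal sketch; the names `mean_stat`, `idx` and `counts` are illustrative only:

```r
library(boot)

set.seed(1)
dat <- data.frame(cluster = rep(letters[1:5], each = 10),
                  x = runif(5 * 10), stringsAsFactors = TRUE)

# Statistic without side effects: mean of the resampled rows
mean_stat <- function(d, idx) mean(d$x[idx])

b <- boot(data = dat, statistic = mean_stat, strata = dat$cluster, R = 100)

# R x n matrix: row r holds the row indices drawn in resample r
idx <- boot.array(b, indices = TRUE)

# Rows contributed by each cluster to every resample (one row per resample)
counts <- t(apply(idx, 1, function(i) table(dat$cluster[i])))
head(counts)

# With strata, each cluster contributes exactly its original 10 rows
all(counts == 10)
```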

If I understand you correctly, you want to estimate a statistic per state and then average that statistic to get a bootstrapped estimate of the overall statistic.

Stratified sampling does something different: it resamples rows *within* each stratum, so every bootstrap sample contains the same number of observations from each label as the original data. I do not think that is what you want to do.
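For contrast, drawing whole clusters with replacement can still be done with `boot` itself by treating the cluster *labels* as the data: `boot()` resamples the labels, and the statistic rebuilds a data set from every row of each drawn cluster (a cluster drawn twice appears twice). This is a common workaround rather than a documented `boot` feature, and the helper names `rows_by_cluster` and `cluster_stat` are hypothetical. With only five clusters the bootstrap distribution is coarse; with 50 states it has more to work with.

```r
library(boot)

set.seed(1)
dat <- data.frame(cluster = rep(letters[1:5], each = 10),
                  x = runif(5 * 10), stringsAsFactors = TRUE)

clusters <- levels(dat$cluster)
# Row numbers belonging to each cluster, looked up by label
rows_by_cluster <- split(seq_len(nrow(dat)), dat$cluster)

# idx indexes the cluster labels; stack all rows of each drawn cluster
cluster_stat <- function(labels, idx) {
  rows <- unlist(rows_by_cluster[labels[idx]], use.names = FALSE)
  mean(dat$x[rows])
}

b <- boot(data = clusters, statistic = cluster_stat, R = 999)
b
boot.ci(b, type = "perc")
```

For the regression in the question, `cluster_stat` would refit the model on `dat[rows, ]` and return its coefficients instead of a mean.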

You could do this manually without being hacky. Using the `dplyr`, `tidyr` and `purrr` packages from the tidyverse, the code becomes transparent and clean.

```r
library(tidyr)
library(dplyr)
library(purrr)

dat <- data.frame(cluster=rep(letters[1:5],each=10),
                  x=runif(5*10), stringsAsFactors=TRUE)

# Statistic computed on one cluster's data
boot.stat2 <- function(df) {
  mean(df$x)
}

# One row per cluster: nest the x values, then apply the statistic to each
dat %>%
  nest(data = x) %>%
  mutate(stat = map_dbl(data, boot.stat2))
```
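The pipeline above produces one statistic per cluster but does not yet resample anything. One way to finish the bootstrap (an assumption on my part, not the only option) is to resample those per-cluster values with replacement and average them; `per_cluster` is an illustrative name:

```r
library(tidyr)
library(dplyr)
library(purrr)
library(boot)

set.seed(1)
dat <- data.frame(cluster = rep(letters[1:5], each = 10),
                  x = runif(5 * 10), stringsAsFactors = TRUE)

# One statistic (here the mean of x) per cluster
per_cluster <- dat %>%
  nest(data = x) %>%
  mutate(stat = map_dbl(data, ~ mean(.x$x)))

# Resample the per-cluster statistics and average each resample
b <- boot(per_cluster$stat, statistic = function(s, i) mean(s[i]), R = 999)
boot.ci(b, type = "perc")
```

Averaging cluster means weights every cluster equally; that matches the pooled mean only when clusters have the same size, as in this toy data.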