# How to: Prediction intervals for linear regression via bootstrapping

I am having trouble understanding how to use bootstrapping to calculate prediction intervals for a linear regression model. Can somebody outline a step-by-step procedure? I searched via Google, but nothing I found really makes sense to me.

I do understand how to use bootstrapping for calculating confidence intervals for the model parameters.

## Answer

Confidence intervals account for estimation uncertainty. Prediction intervals add fundamental uncertainty on top of this. R’s `predict.lm` will give you the prediction interval for a linear model; from there, all you have to do is run it repeatedly on bootstrapped samples.

```r
n    <- 100  # number of observations
n.bs <- 30   # number of bootstrap resamples

# Note: use x = runif(n), not x <- runif(n); the latter assigns x in the
# calling environment and mangles the column name, so y ~ x would only
# work by accident.
dat <- data.frame(x = runif(n))
dat$y <- dat$x + runif(n)
plot(y ~ x, data = dat)

regressAndPredict <- function(dat) {
  model <- lm(y ~ x, data = dat)
  predict(model, interval = "prediction")
}

regressAndPredict(dat)

# Refit and predict on n.bs bootstrap resamples of the rows
replicate(n.bs, regressAndPredict(dat[sample(seq(n), replace = TRUE), ]))
```

The result of `replicate` is a 3-dimensional array (`n` x `3` x `n.bs`). The second dimension holds, for each data point, the fitted value and the lower and upper bounds of the 95% prediction interval.
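To turn that array into a single bootstrap interval per observation, one option (my own summary step, not part of the original answer) is to predict each bootstrap fit on the *original* data, so rows line up across replicates, and then collapse the bootstrap dimension with a quantile or median:

```r
set.seed(1)
n    <- 100
n.bs <- 30
dat <- data.frame(x = runif(n))
dat$y <- dat$x + runif(n)

# Fit on a bootstrap resample, but predict on fixed newdata so that
# row i always corresponds to the same original observation.
regressAndPredictBS <- function(boot.dat, newdata) {
  model <- lm(y ~ x, data = boot.dat)
  predict(model, newdata = newdata, interval = "prediction")
}

preds <- replicate(
  n.bs,
  regressAndPredictBS(dat[sample(seq(n), replace = TRUE), ], newdata = dat)
)

# preds is n x 3 x n.bs; collapse the bootstrap dimension.
# Taking the median of the replicated bounds is one simple choice.
boot.lwr <- apply(preds[, "lwr", ], 1, median)
boot.upr <- apply(preds[, "upr", ], 1, median)
head(cbind(lwr = boot.lwr, upr = boot.upr))
```

Predicting on fixed `newdata` matters: without it, each replicate’s rows follow the order of that resample, and per-row summaries across replicates would mix different observations.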

### Gary King method

Depending on what you want, there’s a cool method by King, Tomz, and Wittenberg. It’s relatively easy to implement and avoids the problems bootstrapping has with certain estimates (e.g. `max(Y)`).

I’ll quote from their definition of fundamental uncertainty here, since it’s reasonably nice:

> A second form of variability, the fundamental uncertainty represented by the stochastic component (the distribution f) in Equation 1, results from innumerable chance events such as weather or illness that may influence Y but are not included in X. Even if we knew the exact values of the parameters (thereby eliminating estimation uncertainty), fundamental uncertainty would prevent us from predicting Y without error.
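A minimal sketch of that simulation idea (my reading of the approach, not the authors’ code): draw coefficient vectors from their estimated sampling distribution to capture estimation uncertainty, then add a draw from the error distribution for fundamental uncertainty. The prediction point `x0` is an illustrative assumption, and `sigma` is held fixed at its estimate for brevity (strictly, it should be simulated too):

```r
library(MASS)  # for mvrnorm (ships with base R)

set.seed(1)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- dat$x + runif(n)

model  <- lm(y ~ x, data = dat)
n.sims <- 1000
x0     <- 0.5  # covariate value to predict at (illustrative choice)

# 1. Estimation uncertainty: simulate coefficients from their
#    approximate sampling distribution N(beta.hat, vcov).
beta.sims <- mvrnorm(n.sims, mu = coef(model), Sigma = vcov(model))

# 2. Fundamental uncertainty: add a draw from the stochastic component.
sigma.hat <- summary(model)$sigma
y.sims <- beta.sims %*% c(1, x0) + rnorm(n.sims, sd = sigma.hat)

# 95% prediction interval for y at x = x0
quantile(y.sims, c(0.025, 0.975))
```

Because each simulated `y` combines a coefficient draw with an error draw, the quantiles of `y.sims` reflect both sources of uncertainty at once.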

---
**Attribution**
*Source: Link, Question Author: Max, Answer Author: Ari B. Friedman*