I would like to obtain 95% confidence intervals on the predictions of a non-linear mixed (`nlme`) model. As nothing standard is provided to do this within `nlme`, I was wondering if it is correct to use the method of “population prediction intervals”, as outlined in Ben Bolker’s book chapter in the context of models fit with maximum likelihood. The idea is to resample the fixed-effect parameters based on the variance-covariance matrix of the fitted model, simulate predictions from these resampled parameters, and then take the 2.5% and 97.5% percentiles of the simulated predictions as the 95% confidence intervals. The code to do this looks as follows:

(I here use the ‘Loblolly’ data from the `nlme` help file.)

```
library(effects)
library(nlme)
library(MASS)
fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
xvals <- seq(min(Loblolly$age), max(Loblolly$age), length.out = 100)
nresamp <- 1000
## pick new parameter values by sampling from a multivariate normal
## distribution based on the fit
pars.picked <- mvrnorm(nresamp, mu = fixef(fm1), Sigma = vcov(fm1))
yvals <- matrix(0, nrow = nresamp, ncol = length(xvals))
for (i in 1:nresamp) {
    yvals[i, ] <- sapply(xvals, function(x)
        SSasymp(x, pars.picked[i, 1], pars.picked[i, 2], pars.picked[i, 3]))
}
quant <- function(col) quantile(col, c(0.025, 0.975))  ## 95% percentiles
conflims <- apply(yvals, 2, quant)  ## 95% confidence intervals
```

Now that I have my confidence limits I create a graph:

```
meany <- sapply(xvals, function(x)
    SSasymp(x, fixef(fm1)[[1]], fixef(fm1)[[2]], fixef(fm1)[[3]]))
par(cex.axis = 2.0, cex.lab = 2.0)
plot(0, type = 'n', xlim = c(3, 25), ylim = c(0, 65), axes = FALSE,
     xlab = "age", ylab = "height")
axis(1, at = c(3, 1:5 * 5), labels = c(3, 1:5 * 5))
axis(2, at = 0:6 * 10, labels = 0:6 * 10)
## plot each seed lot's growth curve
for (i in 1:14) {
    data <- subset(Loblolly, Loblolly$Seed == unique(Loblolly$Seed)[i])
    lines(data$age, data$height, col = "red", lty = 3)
}
lines(xvals, meany, lwd = 3)
lines(xvals, conflims[1, ])
lines(xvals, conflims[2, ])
```

Here’s the plot with the 95% confidence intervals obtained this way:

Is this approach valid, or are there any other or better approaches to calculate 95% confidence intervals on the predictions of a nonlinear mixed model? I am not entirely sure how to deal with the random-effect structure of the model… Should one perhaps average over random-effect levels? Or would it be OK to have confidence intervals for an average subject, which would seem to be closer to what I have now?

**Answer**

What you’ve done here looks reasonable. The short answer is that, for the most part, the issues of computing confidence intervals on predictions from mixed models and from nonlinear models are more or less *orthogonal*; that is, you need to worry about both sets of problems, but they don’t (as far as I know) interact in any strange ways.

- *Mixed model issues*: are you trying to predict at the population or the group level? How do you account for variability in the random-effects parameters? Are you conditioning on the group-level observations or not?
- *Nonlinear model issues*: is the sampling distribution of the parameters Normal? How do I account for nonlinearity when propagating error?

Throughout, I will assume you’re predicting at the population level and constructing confidence intervals at the population level — in other words, you’re trying to plot the predicted values of a *typical* group, and not including the among-group variation in your confidence intervals. This simplifies the mixed-model issues. The following plots compare three approaches (see below for the code dump):

- *population prediction intervals*: this is the approach you tried above. It assumes the model is correct and that the sampling distributions of the fixed-effect parameters are multivariate Normal; it also ignores uncertainty in the random-effects parameters.
- *bootstrapping*: I implemented hierarchical bootstrapping; we resample both at the level of groups and within groups. The within-group sampling samples the *residuals* and adds them back to the predictions. This approach makes the fewest assumptions.
- *delta method*: this assumes both multivariate Normality of the sampling distributions and that the nonlinearity is weak enough to allow a second-order approximation.

We could also do *parametric* bootstrapping …
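A minimal sketch of what a parametric bootstrap could look like here (this is my own illustration, not part of the original answer; the helper `psim` and its use of the estimated random-intercept and residual standard deviations are assumptions about one reasonable way to do it): simulate new responses from the fitted model’s estimated variance components, refit, predict at the population level, and take quantiles as before.

```r
## Hypothetical parametric-bootstrap sketch (not from the original answer):
## simulate new responses from the fitted model, refit, and predict.
library(nlme)
fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
psim <- function(fitted, data, idvar = "Seed") {
    ## population-level (fixed-effects-only) predictions
    pp <- predict(fitted, level = 0)
    ## draw a new random intercept for each group from the estimated
    ## random-effect distribution
    sd_re <- as.numeric(VarCorr(fitted)["Asym", "StdDev"])
    iv <- levels(data[[idvar]])
    re <- setNames(rnorm(length(iv), sd = sd_re), iv)
    ## simulated response = fixed prediction + group effect + residual noise
    data$height <- pp + re[as.character(data[[idvar]])] +
        rnorm(nrow(data), sd = fitted$sigma)
    data
}
set.seed(101)
xvals <- seq(min(Loblolly$age), max(Loblolly$age), length.out = 100)
## refit to a few simulated data sets and predict at the population level
## (use many more replicates in practice)
yvals_pb <- replicate(20,
    predict(update(fm1, data = psim(fm1, Loblolly)),
            newdata = data.frame(age = xvals), level = 0))
pb_ci <- t(apply(yvals_pb, 1, quantile, c(0.025, 0.975)))
```

Unlike the hierarchical (nonparametric) bootstrap, this assumes the Normality of the random effects and residuals, but it avoids the multivariate-Normal assumption on the sampling distribution of the fixed-effect parameters.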

Here are the CIs plotted along with the data …

… but we can hardly see the differences.

Zooming in by subtracting off the predicted values (red = bootstrap, blue = PPI, cyan = delta method):

In this case the bootstrap intervals are actually narrowest (presumably because the sampling distributions of the parameters are slightly *thinner-tailed* than Normal), while the PPI and delta-method intervals are very similar to each other.

```
library(nlme)
library(MASS)
fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
xvals <- with(Loblolly, seq(min(age), max(age), length.out = 100))
nresamp <- 1000
## pick new parameter values by sampling from a multivariate normal
## distribution based on the fit
pars.picked <- mvrnorm(nresamp, mu = fixef(fm1), Sigma = vcov(fm1))
## predicted values: useful below
pframe <- data.frame(age = xvals)
pframe$height <- predict(fm1, newdata = pframe, level = 0)
## utility function: quantile-based 95% confidence limits
get_CI <- function(y, pref = "") {
    r1 <- t(apply(y, 1, quantile, c(0.025, 0.975)))
    setNames(as.data.frame(r1), paste0(pref, c("lwr", "upr")))
}
## population prediction intervals
set.seed(101)
yvals <- apply(pars.picked, 1,
               function(x) SSasymp(xvals, x[1], x[2], x[3]))
c1 <- get_CI(yvals)
## hierarchical bootstrap: resample groups, then residuals within groups
sampfun <- function(fitted, data, idvar = "Seed") {
    pp <- predict(fitted, level = 1)
    rr <- residuals(fitted)
    dd <- data.frame(data, pred = pp, res = rr)
    ## sample groups with replacement
    iv <- levels(data[[idvar]])
    bsamp1 <- sample(iv, size = length(iv), replace = TRUE)
    bsamp2 <- lapply(bsamp1,
        function(x) {
            ## within groups, sample *residuals* with replacement
            ddb <- dd[dd[[idvar]] == x, ]
            ## bootstrapped response = prediction + bootstrapped residual
            ddb$height <- ddb$pred +
                sample(ddb$res, size = nrow(ddb), replace = TRUE)
            return(ddb)
        })
    res <- do.call(rbind, bsamp2)  ## collect results
    if (is(data, "groupedData"))
        res <- groupedData(res, formula = formula(data))
    return(res)
}
pfun <- function(fm) {
    predict(fm, newdata = pframe, level = 0)
}
set.seed(101)
yvals2 <- replicate(nresamp,
                    pfun(update(fm1, data = sampfun(fm1, Loblolly, "Seed"))))
c2 <- get_CI(yvals2, "boot_")
## delta method: SSasymp() returns the gradient as an attribute
ss0 <- with(as.list(fixef(fm1)), SSasymp(xvals, Asym, R0, lrc))
gg <- attr(ss0, "gradient")
V <- vcov(fm1)
delta_sd <- sqrt(diag(gg %*% V %*% t(gg)))
c3 <- with(pframe, data.frame(delta_lwr = height - 1.96 * delta_sd,
                              delta_upr = height + 1.96 * delta_sd))
pframe <- data.frame(pframe, c1, c2, c3)
## CIs plotted along with the data
library(ggplot2); theme_set(theme_bw())
ggplot(Loblolly, aes(age, height)) +
    geom_line(alpha = 0.2, aes(group = Seed)) +
    geom_line(data = pframe, col = "red") +
    geom_ribbon(data = pframe, aes(ymin = lwr, ymax = upr),
                colour = NA, alpha = 0.3, fill = "blue") +
    geom_ribbon(data = pframe, aes(ymin = boot_lwr, ymax = boot_upr),
                colour = NA, alpha = 0.3, fill = "red") +
    geom_ribbon(data = pframe, aes(ymin = delta_lwr, ymax = delta_upr),
                colour = NA, alpha = 0.3, fill = "cyan")
## zoom in by subtracting off the predicted values
ggplot(Loblolly, aes(age)) +
    geom_hline(yintercept = 0, lty = 2) +
    geom_ribbon(data = pframe, aes(ymin = lwr - height, ymax = upr - height),
                colour = "blue", fill = NA) +
    geom_ribbon(data = pframe,
                aes(ymin = boot_lwr - height, ymax = boot_upr - height),
                colour = "red", fill = NA) +
    geom_ribbon(data = pframe,
                aes(ymin = delta_lwr - height, ymax = delta_upr - height),
                colour = "cyan", fill = NA)
```

**Attribution**
*Source: Link, Question Author: Piet van den Berg, Answer Author: Ben Bolker*