How can I obtain a confidence interval for the difference between two probabilities from a binary logistic regression model?

I saw this question here, but it doesn’t have a clear answer.

Suppose I have a simple logistic regression model with binary x:

logit(p) = β₀ + β₁x
Then I know:

p = exp(β₀ + β₁x) / (1 + exp(β₀ + β₁x))
So if x=0, then the model becomes:

p₀ = exp(β₀) / (1 + exp(β₀))
and if x=1, then the model becomes:

p₁ = exp(β₀ + β₁) / (1 + exp(β₀ + β₁))
To obtain a confidence interval for p when x=0, I did this:

model <- glm(y~., family=binomial(), data)

#For x=0
Bigma = vcov(model)
sig = sqrt(Bigma[1,1])

logit_p = coef(model)[1][[1]] #The intercept

theta_L = logit_p - 1.96*sig
theta_U = logit_p + 1.96*sig

p_L = plogis(theta_L) 
p_U = plogis(theta_U)

The confidence interval is (p_L, p_U).

Then a confidence interval for p when x=1:

sig_x1 = sqrt(Bigma[1,1] + Bigma[2,2] + 2*Bigma[1,2])

logit_p_x1 = coef(model)[1][[1]] + coef(model)[2][[1]] #beta_0 + beta_1

theta_L_x1 = logit_p_x1 - 1.96*sig_x1
theta_U_x1 = logit_p_x1 + 1.96*sig_x1

p_L_x1 = plogis(theta_L_x1) 
p_U_x1 = plogis(theta_U_x1)

The confidence interval is (p_L_x1, p_U_x1).
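As a sanity check on the hand-built intervals above (not part of the original question): `predict()` can return the linear predictor and its standard error directly, so the same logit-scale intervals can be built without indexing into `vcov()` by hand. A minimal sketch on simulated data (the data-generating values are assumed for illustration):

```r
set.seed(1)  # simulated data, assumed for illustration
x <- rbinom(100, 1, 0.5)
y <- rbinom(100, 1, plogis(-0.8 + 0.2 * x))
model <- glm(y ~ x, family = binomial())

# Linear predictor and its standard error at x = 0 and x = 1
pr <- predict(model, newdata = data.frame(x = c(0, 1)),
              type = "link", se.fit = TRUE)

# 95% Wald intervals on the logit scale, mapped back to probabilities
lower <- plogis(pr$fit - 1.96 * pr$se.fit)
upper <- plogis(pr$fit + 1.96 * pr$se.fit)
cbind(lower, upper)  # row 1: x = 0, row 2: x = 1
```

Because the interval is built on the logit scale and then back-transformed, the endpoints always stay inside (0, 1), matching the approach in the question.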

Now I would like a confidence interval for the difference in probability of success when x=0 and x=1.

I can obtain the point estimate of the difference:

p̂₁ − p̂₀ = exp(β̂₀ + β̂₁) / (1 + exp(β̂₀ + β̂₁)) − exp(β̂₀) / (1 + exp(β̂₀))
I know the next step is to compute the standard error of the difference, but I don’t know how to do this.

Question: What is the formula and R code for a confidence interval for the difference in the two probabilities when x=0 and when x=1?


The delta method states that for an estimator θ̂ and a differentiable function g,

Var(g(θ̂)) ≈ g′(θ̂)² · Var(θ̂)
Because this problem involves two parameters, we can extend this to the multivariate delta method:

Var(g(β̂)) ≈ ∇g(β̂)ᵀ Σ ∇g(β̂)

where

g(β₀, β₁) = exp(β₀ + β₁) / (1 + exp(β₀ + β₁)) − exp(β₀) / (1 + exp(β₀))

and Σ is the variance-covariance matrix from your model. The gradient of g is… gross. I'm not going to do that by hand, and computer algebra, while fast, yields a mess of symbols. You can, however, use automatic differentiation to compute the gradient. Once you have the variance, the interval is simply your estimate of the difference in probabilities plus/minus 1.96 times the standard error (the square root of the variance). Caution: this approach can yield interval endpoints outside the valid range for a difference in probabilities, [−1, 1].

We can do this in R in the following way (note you need to install the autodiffr package, which requires a working Julia installation).


library(autodiffr)
ad_setup()  # start the Julia backend that autodiffr uses

# g(b) = Pr(y=1 | x=1) - Pr(y=1 | x=0), written in terms of the coefficients
g = function(b)  (exp(b[1] + b[2]) - exp(b[1])) / ((1 + exp(b[1] + b[2]))*(1 + exp(b[1])))

set.seed(1)  # for reproducibility
x = rbinom(100, 1, 0.5)
eta = -0.8 + 0.2*x
p = plogis(eta)
y = rbinom(100, 1, p)

model = glm(y~x, family=binomial())
Bigma = vcov(model)

grad_g = makeGradFunc(g)
nabla_g = grad_g(coef(model))

se = as.numeric(sqrt(nabla_g %*% Bigma %*% nabla_g))

estimate = diff(predict(model, newdata=list(x=c(0, 1)), type='response'))  # p1_hat - p0_hat

estimate + c(-1, 1)*1.96*se
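If installing autodiffr (and the Julia runtime it needs) is a hurdle, a numerical gradient gives the same delta-method interval for this two-parameter g. A sketch assuming the numDeriv package is installed, on simulated data with assumed coefficients:

```r
library(numDeriv)

# Same difference in probabilities, written with plogis for clarity
g <- function(b) plogis(b[1] + b[2]) - plogis(b[1])

set.seed(1)  # simulated data, assumed for illustration
x <- rbinom(100, 1, 0.5)
y <- rbinom(100, 1, plogis(-0.8 + 0.2 * x))
model <- glm(y ~ x, family = binomial())
Bigma <- vcov(model)

nabla_g <- grad(g, coef(model))  # numerical gradient at the fitted coefficients
se <- as.numeric(sqrt(nabla_g %*% Bigma %*% nabla_g))

estimate <- g(coef(model))
estimate + c(-1, 1) * 1.96 * se  # delta-method 95% CI for the difference
```

numDeriv's `grad()` uses Richardson extrapolation by default, which is accurate enough here that the result matches the autodiff version to several decimal places.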

Repeating this procedure for this modest example shows that the resulting confidence interval has near nominal coverage, which is a good thing, but I imagine things would become worse as the probabilities approach 0 or 1.
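The coverage claim above can be checked with a small simulation (a sketch, not part of the original answer; the analytic gradient of g is swapped in for autodiff so it runs without Julia): simulate data from known coefficients, build the interval each time, and count how often it contains the true difference.

```r
set.seed(42)
true_diff <- plogis(-0.8 + 0.2) - plogis(-0.8)  # true difference under the simulation

covered <- replicate(1000, {
  x <- rbinom(100, 1, 0.5)
  y <- rbinom(100, 1, plogis(-0.8 + 0.2 * x))
  model <- glm(y ~ x, family = binomial())
  b <- coef(model)
  est <- plogis(b[1] + b[2]) - plogis(b[1])
  # analytic gradient of the difference (dlogis is the derivative of plogis)
  grad_g <- c(dlogis(b[1] + b[2]) - dlogis(b[1]), dlogis(b[1] + b[2]))
  se <- sqrt(drop(grad_g %*% vcov(model) %*% grad_g))
  abs(est - true_diff) < 1.96 * se
})
mean(covered)  # empirical coverage; should be close to the nominal 0.95
```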

Source: Link, Question Author: StatsSorceress, Answer Author: Demetri Pananos
