# How can I obtain a confidence interval for the difference between two probabilities from a binary logistic regression model?

I saw this question here, but it doesn’t have a clear answer.

Suppose I have a simple logistic regression model with binary $$x$$:

$$\log(p/(1-p)) = \beta_0 + \beta_1x$$

Then I know:

$$p/(1-p) = e^{\beta_0 + \beta_1x}$$

and

$$p = e^{\beta_0 + \beta_1x}/(1 + e^{\beta_0 + \beta_1x})$$

So if $$x=0,$$ then the model becomes:

$$\log(p/(1-p)) = \beta_0$$

and if $$x=1,$$ then the model becomes:

$$\log(p/(1-p)) = \beta_0 + \beta_1$$

To obtain a confidence interval for $$p$$ when $$x=0,$$ I did this:

```r
model <- glm(y ~ x, family = binomial(), data = data)

# For x = 0
Bigma = vcov(model)
sig = sqrt(Bigma[1,1])    # SE of beta_0

logit_p = coef(model)[1]  # the intercept, beta_0

theta_L = logit_p - 1.96*sig
theta_U = logit_p + 1.96*sig

p_L = plogis(theta_L)
p_U = plogis(theta_U)
```

The confidence interval is (p_L, p_U).

Then a confidence interval for $$p$$ when $$x=1$$:

```r
# Var(beta_0 + beta_1) = Var(beta_0) + Var(beta_1) + 2*Cov(beta_0, beta_1)
sig_x1 = sqrt(Bigma[1,1] + Bigma[2,2] + 2*Bigma[1,2])

logit_p_x1 = coef(model)[1] + coef(model)[2]  # beta_0 + beta_1

theta_L_x1 = logit_p_x1 - 1.96*sig_x1
theta_U_x1 = logit_p_x1 + 1.96*sig_x1

p_L_x1 = plogis(theta_L_x1)
p_U_x1 = plogis(theta_U_x1)
```

The confidence interval is (p_L_x1, p_U_x1).
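As a sanity check, the same intervals can be obtained from `predict()` with `se.fit = TRUE` on the link scale. The sketch below uses simulated data, since the original `data` isn't shown:

```r
# Simulated stand-in for the original data
set.seed(1)
x <- rbinom(200, 1, 0.5)
y <- rbinom(200, 1, plogis(-0.8 + 0.2 * x))
model <- glm(y ~ x, family = binomial())

# Link-scale fits and standard errors at x = 0 and x = 1
pr <- predict(model, newdata = data.frame(x = c(0, 1)), se.fit = TRUE)

# 95% Wald intervals for p: build on the link scale, then transform
lower <- plogis(pr$fit - 1.96 * pr$se.fit)
upper <- plogis(pr$fit + 1.96 * pr$se.fit)
cbind(lower, upper)  # row 1: x = 0; row 2: x = 1
```

At $$x=0$$ the linear predictor is just $$\beta_0$$, so `pr$se.fit[1]` reproduces `sqrt(vcov(model)[1,1])` from the hand calculation above.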

Now I would like a confidence interval for the difference in probability of success when $$x=0$$ and $$x=1.$$

I can obtain the point estimate of the difference:

$$p_{x_1} - p_{x_0} = [e^{\beta_0 + \beta_1} - e^{\beta_0}]/[(1+e^{\beta_0 + \beta_1})(1+e^{\beta_0})]$$

I know the next step is to compute the standard error of the difference, but I don’t know how to do this.

Question: What is the formula and R code for a confidence interval for the difference in the two probabilities when $$x=0$$ and when $$x=1?$$

The delta method states

$$\operatorname{Var}(g(X)) = [g'(X)]^2 \operatorname{Var}(X)$$

Because this problem involves two parameters, we can extend this to the multivariate delta method:

$$\operatorname{Var}(g(\beta)) = \nabla g^{T} \, \Sigma \, \nabla g$$

Here,

$$g = \left[e^{\beta_{0}+\beta_{1}}-e^{\beta_{0}}\right] /\left[\left(1+e^{\beta_{0}+\beta_{1}}\right)\left(1+e^{\beta_{0}}\right)\right]$$

and $$\Sigma$$ is the variance-covariance matrix from your model. $$\nabla g$$ is…gross. I'm not going to do that by hand, and computer algebra, while fast, yields a mess of symbols. You can, however, use autodifferentiation to compute the gradient. Once you calculate the variance, the interval is simply your estimate of the difference in probabilities plus/minus 1.96 times the standard deviation (the square root of the variance). Caution: this approach can yield interval endpoints outside the admissible range (here, outside $$[-1, 1]$$ for a difference of two probabilities).
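That said, the gradient is less fearsome if $$g$$ is written in terms of the logistic function $$\sigma(\eta) = e^{\eta}/(1+e^{\eta})$$, using $$\sigma'(\eta) = \sigma(\eta)(1-\sigma(\eta))$$. With $$p_0 = \sigma(\beta_0)$$ and $$p_1 = \sigma(\beta_0+\beta_1)$$, so that $$g = p_1 - p_0$$,

$$\nabla g = \begin{bmatrix} p_1(1-p_1) - p_0(1-p_0) \\ p_1(1-p_1) \end{bmatrix}$$

though autodifferentiation spares us even this bit of calculus.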

We can do this in R in the following way (note you need to install the autodiffr package).

```r
library(autodiffr)
ad_setup()  # initializes the Julia backend; run once per session

# g takes the coefficient vector b = (beta_0, beta_1)
g = function(b) (exp(b[1] + b[2]) - exp(b[1])) / ((1 + exp(b[1] + b[2]))*(1 + exp(b[1])))

# Simulate data from the true model
x = rbinom(100, 1, 0.5)
eta = -0.8 + 0.2*x
p = plogis(eta)
y = rbinom(100, 1, p)

model = glm(y ~ x, family = binomial())
Bigma = vcov(model)

# Gradient of g at the estimated coefficients, via autodiff
nabla_g = ad_grad(g, coef(model))

se = as.numeric(sqrt(nabla_g %*% Bigma %*% nabla_g))

estimate = diff(predict(model, newdata = list(x = c(0, 1)), type = 'response'))

estimate + c(-1, 1)*1.96*se
```



Repeating this procedure for this modest example shows that the resulting confidence interval has near-nominal coverage, which is reassuring, but I imagine things would get worse as the probabilities approach 0 or 1.
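That coverage claim can be checked with a quick simulation. The sketch below substitutes a crude finite-difference gradient for autodiffr so it runs without a Julia installation; the delta-method interval itself is the same:

```r
# Coverage check: simulate, build the delta-method CI, and count how
# often it covers the true difference in probabilities.
set.seed(2)
g <- function(b) plogis(b[1] + b[2]) - plogis(b[1])  # difference in probs
true_diff <- plogis(-0.8 + 0.2) - plogis(-0.8)

covered <- replicate(500, {
  x <- rbinom(100, 1, 0.5)
  y <- rbinom(100, 1, plogis(-0.8 + 0.2 * x))
  fit <- glm(y ~ x, family = binomial())
  b <- coef(fit)

  # Finite-difference approximation to the gradient of g at b
  eps <- 1e-6
  nabla_g <- sapply(1:2, function(i) {
    bp <- b; bp[i] <- bp[i] + eps
    (g(bp) - g(b)) / eps
  })

  se <- sqrt(as.numeric(t(nabla_g) %*% vcov(fit) %*% nabla_g))
  ci <- g(b) + c(-1, 1) * 1.96 * se
  ci[1] <= true_diff && true_diff <= ci[2]
})
mean(covered)  # empirical coverage; should sit near 0.95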