How can I obtain a confidence interval for the difference between two probabilities from a binary logistic regression model?

I saw this question here, but it doesn’t have a clear answer.

Suppose I have a simple logistic regression model with binary x:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

Then I know:

$$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$$

and

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
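
As a quick check, R's plogis() function is exactly this inverse-logit transformation, which is what I use later to move back to the probability scale:

# plogis() is the inverse logit: plogis(eta) = exp(eta)/(1 + exp(eta))
eta <- 0.5
all.equal(plogis(eta), exp(eta)/(1 + exp(eta)))  # TRUE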

So if x=0, then the model becomes:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0$$

and if x=1, then the model becomes:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1$$
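
On the probability scale, these two cases are

$$p_{x=0} = \frac{e^{\beta_0}}{1+e^{\beta_0}}, \qquad p_{x=1} = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}$$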

To obtain a confidence interval for p when x=0, I did this:

model <- glm(y ~ ., family = binomial(), data = data)

# For x = 0
Bigma = vcov(model)       # variance-covariance matrix of the coefficient estimates
sig = sqrt(Bigma[1,1])    # standard error of the intercept

logit_p = coef(model)[1][[1]]  # the intercept, i.e. the estimated logit of p when x = 0

# Wald interval on the logit scale
theta_L = logit_p - 1.96*sig
theta_U = logit_p + 1.96*sig

# back-transform to the probability scale
p_L = plogis(theta_L)
p_U = plogis(theta_U)

The confidence interval is (p_L, p_U).

Then a confidence interval for p when x=1:

# Var(beta_0 + beta_1) = Var(beta_0) + Var(beta_1) + 2*Cov(beta_0, beta_1)
sig_x1 = sqrt(Bigma[1,1] + Bigma[2,2] + 2*Bigma[1,2])

logit_p_x1 = coef(model)[1][[1]] + coef(model)[2][[1]]  # beta_0 + beta_1

# Wald interval on the logit scale, then back-transform
theta_L_x1 = logit_p_x1 - 1.96*sig_x1
theta_U_x1 = logit_p_x1 + 1.96*sig_x1

p_L_x1 = plogis(theta_L_x1)
p_U_x1 = plogis(theta_U_x1)

The confidence interval is (p_L_x1, p_U_x1).
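
As a cross-check (not part of my approach above, and assuming the data contain only y and x), the same two intervals can be obtained from predict() on the link scale:

# Wald intervals on the link scale via predict(), back-transformed with plogis();
# this should reproduce (p_L, p_U) and (p_L_x1, p_U_x1) above
pred <- predict(model, newdata = data.frame(x = c(0, 1)), type = "link", se.fit = TRUE)
p_lower <- plogis(pred$fit - 1.96*pred$se.fit)
p_upper <- plogis(pred$fit + 1.96*pred$se.fit)
cbind(p_lower, p_upper)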

Now I would like a confidence interval for the difference in the probability of success between x=1 and x=0.

I can obtain the point estimate of the difference:

$$p_{x=1} - p_{x=0} = \frac{e^{\beta_0+\beta_1} - e^{\beta_0}}{\left(1+e^{\beta_0+\beta_1}\right)\left(1+e^{\beta_0}\right)}$$
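
A quick numerical check of this expression from the fitted model (assuming x is the only predictor, so its coefficient is the second entry of coef(model)):

# p(x=1) - p(x=0) from the fitted coefficients
b <- coef(model)
plogis(b[1] + b[2]) - plogis(b[1])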

I know the next step is to compute the standard error of the difference, but I don’t know how to do this.

Question: What is the formula and R code for a confidence interval for the difference in the two probabilities when x=0 and when x=1?

Answer

The delta method states

$$\operatorname{Var}\big(g(X)\big) \approx \left[g'(X)\right]^2 \operatorname{Var}(X)$$

Because this problem involves two parameters, we can extend this to the multivariate delta method

$$\operatorname{Var}\big(g(\mathbf{X})\big) \approx \nabla g^{\top}\, \Sigma\, \nabla g$$
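
Spelled out for the two parameters here, that quadratic form is

$$\operatorname{Var}\big(g(\hat\beta_0,\hat\beta_1)\big) \approx \left(\frac{\partial g}{\partial \beta_0}\right)^{2}\operatorname{Var}(\hat\beta_0) + \left(\frac{\partial g}{\partial \beta_1}\right)^{2}\operatorname{Var}(\hat\beta_1) + 2\,\frac{\partial g}{\partial \beta_0}\frac{\partial g}{\partial \beta_1}\operatorname{Cov}(\hat\beta_0,\hat\beta_1)$$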

Here,

$$g(\beta_0, \beta_1) = \frac{e^{\beta_0+\beta_1} - e^{\beta_0}}{\left(1+e^{\beta_0+\beta_1}\right)\left(1+e^{\beta_0}\right)}$$

and Σ is the variance-covariance matrix from your model. The gradient ∇g is… gross. I'm not going to do that by hand, and computer algebra, while fast, yields a mess of symbols. You can, however, use automatic differentiation to compute the gradient. Once you have the variance, the interval is simply your estimate of the difference in probabilities plus/minus 1.96 times the standard error (the square root of the variance). Caution: because the interval is built directly on the difference scale, its endpoints are not constrained and can fall outside the range of valid values.

We can do this in R in the following way (note you need to install the autodiffr package, which drives Julia's automatic-differentiation libraries and therefore requires a working Julia installation).

library(autodiffr)
# if autodiffr complains, ad_setup() may need to be run first to initialise its Julia backend

# g: difference in probabilities as a function of the coefficient vector b = (beta_0, beta_1)
g = function(b)  (exp(b[1] + b[2]) - exp(b[1])) / ((1 + exp(b[1] + b[2]))*(1 + exp(b[1])))

# simulate some example data
set.seed(1)  # for reproducibility
x = rbinom(100, 1, 0.5)
eta = -0.8 + 0.2*x
p = plogis(eta)
y = rbinom(100, 1, p)

model = glm(y ~ x, family = binomial())
Bigma = vcov(model)   # variance-covariance matrix of the coefficients

# gradient of g, evaluated at the estimated coefficients, via automatic differentiation
grad_g = makeGradFunc(g)
nabla_g = grad_g(coef(model))

# delta-method standard error of the difference
se = as.numeric(sqrt(nabla_g %*% Bigma %*% nabla_g))

# point estimate of p(x=1) - p(x=0)
estimate = diff(predict(model, newdata = list(x = c(0, 1)), type = 'response'))

# 95% confidence interval
estimate + c(-1, 1)*1.96*se
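
If installing autodiffr (and the Julia toolchain behind it) is a hurdle, a numerical gradient gives essentially the same standard error. Here is a sketch using the numDeriv package instead, reusing g, model, Bigma, and estimate from above:

library(numDeriv)

# numerical gradient of g at the fitted coefficients, in place of autodiffr
nabla_g_num = grad(g, coef(model))

se_num = as.numeric(sqrt(nabla_g_num %*% Bigma %*% nabla_g_num))
estimate + c(-1, 1)*1.96*se_num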

Repeating this procedure for this modest example shows that the resulting confidence interval has near-nominal coverage, which is a good thing, but I imagine coverage would deteriorate as the probabilities approach 0 or 1.
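
For reference, here is a rough sketch of the kind of simulation behind that coverage claim (my own quick check, written with the numerical gradient from numDeriv so it runs without Julia):

# crude coverage check: simulate data from known coefficients, build the
# delta-method interval for p(x=1) - p(x=0), and count how often it covers
# the true difference
library(numDeriv)
true_diff = plogis(-0.8 + 0.2) - plogis(-0.8)
covered = replicate(1000, {
  x = rbinom(100, 1, 0.5)
  y = rbinom(100, 1, plogis(-0.8 + 0.2*x))
  fit = glm(y ~ x, family = binomial())
  gr = grad(g, coef(fit))
  se = as.numeric(sqrt(gr %*% vcov(fit) %*% gr))
  est = plogis(sum(coef(fit))) - plogis(coef(fit)[1])
  (est - 1.96*se) <= true_diff && true_diff <= (est + 1.96*se)
})
mean(covered)  # should be close to 0.95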

Attribution
Source: Link, Question Author: StatsSorceress, Answer Author: Demetri Pananos
