I saw this question here, but it doesn’t have a clear answer.

Suppose I have a simple logistic regression model with binary x:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

Then I know:

$$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$$

and

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
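This mapping from the log-odds scale back to the probability scale is the inverse logit, which is exactly what R's `plogis` computes (`qlogis` is the logit itself); a quick sketch:

```r
# plogis is the inverse logit: plogis(eta) = exp(eta) / (1 + exp(eta))
eta <- 0.5
p <- plogis(eta)
stopifnot(all.equal(p, exp(eta) / (1 + exp(eta))))

# qlogis is the logit, so the two functions invert each other
stopifnot(all.equal(qlogis(p), eta))
```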

So if x=0, then the model becomes:

$$\log\left(\frac{p}{1-p}\right) = \beta_0$$

and if x=1, then the model becomes:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1$$

To obtain a confidence interval for p when x=0, I did this:

```
model <- glm(y ~ ., family = binomial(), data)

# For x = 0
Bigma <- vcov(model)
sig <- sqrt(Bigma[1, 1])
logit_p <- coef(model)[1][[1]]  # The intercept
theta_L <- logit_p - 1.96 * sig
theta_U <- logit_p + 1.96 * sig
p_L <- plogis(theta_L)
p_U <- plogis(theta_U)
```

The confidence interval is (p_L, p_U).
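As a sanity check on this Wald-style interval, R's `confint` on a `glm` fit returns profile-likelihood intervals for the coefficients, which can be pushed through `plogis` the same way. A sketch on simulated data (the data here are made up purely for illustration); the two intervals should be close but not identical:

```r
set.seed(1)
x <- rbinom(100, 1, 0.5)
y <- rbinom(100, 1, plogis(-0.8 + 0.2 * x))
model <- glm(y ~ x, family = binomial())

# Wald interval for beta_0 on the logit scale, mapped to probabilities
wald <- coef(model)[1] + c(-1, 1) * 1.96 * sqrt(vcov(model)[1, 1])
plogis(wald)

# Profile-likelihood interval for beta_0, mapped the same way
plogis(confint(model)[1, ])
```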

Then a confidence interval for p when x=1:

```
# For x = 1
sig_x1 <- sqrt(Bigma[1, 1] + Bigma[2, 2] + 2 * Bigma[1, 2])
logit_p_x1 <- coef(model)[1][[1]] + coef(model)[2][[1]]  # beta_0 + beta_1
theta_L_x1 <- logit_p_x1 - 1.96 * sig_x1
theta_U_x1 <- logit_p_x1 + 1.96 * sig_x1
p_L_x1 <- plogis(theta_L_x1)
p_U_x1 <- plogis(theta_U_x1)
```

The confidence interval is (p_L_x1, p_U_x1).

Now I would like a confidence interval for the difference in probability of success when x=0 and x=1.

I can obtain the point estimate of the difference:

$$p_{x=1} - p_{x=0} = \frac{e^{\beta_0+\beta_1} - e^{\beta_0}}{\left(1+e^{\beta_0+\beta_1}\right)\left(1+e^{\beta_0}\right)}$$

I know the next step is to compute the standard error of the difference, but I don’t know how to do this.

Question: What is the formula and R code for a confidence interval for the difference in the two probabilities when x=0 and when x=1?

**Answer**

The delta method states that, to first order,

$$\operatorname{Var}(g(X)) \approx \left[g'(X)\right]^2 \operatorname{Var}(X)$$
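As a concrete one-parameter instance (the $x=0$ case above), take $g(\beta_0) = e^{\beta_0}/(1+e^{\beta_0}) = p$; the derivative of the inverse logit is $g'(\beta_0) = p(1-p)$, so

$$\operatorname{Var}(\hat{p}) \approx \left[\hat{p}(1-\hat{p})\right]^2 \operatorname{Var}(\hat{\beta}_0)$$

This gives a standard error directly on the probability scale, as an alternative to transforming the endpoints of the logit-scale interval; the two approaches agree only approximately.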

Because this problem involves two parameters, we can extend this to the multivariate delta method:

$$\operatorname{Var}(g(\boldsymbol{\beta})) \approx \nabla g^{T}\, \Sigma\, \nabla g$$

Here,

$$g(\beta_0, \beta_1) = \frac{e^{\beta_0+\beta_1} - e^{\beta_0}}{\left(1+e^{\beta_0+\beta_1}\right)\left(1+e^{\beta_0}\right)}$$

and $\Sigma$ is the variance-covariance matrix from your model. $\nabla g$ is… gross. I'm not going to do that by hand, and computer algebra, while fast, yields a mess of symbols. You can, however, use automatic differentiation to compute the gradient. Once you have the variance, the interval is simply your estimate of the difference in probabilities plus/minus 1.96 times the standard deviation (the square root of the variance). Caution: a Wald interval like this is not constrained to the valid range for a difference of probabilities, so its endpoints can fall below −1 or above 1.

We can do this in R in the following way (note that you need to install the `autodiffr` package).

```
library(autodiffr)

# g(beta) = p(x = 1) - p(x = 0), the difference in probabilities
g = function(b) (exp(b[1] + b[2]) - exp(b[1])) / ((1 + exp(b[1] + b[2])) * (1 + exp(b[1])))

# Simulate data from a logistic model
x = rbinom(100, 1, 0.5)
eta = -0.8 + 0.2 * x
p = plogis(eta)
y = rbinom(100, 1, p)
model = glm(y ~ x, family = binomial())
Bigma = vcov(model)

# Gradient of g via automatic differentiation, evaluated at the estimates
grad_g = makeGradFunc(g)
nabla_g = grad_g(coef(model))

# Delta-method standard error and Wald interval for the difference
se = as.numeric(sqrt(nabla_g %*% Bigma %*% nabla_g))
estimate = diff(predict(model, newdata = list(x = c(0, 1)), type = 'response'))
estimate + c(-1, 1) * 1.96 * se
```

Repeating this procedure on this modest example shows that the resulting confidence interval has near-nominal coverage, which is good, but I imagine coverage would deteriorate as the probabilities approach 0 or 1.
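If installing `autodiffr` (which requires a Julia backend) is inconvenient, the same gradient can be approximated numerically with the `numDeriv` package; this substitutes `numDeriv::grad` for the autodiff step and is a sketch, not part of the original answer:

```r
library(numDeriv)

# Same g as above: difference in probabilities as a function of (beta_0, beta_1)
g <- function(b) (exp(b[1] + b[2]) - exp(b[1])) / ((1 + exp(b[1] + b[2])) * (1 + exp(b[1])))

# Simulated data for illustration
set.seed(42)
x <- rbinom(100, 1, 0.5)
y <- rbinom(100, 1, plogis(-0.8 + 0.2 * x))
model <- glm(y ~ x, family = binomial())
Bigma <- vcov(model)

# Numerical gradient in place of autodiff
nabla_g <- grad(g, coef(model))

se <- as.numeric(sqrt(nabla_g %*% Bigma %*% nabla_g))
estimate <- diff(predict(model, newdata = data.frame(x = c(0, 1)), type = "response"))
estimate + c(-1, 1) * 1.96 * se
```

Numerical differentiation is less exact than autodiff, but for a smooth two-parameter function like this the difference is negligible.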

**Attribution**
*Source: Link, Question Author: StatsSorceress, Answer Author: Demetri Pananos*