How can I use logistic regression betas + raw data to get probabilities

I have a model fitted (from the literature). I also have the raw data for the predictive variables.

What’s the equation I should be using to get probabilities? Basically, how do I combine raw data and coefficients to get probabilities?

Answer

Here is the applied researcher’s answer (using the statistics package R).

First, let’s create some data, i.e. I am simulating data for a simple bivariate logistic regression model log(p1p)=β0+β1x:

> set.seed(3124)
> 
> ## Formula for converting logit to probabilities 
> ## Source: http://www.statgun.com/tutorials/logistic-regression.html
> logit2prop <- function(l){exp(l)/(1+exp(l))}
> 
> ## Make up some data
> y <- rbinom(100, 1, 0.2)
> x <- rbinom(100, 1, 0.5)

The predictor x is a dichotomous variable:

> x
  [1] 0 1 1 1 1 1 0 1 0 1 0 1 0 0 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 0 0 1 0 0 0 1 1 1 0 1 1 1 1 
 [48] 1 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0
 [95] 1 1 1 1 1 0

Second, estimate the intercept (β0) and the slope (β1). As you can see, the intercept is β0=0.8690 and the slope is β1=1.0769.

> ## Run the model
> summary(glm.mod <- glm(y ~ x, family = "binomial"))

[...]

    Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -0.8690     0.3304  -2.630  0.00854 **
x            -1.0769     0.5220  -2.063  0.03910 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

[...]

Third, R, like most statistical packages, can compute the fitted values, i.e. the probabilities. I will use these values as reference.

> ## Save the fitted values
> glm.fitted <- fitted(glm.mod)

Fourth, this step directly refers to your question: We have the raw data (here: x) and we have the coefficients (β0 and β1). Now, let’s compute the logits and save these fitted values in glm.rcdm:

> ## "Raw data + coefficients" method (RDCM)
## logit = -0.8690 + (-1.0769) * x
glm.rdcm <- -0.8690 + (-1.0769)*x

The final step is a comparison of the fitted values based on R’s fitted-function (glm.fitted) and my “hand-made” approach (logit2prop.glm.rdcm). My own function logit2prop (see first step) converts logits to probabilities:

> ## Compare fitted values and RDCM
> df <- data.frame(glm.fitted, logit2prop(glm.rdcm))
> df[10:25,]
> df[10:25,]
   glm.fitted logit2prop.glm.rdcm.
10  0.1250000            0.1250011
11  0.2954545            0.2954624
12  0.1250000            0.1250011
13  0.2954545            0.2954624
14  0.2954545            0.2954624
15  0.1250000            0.1250011
16  0.1250000            0.1250011
17  0.1250000            0.1250011
18  0.2954545            0.2954624
19  0.1250000            0.1250011
20  0.1250000            0.1250011
21  0.1250000            0.1250011
22  0.1250000            0.1250011
23  0.1250000            0.1250011
24  0.1250000            0.1250011
25  0.2954545            0.2954624

Attribution
Source : Link , Question Author : user333 , Answer Author : Bernd Weiss

Leave a Comment