I have created a Logistic Regression using the following code:

`full.model.f = lm(Ft_45 ~ ., LOG_D)`

`base.model.f = lm(Ft_45 ~ IP_util_E2pl_m02_flg)`

`step(base.model.f, scope=list(upper=full.model.f, lower=~1), direction="forward", trace=FALSE)`

I have then used the output to create a final model:

`final.model.f = lm(Ft_45 ~ IP_util_E2pl_m02_flg + IP_util_E2_m02_flg + AE_NumVisit1_flg + OP_NumVisit1_m01_flg + IP_TotLoS_m02 + Ft1_45 + IP_util_E1_m05_flg + IP_TotPrNonElecLoS_m02 + IP_util_E2pl_m03_flg + LTC_coding + OP_NumVisit0105_m03_flg + OP_NumVisit11pl_m03_flg + AE_ArrAmb_m02_flg)`

Then I have predicted the outcomes for a different set of data using the predict function:

`log.pred.f.v <- predict(final.model.f, newdata=LOG_V)`

I have been able to establish a pleasing ROC curve and have created a table of sensitivity and specificity, which gives the results I would expect.

However, what I am trying to do is establish, for each row of data, the probability of Ft_45 being 1. If I look at the output of log.pred.f.v I get, for example:

`1 -0.171739593 2 -0.049905948 3 0.141146419 4 0.11615669 5 0.07342591 6 0.093054334 7 0.957164383 8 0.098415639 . . . 104 0.196368229 105 1.045208447 106 1.05499112`

As I only have a tentative grasp of what I am doing, I am struggling to understand how to interpret the negative values and the values higher than 1, as I would expect a probability to be between 0 and 1.

So my question is: am I just missing a step where I need to transform the output, or have I gone completely wrong?

Thank you in advance for any help you are able to offer.

**Answer**

First, it looks like you built an ordinary linear regression model, not a logistic regression model. To build a logistic regression model, you need to use `glm()` with `family="binomial"`, not `lm()`.
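As a rough sketch of what the same forward-selection workflow could look like with `glm()`, here is an example on simulated data. The data frames `train` and `valid` and the variables `outcome`, `flag_a`, `flag_b` and `los` are made up for illustration; they only stand in for your `LOG_D`, `LOG_V` and `Ft_45`.

```r
set.seed(1)

# Simulated stand-in for the training data (all names are hypothetical)
n <- 300
train <- data.frame(
  flag_a = rbinom(n, 1, 0.4),
  flag_b = rbinom(n, 1, 0.3),
  los    = rpois(n, 3)
)
train$outcome <- rbinom(n, 1, plogis(-1 + 1.5 * train$flag_a + 0.2 * train$los))

# Logistic regression: glm() with a binomial family instead of lm()
full.model <- glm(outcome ~ ., data = train, family = "binomial")
base.model <- glm(outcome ~ flag_a, data = train, family = "binomial")

# Forward selection from the base model towards the full model
final.model <- step(base.model,
                    scope = list(upper = formula(full.model), lower = ~1),
                    direction = "forward", trace = 0)

# Simulated stand-in for the validation data
valid <- data.frame(flag_a = c(0, 1, 1), flag_b = c(0, 0, 1), los = c(1, 4, 10))

# type = "response" returns predicted probabilities rather than log-odds
valid.prob <- predict(final.model, newdata = valid, type = "response")
print(valid.prob)
```

Passing `data = train` to every `glm()` call keeps both models fitted on the same rows, which `step()` needs in order to compare them.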

Suppose you build the following logistic regression model using independent variables $x_1$, $x_2$ and $x_3$ to predict the probability of event $y$:

```
logit <- glm(y~x1+x2+x3,family="binomial")
```

This model has regression coefficients $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$.

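If you then call `predict(logit)`, R will calculate and return `b0 + b1*x1 + b2*x2 + b3*x3` for each observation, where `b0` to `b3` are the estimated coefficients. This is the linear predictor on the log-odds scale, not a probability.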

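Recall that your logistic regression equation models the log-odds of the event: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, where $p$ is the probability that $y = 1$.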

So, to get the probabilities that you want, you need to solve this equation for p.
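Spelled out, the algebra goes as follows. The symbol $\eta$ is shorthand introduced here for the linear predictor; it is not part of the model notation above.

$$
\begin{aligned}
\eta &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \\
\log\left(\frac{p}{1-p}\right) &= \eta \\
\frac{p}{1-p} &= e^{\eta} \\
p &= (1-p)\,e^{\eta} \\
p\,(1 + e^{\eta}) &= e^{\eta} \\
p &= \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}
\end{aligned}
$$

Because $e^{\eta} > 0$ for every real $\eta$, the result always lies strictly between 0 and 1.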

In R, you can do something like this:

```
pred <- predict(logit,newdata=data) #gives you b0 + b1x1 + b2x2 + b3x3
probs <- exp(pred)/(1+exp(pred)) #gives you probability that y=1 for each observation
```
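As a shortcut, `predict()` on a `glm` object accepts `type = "response"`, which applies the inverse link for you, and `plogis()` is base R's logistic function. Here is a minimal sketch on simulated data showing that all three routes agree. The data frames `train` and `new` and their contents are invented for illustration.

```r
set.seed(42)

# Simulated training data; x1..x3 and y are hypothetical stand-ins
n <- 500
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
train$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * train$x1 - 0.8 * train$x2))

logit <- glm(y ~ x1 + x2 + x3, data = train, family = "binomial")

# Hypothetical new observations to score
new <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

pred    <- predict(logit, newdata = new)                     # log-odds (default type = "link")
probs   <- exp(pred) / (1 + exp(pred))                       # manual inverse logit
probs_r <- predict(logit, newdata = new, type = "response")  # probabilities directly
probs_p <- plogis(pred)                                      # base R logistic function

print(all.equal(probs, probs_r))
print(all.equal(probs, probs_p))
print(range(probs_r))
```

For very large log-odds, `exp(pred)` can overflow, so `type = "response"` or `plogis()` is usually the safer choice in practice.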

**Attribution** *Source: Link, Question Author: SeBee, Answer Author: Ben F*