I devised this toy example:

```
library(sigmoid)
N <- 10000
age <- runif(N, min=20, max=90)
e <- rnorm(N, 0, 5)
hi <- logistic(-100 + 2*age + e)
hid <- ifelse(hi >= 0.5, TRUE, FALSE)
hid <- as.factor(hid)
df <- data.frame(age=age, hid=hid)
lr <- glm(hid ~ age, data=df, family=binomial(link="logit"))
s <- summary(lr)
print(s)
```

The variable `hid` contains 4304 `FALSE` and 5696 `TRUE`. I would have expected the logistic regression to recover the coefficients I specified.

Instead I am getting -39.46 for the intercept and 0.79 for the slope, both with p-values $\approx 0$.

What am I doing wrong?

**Answer**

If you’re trying to generate data from logistic regression’s assumed data-generating mechanism, your code does not do that.

Logistic regression’s data-generating mechanism looks like

$$ \eta = X\beta$$

$$ p = \dfrac{1}{1+e^{-\eta}}$$

$$ y \sim \operatorname{Binomial}(n, p) $$

What it looks like you’re trying to do is fit a linear regression in the log-odds space, error term included. That is incorrect: the randomness comes from the binomial likelihood, not from an additive error on the linear predictor. To generate data properly, so that `glm` will estimate the parameters you’ve specified, you need to do

```
library(sigmoid)
N <- 10000
age <- runif(N, min=20, max=90)
# Changes here
p <- logistic(-100+2*age)
hid <- rbinom(N, 1, p)
# End changes
df <- data.frame(age=age, hid=hid)
lr <- glm(hid~age, data=df, family=binomial(link="logit"))
s <- summary(lr)
print(s)
```
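As a quick sanity check (a sketch; the seed and the use of base R's `plogis` instead of `sigmoid::logistic` are my own choices, not part of the original answer), you can simulate from this mechanism and compare the fitted estimates to the true parameters:

```r
# Simulate from the binomial mechanism and verify that glm recovers
# the parameters used to generate the data (intercept -100, slope 2).
set.seed(42)                   # hypothetical seed for reproducibility
N   <- 10000
age <- runif(N, min = 20, max = 90)
p   <- plogis(-100 + 2 * age)  # plogis(x) = 1 / (1 + exp(-x))
hid <- rbinom(N, size = 1, prob = p)
fit <- glm(hid ~ age, family = binomial(link = "logit"))
coef(fit)  # estimates should land near c(-100, 2), up to sampling error
```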
**Attribution**
*Source: Link, Question Author: robertspierre, Answer Author: Demetri Pananos*