I am very confused about how the `weights` argument works in `glm` with `family = "binomial"`. In my understanding, the likelihood of a `glm` with `family = "binomial"` is specified as follows:

f(y) = {n \choose ny} p^{ny} (1-p)^{n(1-y)} = \exp \left( n \left[ y \log \frac{p}{1-p} + \log (1-p) \right] + \log {n \choose ny} \right)

where y is the "proportion of observed successes" and n is the known number of trials. In my understanding, the probability of success p is parametrized by some linear coefficients \beta as p = p(\beta), and the `glm` function with `family = "binomial"` searches for:

\textrm{arg}\max_{\beta} \sum_i \log f(y_i).
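For concreteness (my own aside, not essential to the argument): with the default logit link of `family = binomial`, this parametrization is

p(\beta) = \frac{1}{1 + e^{-x_i^\top \beta}}

where x_i denotes the covariate vector of observation i, though nothing below depends on the particular link.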

Then this optimization problem can be simplified as:

\textrm{arg}\max_{\beta} \sum_i \log f(y_i)

= \textrm{arg}\max_{\beta} \sum_i \left\{ n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} + \log (1-p(\beta)) \right] + \log {n_i \choose n_i y_i} \right\}

= \textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} + \log (1-p(\beta)) \right]

since the combinatorial term does not depend on \beta and can be dropped.

Therefore, if we let n_i^* = c\,n_i for all i = 1, \dots, N and some constant c > 0, then it must also be true that:

\textrm{arg}\max_{\beta} \sum_i \log f(y_i)

= \textrm{arg}\max_{\beta} \sum_i n_i^* \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} + \log (1-p(\beta)) \right]
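As a concrete check (my own addition, for the intercept-only model used in the code below): maximizing the weighted log-likelihood over a single probability p gives the weighted proportion of successes,

\hat p = \frac{\sum_i n_i y_i}{\sum_i n_i} = \frac{\sum_i c\, n_i y_i}{\sum_i c\, n_i},

so the constant c cancels and the estimated intercept \hat\beta = \log\left(\hat p / (1 - \hat p)\right) is unchanged.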

From this, I concluded that scaling the number of trials n_i by a constant does NOT affect the maximum likelihood estimate of \beta, given the proportions of success y_i. The help file of `glm` says:

`"For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes"`

Therefore I expected that scaling the weights would not affect the estimated \beta when the proportion of successes is the response. However, the following two pieces of code return different coefficient values:

```
Y <- c(1,0,0,0)  ## proportion of observed successes
w <- 1:length(Y) ## weight = the number of trials
glm(Y ~ 1, weights = w, family = binomial)
```

This yields:

```
Call:  glm(formula = Y ~ 1, family = binomial, weights = w)

Coefficients:
(Intercept)
     -2.197
```

while if I multiply all weights by 1000, the estimated coefficient is different:

```
> glm(Y ~ 1, weights = w * 1000, family = binomial)

Call:  glm(formula = Y ~ 1, family = binomial, weights = w * 1000)

Coefficients:
(Intercept)
 -3.153e+15
```

I have seen many other examples like this, even with fairly moderate scaling of the weights.

What is going on here?

**Answer**

Your example is merely causing rounding error in R; very large weights don't perform well in `glm`. Indeed, scaling `w` by virtually any smaller number, like 100, leads to the same estimates as the unscaled `w`.
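A minimal sketch of this (assuming the same `Y` and `w` as in the question; the expected values come from the outputs shown above):

```r
Y <- c(1, 0, 0, 0)  ## proportion of observed successes
w <- 1:length(Y)    ## number of trials

## Moderate scaling leaves the estimate untouched ...
coef(glm(Y ~ 1, weights = w,       family = binomial))  ## -2.197
coef(glm(Y ~ 1, weights = w * 100, family = binomial))  ## -2.197 (same)

## ... but very large weights break down numerically
coef(glm(Y ~ 1, weights = w * 1000, family = binomial))

## The analytic answer is the weighted success proportion on the logit scale
qlogis(sum(w * Y) / sum(w))  ## qlogis(0.1) = -2.197225
```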

If you want more reliable behavior with the `weights` argument, try using the `svyglm` function from the `survey` package.

See here:

```
> svyglm(Y~1, design=svydesign(ids=~1, weights=~w2, data=data.frame(w2=w*1000, Y=Y)), family=binomial)
Independent Sampling design (with replacement)
svydesign(ids = ~1, weights = ~w2, data = data.frame(w2 = w * 1000,
    Y = Y))
Call: svyglm(formula = Y ~ 1, design = svydesign(ids = ~1, weights = ~w2,
    data = data.frame(w2 = w * 1000, Y = Y)), family = binomial)
Coefficients:
(Intercept)
-2.197
Degrees of Freedom: 3 Total (i.e. Null); 3 Residual
Null Deviance: 2.601
Residual Deviance: 2.601 AIC: 2.843
```

**Attribution**

*Source : Link , Question Author : FairyOnIce , Answer Author : AdamO*