# R: glm function with family = "binomial" and "weights" specification

I am very confused about how the weights argument works in glm with family = "binomial". In my understanding, the likelihood of a GLM with family = "binomial" is specified as follows:
$$f(y) = {n \choose ny} p^{ny} (1-p)^{n(1-y)} = \exp \left(n \left[ y \log \frac{p}{1-p} - \left(-\log (1-p)\right) \right] + \log {n \choose ny}\right)$$
where $y$ is the proportion of observed successes and $n$ is the known number of trials.

In my understanding, the probability of success $p$ is parametrized by some linear coefficients $\beta$ as $p = p(\beta)$, and glm with family = "binomial" searches for:
$$\textrm{arg}\max_{\beta} \sum_i \log f(y_i).$$
Then this optimization problem can be simplified as:

$$\textrm{arg}\max_{\beta} \sum_i \log f(y_i) = \textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right] + \log {n_i \choose n_i y_i} \\ = \textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right]$$
Therefore if we let $n_i^* = n_i c$ for all $i = 1, \ldots, N$ for some constant $c$, then it must also be true that:
$$\textrm{arg}\max_{\beta} \sum_i \log f(y_i) = \textrm{arg}\max_{\beta} \sum_i n^*_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right]$$
From this, I concluded that scaling the number of trials $n_i$ by a constant does NOT affect the maximum likelihood estimate of $\beta$, given the proportions of success $y_i$.
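This invariance can be checked numerically by maximizing the log-likelihood above directly (a sketch for an intercept-only model with $p = p(\beta) = \mathrm{logit}^{-1}(\beta)$; the data here are a hypothetical example, not taken from the question):

```r
## Maximize sum_i n_i [ y_i log(p/(1-p)) + log(1-p) ] directly,
## once with n_i and once with 100 * n_i: the argmax is unchanged.
y <- c(1, 0, 0, 0)   # observed proportions of success
n <- 1:4             # numbers of trials

loglik <- function(beta, n, y) {
  p <- plogis(beta)  # intercept-only model: p = inverse-logit(beta)
  sum(n * (y * log(p / (1 - p)) + log(1 - p)))
}

b1 <- optimize(loglik, c(-10, 10), n = n,       y = y, maximum = TRUE)$maximum
b2 <- optimize(loglik, c(-10, 10), n = 100 * n, y = y, maximum = TRUE)$maximum
c(b1, b2)  # both close to log(1/9) = -2.197
```

With these data the pooled success proportion is $1/10$, so both maximizations land at $\beta = \log(1/9) \approx -2.197$ regardless of the scaling of $n$.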

The help file of glm says:

> For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes


Therefore I expected that scaling the weights would not affect the estimated $\beta$ when the response is the proportion of successes. However, the following two calls return different coefficient values:

    Y <- c(1,0,0,0)  ## proportion of observed success
    w <- 1:length(Y) ## weights = the number of trials
    glm(Y~1, weights=w, family=binomial)


This yields:

    Call:  glm(formula = Y ~ 1, family = binomial, weights = w)

    Coefficients:
    (Intercept)
         -2.197


while if I multiply all weights by 1000, the estimated coefficients are different:

    glm(Y~1, weights=w*1000, family=binomial)

    Call:  glm(formula = Y ~ 1, family = binomial, weights = w * 1000)

    Coefficients:
    (Intercept)
     -3.153e+15


I have seen many other examples like this, even with much more moderate scaling of the weights.
What is going on here?

Your example is merely causing rounding error in R: glm does not handle very large prior weights well numerically. Scaling w by virtually any smaller factor, such as 100, gives the same estimates as the unscaled w.
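As a quick check (using the same Y and w as in the question), a moderate scaling of the weights reproduces the unscaled estimate:

```r
## Moderate scaling of the prior weights leaves the fit unchanged;
## only the extreme scaling in the question breaks it numerically.
Y <- c(1, 0, 0, 0)  # proportions of observed success
w <- 1:length(Y)    # numbers of trials

b1   <- coef(glm(Y ~ 1, weights = w,       family = binomial))
b100 <- coef(glm(Y ~ 1, weights = w * 100, family = binomial))

all.equal(b1, b100)  # the two intercepts agree, both near logit(1/10)
```

The pooled success proportion is $1/10$ either way, so both fits recover an intercept of about $-2.197$.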

If you want more reliable behavior with the weights argument, try the svyglm function from the survey package.

See here:

    > svyglm(Y~1, design=svydesign(ids=~1, weights=~w, data=data.frame(w=w*1000, Y=Y)), family=binomial)
    Independent Sampling design (with replacement)
    svydesign(ids = ~1, weights = ~w, data = data.frame(w = w * 1000,
        Y = Y))

    Call:  svyglm(formula = Y ~ 1, design = svydesign(ids = ~1, weights = ~w,
        data = data.frame(w = w * 1000, Y = Y)), family = binomial)

    Coefficients:
    (Intercept)
         -2.197

    Degrees of Freedom: 3 Total (i.e. Null);  3 Residual
    Null Deviance:      2.601
    Residual Deviance: 2.601    AIC: 2.843