R: glm function with family = "binomial" and "weight" specification

I am very confused with how weight works in glm with family="binomial". In my understanding, the likelihood of the glm with family = "binomial" is specified as follows:

f(y) =
{n \choose ny} p^{ny} (1-p)^{n(1-y)} = \exp \left( n \left[ y \log \frac{p}{1-p} + \log (1-p) \right] + \log {n \choose ny} \right)

where y is the observed proportion of successes and n is the known number of trials.

In my understanding, the probability of success p is parametrized by some linear coefficients \beta as p = p(\beta), and the glm function with family = "binomial" searches for:

\textrm{arg}\max_{\beta} \sum_i \log f(y_i).
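For concreteness (this detail is assumed rather than stated above): with family = binomial, glm uses the logit link by default, so p(\beta) is the inverse logit of the linear predictor, which in R is plogis.

```r
## Assumed detail: family = binomial defaults to the logit link,
## so p(beta) = 1 / (1 + exp(-eta)), i.e. plogis() in R.
eta <- log(1/9)      # an example intercept-only linear predictor
p <- plogis(eta)     # inverse logit of eta
round(p, 3)          # 0.1
```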

Then this optimization problem can be simplified as:

\textrm{arg}\max_{\beta} \sum_i \log f(y_i)
= \textrm{arg}\max_{\beta} \sum_i \left\{ n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} + \log (1-p(\beta)) \right] + \log {n_i \choose n_i y_i} \right\} \\
= \textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} + \log (1-p(\beta)) \right] \\

since the combinatorial term \log {n_i \choose n_i y_i} does not depend on \beta and can be dropped.

Therefore, if we let n_i^* = n_i c for all i = 1, \dots, N for some constant c > 0, then it must also be true that:

\textrm{arg}\max_{\beta} \sum_i \log f(y_i)
= \textrm{arg}\max_{\beta} \sum_i n^*_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} + \log (1-p(\beta)) \right] \\

From this, I concluded that scaling the number of trials n_i by a constant does NOT affect the maximum likelihood estimate of \beta, given the proportions of success y_i.
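This invariance can be sanity-checked directly (my own sketch, not part of the original question): for an intercept-only model, the binomial MLE of the intercept is simply the logit of the pooled proportion of successes, which is unchanged when every trial count is scaled by the same constant.

```r
## Intercept-only binomial MLE: the logit of the pooled success proportion.
## Scaling every trial count n_i by a constant c leaves it unchanged.
y <- c(1, 0, 0, 0)                             # observed proportions of success
n <- 1:4                                       # trial counts
logit <- function(p) log(p / (1 - p))
b1 <- logit(sum(n * y) / sum(n))               # pooled proportion = 1/10
b2 <- logit(sum(100 * n * y) / sum(100 * n))   # trial counts scaled by c = 100
c(b1, b2)                                      # both equal log(1/9)
```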

The help file of glm says:

 "For a binomial GLM prior weights are used to  
  give the number of trials when the response is 
  the proportion of successes" 
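One way to check the help-file statement (a sketch of mine, not from the original post): fitting the proportion response with weights equal to the trial counts should agree with the equivalent two-column successes/failures response form of glm.

```r
## Hedged check: proportion response with weights = trial counts should
## match the cbind(successes, failures) response form of a binomial glm.
y <- c(1, 0, 0, 0)
n <- 1:4
fit_prop  <- glm(y ~ 1, family = binomial, weights = n)
fit_count <- glm(cbind(n * y, n * (1 - y)) ~ 1, family = binomial)
all.equal(coef(fit_prop), coef(fit_count))  # TRUE
```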

Therefore I expected that scaling the weights would not affect the estimated \beta when the response is the proportion of successes. However, the following two calls return different coefficient values:

 Y <- c(1,0,0,0)  ## proportion of observed successes
 w <- 1:length(Y) ## weight = the number of trials
 glm(Y ~ 1, family = "binomial", weights = w)

This yields:

 Call:  glm(formula = Y ~ 1, family = "binomial", weights = w)


while if I multiply all weights by 1000, the estimated coefficients are different:


 Call:  glm(formula = Y ~ 1, family = binomial, weights = w * 1000)


I have seen many other examples like this, even with moderate scaling of the weights.
What is going on here?


Your example is merely causing rounding error in R; large weights do not perform well in glm. Scaling w by virtually any smaller number, such as 100, leads to the same estimates as the unscaled w.
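That claim about moderate scaling can be verified directly (my own sketch, reusing the Y and w from the question):

```r
## With a moderate scale factor, the two fits agree to convergence tolerance.
Y <- c(1, 0, 0, 0)
w <- 1:length(Y)
f1 <- glm(Y ~ 1, family = binomial, weights = w)
f2 <- glm(Y ~ 1, family = binomial, weights = w * 100)
coef(f1)  # (Intercept) approximately log(1/9) = -2.197
coef(f2)  # same estimate with the rescaled weights
```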

If you want more reliable behavior with the weights argument, try the svyglm function from the survey package.

See here:

    > svyglm(Y~1, design=svydesign(ids=~1, weights=~w, data=data.frame(w=w*1000, Y=Y)), family=binomial)
Independent Sampling design (with replacement)
svydesign(ids = ~1, weights = ~w, data = data.frame(w = w * 1000, 
    Y = Y))

Call:  svyglm(formula = Y ~ 1, design = svydesign(ids = ~1, weights = ~w, 
    data = data.frame(w = w * 1000, Y = Y)), family = binomial)


Degrees of Freedom: 3 Total (i.e. Null);  3 Residual
Null Deviance:      2.601 
Residual Deviance: 2.601    AIC: 2.843

Source: Link, Question Author: FairyOnIce, Answer Author: AdamO
