I am trying to use `lme4::glmer()` to fit a binomial generalized linear mixed model (GLMM) with a dependent variable that is not binary, but a continuous variable between zero and one. One can think of this variable as a probability; in fact, it is a probability as reported by human subjects (in an experiment that I am helping to analyze). I.e. it's not a "discrete" fraction, but a continuous variable.

My `glmer()` call does not work as expected (see below). Why? What can I do?

Later edit: my answer below is more general than the original version of this question, so I modified the question to be more general too.
More details
Apparently it is possible to use logistic regression not only for a binary DV but also for a continuous DV between zero and one. Indeed, when I run

glm(reportedProbability ~ a + b + c, myData, family="binomial")

I get a warning message

Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

but a very reasonable fit (all factors are categorical, so I can easily check whether the model predictions are close to the across-subjects means, and they are).
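To see this concretely, here is a minimal base-R sketch (simulated data; the factor and variable names are made up to mirror the ones above). With a single categorical predictor, the binomial GLM warns about non-integer "successes" but its fitted cell probabilities match the observed across-subjects means:

```r
set.seed(1)
myData <- data.frame(a = factor(rep(c("lo", "hi"), each = 50)))
# true cell probabilities 0.3 and 0.7, plus noise, kept strictly inside (0, 1)
mu <- ifelse(myData$a == "hi", 0.7, 0.3)
myData$reportedProbability <- pmin(pmax(mu + rnorm(100, sd = 0.05), 0.01), 0.99)

# warns "non-integer #successes in a binomial glm!" but fits anyway
fit <- glm(reportedProbability ~ a, myData, family = "binomial")

# fitted cell probabilities equal the observed across-subjects means
cellMeans <- tapply(myData$reportedProbability, myData$a, mean)
predicted <- predict(fit,
                     newdata = data.frame(a = factor(c("hi", "lo"),
                                                     levels = levels(myData$a))),
                     type = "response")
round(cbind(observed = cellMeans, predicted = predicted), 3)
```

This is why the `glm()` fit above looks reasonable despite the warning: with the canonical logit link and a saturated categorical design, the score equations force the fitted cell means to equal the observed ones.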
However, what I actually want to use is

glmer(reportedProbability ~ a + b + c + (1 | subject), myData, family="binomial")

It gives me the identical warning and returns a model, but this model is clearly very much off; the estimates of the fixed effects are very far from the `glm()` ones and from the across-subjects means. (And I need to include `glmerControl(optimizer="bobyqa")` in the `glmer()` call, otherwise it does not converge at all.)
Answer
It makes sense to start with a simpler case of no random effects.
There are four ways to deal with a continuous zero-to-one response variable that behaves like a fraction or a probability (this is our most canonical/upvoted/viewed thread on this topic, but unfortunately not all four options are discussed there):

If it is a fraction $p=m/n$ of two integers and all $n$s are known, then one can use standard logistic regression, aka binomial GLM. One way to code it in R (assuming that `n` is a vector of $N$ values, one for each data point):

glm(p ~ a+b+c, myData, family="binomial", weights=n)

If $p$ is not a fraction of two integers, then one can use beta regression. This will only work if the observed $p$ is never equal to $0$ or $1$. If it is, then more complicated zero/one-inflated beta models are possible, but this becomes more involved (see this thread).
betareg(p ~ a+b+c, myData)

Logit transform the response and use linear regression. This is usually not advised.
lm(log(p/(1-p)) ~ a+b+c, myData)

Fit a binomial model but then compute standard errors taking overdispersion into account. The standard errors can be computed in various ways:

(a) scaled standard errors via the overdispersion estimate (one, two). This is called “quasibinomial” GLM.

(b) robust standard errors via the sandwich estimator (one, two, three, four). This is called “fractional logit” in econometrics.
Approaches (a) and (b) are not identical (see this comment, and sections 3.4.1 and 3.4.2 in this book, and this SO post and also this one and this one), but they tend to give similar results. Option (a) is implemented in `glm` as follows:

glm(p ~ a+b+c, myData, family="quasibinomial")
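Options #1, #3, and #4(a) can be contrasted on the same simulated fractions in base R alone (options #2 and #4(b) need the `betareg` and `sandwich` packages, so they are omitted from this sketch; all data and names here are hypothetical):

```r
set.seed(2)
N <- 200
n <- rep(20, N)                          # known denominators (for option 1)
x <- rnorm(N)
p <- rbinom(N, n, plogis(0.5 + x)) / n   # fractions with a true logit slope of 1
pc <- pmin(pmax(p, 0.01), 0.99)          # clamped copy: keeps logit(p) finite

fit1 <- glm(p ~ x, family = "binomial", weights = n)  # 1: binomial GLM + weights
fit3 <- lm(qlogis(pc) ~ x)                            # 3: logit transform + lm
fit4 <- glm(p ~ x, family = "quasibinomial")          # 4(a): quasi-binomial GLM

# all three recover a slope near the true value of 1 on the logit scale
sapply(list(binomial = fit1, logit_lm = fit3, quasibinomial = fit4),
       function(m) unname(coef(m)["x"]))
```

Note that the quasi-binomial fit gives the same point estimates as an ordinary binomial GLM on the proportions; only the standard errors are rescaled by the dispersion estimate.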

The same four ways are available with random effects.

Using the `weights` argument (one, two):

glmer(p ~ a+b+c + (1 | subject), myData, family="binomial", weights=n)
According to the second link above, it might be a good idea to model overdispersion, see there (and also #4 below).

Using a beta mixed model:

glmmadmb(p ~ a+b+c + (1 | subject), myData, family="beta")

or

glmmTMB(p ~ a+b+c + (1 | subject), myData, family=beta_family(link="logit"))

If there are exact zeros or ones in the response data, then one can use a zero/one-inflated beta model in `glmmTMB`.
Using a logit transform of the response:

lmer(log(p/(1-p)) ~ a+b+c + (1 | subject), myData)

Accounting for overdispersion in the binomial model. This uses a different trick: adding a random effect for each data point:

myData$rowid = as.factor(1:nrow(myData))
glmer(p ~ a+b+c + (1 | subject) + (1 | rowid), myData, family="binomial",
      glmerControl(optimizer="bobyqa"))

For some reason this does not work properly, as `glmer()` complains about non-integer `p` and yields nonsense estimates. A solution that I came up with is to use fake constant `weights=k` and make sure that `p*k` is always an integer. This requires rounding `p`, but by selecting a `k` that is large enough it should not matter much. The results do not seem to depend on the value of `k`.

k = 100
glmer(round(p*k)/k ~ a+b+c + (1 | subject) + (1 | rowid), myData, family="binomial",
      weights=rowid*0+k, glmerControl(optimizer="bobyqa"))
Later update (Jan 2018): This might be an invalid approach. See discussion here. I have to investigate this more.
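The rounding part of this trick can at least be sanity-checked without random effects, using plain `glm()` on simulated data (all names here are hypothetical): constant weights `k` together with a response rounded to multiples of `1/k` make `weights*p` integer, which silences the non-integer warning while barely moving the estimates.

```r
set.seed(3)
x <- rnorm(300)
p <- pmin(pmax(plogis(x) + rnorm(300, sd = 0.05), 0.01), 0.99)
k <- 100

fitRaw     <- glm(p ~ x, family = "binomial")  # warns: non-integer #successes
fitRounded <- glm(round(p * k) / k ~ x, family = "binomial",
                  weights = rep(k, length(p))) # p*k is integer: no warning

# for k = 100 the rounding error is at most 1/200, so estimates barely change
round(rbind(raw = coef(fitRaw), rounded = coef(fitRounded)), 4)
```

Because the weights are constant, they only rescale the log-likelihood and leave the point estimates of the rounded fit unchanged; whether this scaling is statistically valid in the mixed-model case is exactly what the "Later update" below questions.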
In my specific case option #1 is not available.

Option #2 is very slow and has issues with converging: `glmmadmb` takes five to ten minutes to run (and still complains that it did not converge!), whereas `lmer` works in a split second and `glmer` takes a couple of seconds. Update: I tried `glmmTMB` as suggested in the comments by @BenBolker, and it works almost as fast as `glmer`, without any convergence issues. So this is what I will be using.
Options #3 and #4 yield very similar estimates and very similar Wald confidence intervals (obtained with `confint()`). I am not a big fan of #3, though, because it is kind of cheating. And #4 feels somewhat hacky.
Huge thanks to @Aaron who pointed me towards #3 and #4 in his comment.
Attribution
Source: Link, Question Author: amoeba, Answer Author: amoeba