[R-sig-eco] mixed model with proportion data

Wed Mar 8 14:20:42 CET 2017

Mariano,

There is a huge and important difference between the two approaches suggested for your data.  A model for the log ratio of proportions (i.e. the empirical logit of the Yes proportion) estimates the residual variance from the data.  The binomial model instead assumes the residual variance is determined by the arbitrary (and made-up) sample size of 20 "tries" per response, in combination with the estimated mean proportions.  To see the arbitrariness, if you don't already, re-express your proportions out of 200 instead of 20, because 0/200, 10/200, ..., 200/200 also give your observed responses.  The coefficient estimates will be approximately the same, but their variances will not.  (If you didn't have additional random effects in the model, the coefficient estimates would be exactly the same, but the variances would be one tenth of those from N=20.)
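To make the arbitrariness concrete, here is a minimal sketch in Python (standard library only) for the simplest case: an intercept-only model with no random effects, where the binomial fit has a closed form.  The four responses are hypothetical, not your data, and I picked them to avoid 0 and 1 so the plain logit is defined without the empirical-logit adjustment.  The function names are mine.

```python
import math
import statistics

def binomial_intercept(successes, n):
    """Closed-form intercept-only binomial fit on the logit scale.

    Returns the estimated logit and its model-based variance,
    1 / (total trials * p * (1 - p)).
    """
    total_n = n * len(successes)
    p_hat = sum(successes) / total_n
    estimate = math.log(p_hat / (1 - p_hat))
    variance = 1 / (total_n * p_hat * (1 - p_hat))
    return estimate, variance

def logit_mean(successes, n):
    """Mean of the per-response logits and the variance of that mean,
    with the residual variance estimated from the data."""
    logits = [math.log(y / (n - y)) for y in successes]
    return statistics.mean(logits), statistics.variance(logits) / len(logits)

# Hypothetical responses: the same proportions written out of 20 and out of 200.
yes_20 = [5, 8, 15, 16]
yes_200 = [10 * y for y in yes_20]

est_20, var_20 = binomial_intercept(yes_20, 20)
est_200, var_200 = binomial_intercept(yes_200, 200)
lr_est_20, lr_var_20 = logit_mean(yes_20, 20)
lr_est_200, lr_var_200 = logit_mean(yes_200, 200)

print("binomial  N=20 :", est_20, var_20)
print("binomial  N=200:", est_200, var_200)
print("log ratio N=20 :", lr_est_20, lr_var_20)
print("log ratio N=200:", lr_est_200, lr_var_200)
```

The binomial estimate is identical under both denominators while its variance shrinks by a factor of 10; the log-ratio estimate and its variance do not depend on the denominator at all.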

If you are going to use the binomial GLM, I believe you must add overdispersion to the model, either as an individual (observation-level) random effect or by using a quasibinomial response distribution.  Overdispersion is not necessary for the log-ratio response because the residual error variance conceptually estimates that overdispersion.
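As a rough illustration of why the quasibinomial route removes the dependence on the made-up denominator, the sketch below (Python, standard library only) computes the Pearson dispersion estimate for an intercept-only model without random effects.  The data are hypothetical and the function name is mine; this is the textbook Pearson chi-square over residual degrees of freedom, not the output of any particular fitting routine.

```python
import math

def quasibinomial_intercept(successes, n):
    """Intercept-only fit with a Pearson dispersion estimate.

    Returns (phi, binomial variance, quasibinomial variance) for the
    logit-scale intercept, where phi = Pearson chi-square / (k - 1).
    """
    k = len(successes)
    p_hat = sum(successes) / (k * n)
    cell_var = n * p_hat * (1 - p_hat)  # binomial variance of one count
    pearson = sum((y - n * p_hat) ** 2 / cell_var for y in successes)
    phi = pearson / (k - 1)
    binom_var = 1 / (k * cell_var)
    return phi, binom_var, phi * binom_var

# Hypothetical responses: the same proportions out of 20 and out of 200.
yes_20 = [5, 8, 15, 16]
yes_200 = [10 * y for y in yes_20]

phi_20, binom_var_20, quasi_var_20 = quasibinomial_intercept(yes_20, 20)
phi_200, binom_var_200, quasi_var_200 = quasibinomial_intercept(yes_200, 200)

print("N=20 :", phi_20, binom_var_20, quasi_var_20)
print("N=200:", phi_200, binom_var_200, quasi_var_200)
```

In this simple case the dispersion estimate grows by the same factor of 10 by which the binomial variance shrinks, so the quasibinomial variance comes out the same for either denominator.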

Philip