[R-sig-ME] GLMM for proportions

Wed Jun 6 16:24:15 CEST 2018

Dear Nicolas,

The cbind(success, failure) notation is used when we aggregate (sum)
the number of successes and failures. The data-generating process
behind it is a series of trials, each of which results in either
success or failure. Hence their sums will be integers.
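
As a minimal sketch of that aggregation (the data are simulated and the
names trials, agg, fit_trials and fit_agg are hypothetical, not taken
from your data), the following fits the same binomial GLM once on the
individual 0/1 trials and once on the summed counts. A plain glm() is
used to keep the example self-contained; the left-hand side would look
the same in a GLMM.

```r
set.seed(1)

# Hypothetical data: 100 individual success/failure trials in each of two groups
trials <- data.frame(
  group   = rep(c("a", "b"), each = 100),
  success = rbinom(200, size = 1, prob = rep(c(0.3, 0.6), each = 100))
)

# Aggregate (sum) the successes per group; failures are the remaining trials
agg <- aggregate(success ~ group, data = trials, FUN = sum)
agg$failure <- 100 - agg$success

# Same model, fitted on the individual trials and on the aggregated counts
fit_trials <- glm(success ~ group, family = binomial, data = trials)
fit_agg    <- glm(cbind(success, failure) ~ group, family = binomial, data = agg)

# The coefficient estimates agree; the counts in agg are integers by construction
coef(fit_trials)
coef(fit_agg)
```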

We need to know more about your data-generating process in order to
give you sensible advice. Scaling the data by using different units is
wrong. Compare binom.test(c(1, 9)) and binom.test(c(1000, 9000)). Both
yield exactly the same proportion, but their confidence intervals are
very different. Why? c(1000, 9000) is much more informative than
c(1, 9).
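
The comparison can be run as follows. This is only a sketch: the object
names few and many are mine, and the counts stand in for the same
durations expressed in a coarse and a fine unit.

```r
# Same proportion of successes (10%), but very different amounts of information
few  <- binom.test(c(1, 9))        # 1 success, 9 failures
many <- binom.test(c(1000, 9000))  # the same ratio after multiplying by 1000

# The point estimates are identical
few$estimate
many$estimate

# The confidence intervals are not: the second one is far narrower
few$conf.int
many$conf.int

# Interval widths, for a direct comparison
diff(few$conf.int)
diff(many$conf.int)
```

Changing the unit from s to ms multiplies both counts by 1000 in the
same way, so the model treats the data as if it had 1000 times as many
independent trials.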

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

2018-06-06 16:13 GMT+02:00 poulin <poulin using math.unistra.fr>:
> Dear list,
>
> I have a question regarding GLMMs for proportions fitted with lme4.
>
> Such models are fitted using the binomial family. When I fit such models, I
> use cbind(success, failure) on the left side of the formula.
>
> The problem arises when, for example, the data are durations (duration of
> success and duration of failure) that are not integers when expressed in
> seconds. When fitting a GLM, one can directly use the proportion of success
> as the variable on the left side of the formula. When trying to do this for
> a GLMM, one gets the warning message: « In eval(family$initialize, rho):
> non-integer # successes in a binomial glm! »
> To avoid this, the biologists I sometimes work with used ms instead of s for
> their durations of success and failure, but then the associated tests
> are too powerful...
> I am not able to tell whether the displayed warning message is of concern
> or not. So my question is: do you think it is better to use ms instead of s,
> or the proportion directly?
> Thanks in advance for any help that can be provided
> Best regards
>
> --
> Nicolas Poulin
> Ingénieur de Recherche
> Centre de Statistique de Strasbourg (CeStatS)
> http://www.math.unistra.fr/CeStatS/
>
> Tél : 03 68 85 0189
>
> IRMA, UMR 7501
> Université de Strasbourg et CNRS
> 7 rue René-Descartes
> 67084 Strasbourg Cedex