[R-sig-ME] GLMM for proportions
Ben Bolker
bbolker at gmail.com
Wed Jun 6 16:47:50 CEST 2018
The problem even with using frames is that it's hard to believe that
the behaviour in one frame is independent of the behaviour in the next
(an assumption of the binomial response). So I agree that a binomial
approach is probably wrong.
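To illustrate why dependence between successive frames matters, here is a minimal simulation sketch in R. Everything in it is an assumption made for illustration and not taken from the data discussed in this thread: the two-state Markov chain, the 0.9 probability of staying in the same state from one frame to the next, the 100 frames per video, and the helper name simulate_video.

```r
set.seed(1)

n_frames <- 100   # hypothetical number of frames per video
stay     <- 0.9   # hypothetical probability that a frame repeats the previous state

# Simulate one video as a two-state Markov chain (1 = success, 0 = failure)
# and return the number of "success" frames.
simulate_video <- function(n, stay) {
  state <- integer(n)
  state[1] <- rbinom(1, 1, 0.5)
  for (i in 2:n) {
    keep <- rbinom(1, 1, stay)
    state[i] <- if (keep == 1) state[i - 1] else 1L - state[i - 1]
  }
  sum(state)
}

counts <- replicate(2000, simulate_video(n_frames, stay))

# Variance expected if the frames were independent Bernoulli(0.5) trials
binom_var <- n_frames * 0.5 * 0.5

# Ratio of observed to binomial variance; values well above 1 mean the
# binomial model overstates how much information the frames carry.
dispersion <- var(counts) / binom_var
dispersion
```

Under these assumptions the counts vary far more than a binomial model with 100 independent trials would allow, so standard errors from a plain binomial fit would be too small.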
Possibilities:
- using a quasibinomial model would take care of at least some of the
non-independence problem
- a Beta model
- transformed ratios
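A rough sketch of two of these options in base R, on simulated data. The data frame dat, its columns (treatment, n_frames, prop) and all parameter values are hypothetical, and random effects are left out to keep the example self-contained, so this is not a GLMM. A Beta model with random effects needs an add-on package (glmmTMB, for instance, provides beta_family()), so it is only mentioned here and not run.

```r
set.seed(42)

# Hypothetical data: 60 videos, two treatments, 250 frames each
n <- 60
dat <- data.frame(
  treatment = gl(2, n / 2, labels = c("control", "treated")),
  n_frames  = rep(250L, n)
)

# Proportion of time spent in "success", drawn from a Beta distribution so
# that it is more variable than a binomial proportion would be
mu  <- ifelse(dat$treatment == "control", 0.3, 0.5)
phi <- 5
dat$prop <- rbeta(n, mu * phi, (1 - mu) * phi)

# Option 1: quasibinomial GLM on the proportion, weighted by the number of
# frames; the estimated dispersion absorbs the extra-binomial variation
fit_quasi <- glm(prop ~ treatment, family = quasibinomial,
                 weights = n_frames, data = dat)
dispersion <- summary(fit_quasi)$dispersion

# Option 3: logit-transform the proportion and fit an ordinary linear model
# (requires proportions strictly between 0 and 1)
dat$logit_prop <- qlogis(dat$prop)
fit_logit <- lm(logit_prop ~ treatment, data = dat)

dispersion
coef(fit_quasi)
coef(fit_logit)
```

In the quasibinomial fit the point estimates match those of a binomial fit, but the standard errors are scaled by the square root of the estimated dispersion. With grouped data, the transformed-ratio approach extends to a linear mixed model by swapping lm for a mixed-model fitter.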
On 2018-06-06 10:33 AM, poulin wrote:
> Thanks Thierry for this advice. Yes I was aware of this. Actually, the
> data were obtained by analysing videos frame by frame. The video's
> resolution was such that each frame "duration" is considered to be
> 0.04s. My first advice to the biologists was to use the numbers of
> frames for both number of success and failure. They did not want this
> because they want to speak (and analyse) in terms of real duration.
> Hence, using ms instead of frames multiplies the number of attempts
> by 40.
>
> They published the results last year
> (https://peerj.com/articles/3227/), but someone wrote to the editor to
> say that the statistical approach was wrong and that the proportions
> should be used directly in the GLMM. This person did not mention that
> doing so produces a warning message.
>
> Best regards
>
> Nicolas Poulin
> Research Engineer
> Centre de Statistique de Strasbourg (CeStatS)
> http://www.math.unistra.fr/CeStatS/
>
> Tel: 03 68 85 0189
>
> IRMA, UMR 7501
> Université de Strasbourg et CNRS
> 7 rue René-Descartes
> 67084 Strasbourg Cedex
> On 06/06/2018 at 16:24, Thierry Onkelinx wrote:
>> Dear Nicolas,
>>
>> The cbind(success, failure) notation is used when we aggregate (sum)
>> the number of successes and failures. The data-generating process
>> behind it is a series of trials, each resulting in either success or
>> failure. Hence the sums are integers.
>>
>> We need to know more about your data generating process in order to
>> give you sensible advice. Scaling the data by using different units is
>> wrong. Compare binom.test(c(1, 9)) and binom.test(c(1000, 9000)). Both
>> yield exactly the same proportion, but their confidence intervals are
>> very different. Why? Because c(1000, 9000) is much more informative than
>> c(1, 9).
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>> Statisticus / Statistician
>>
>> Vlaamse Overheid / Government of Flanders
>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>> AND FOREST
>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>> thierry.onkelinx using inbo.be
>> Havenlaan 88 bus 73, 1000 Brussel
>> www.inbo.be
>>
>> 2018-06-06 16:13 GMT+02:00 poulin <poulin using math.unistra.fr>:
>>> Dear list,
>>>
>>> I have a question regarding GLMMs for proportions fitted with lme4.
>>>
>>> Such models are fitted using the binomial family. When I fit such models, I
>>> use cbind(success, failure) on the left-hand side of the formula.
>>>
>>> The problem arises when, for example, the data are durations (duration of
>>> success and duration of failure) that are not integers when expressed in
>>> seconds. When fitting a GLM, one can use the proportion of success directly
>>> as the response on the left-hand side of the formula. When trying to do this
>>> for a GLMM, one gets the warning message: "In eval(family$initialize, rho):
>>> non-integer #successes in a binomial glm!"
>>> To avoid this, biologists I sometimes work with used ms instead of s for
>>> their durations of success and failure, but then the associated tests are
>>> too powerful...
>>> I am not able to tell whether the displayed warning message is of concern
>>> or not.
>>> So my question is: do you think it is better to use ms instead of s, or to
>>> use the proportion directly?
>>> Thanks in advance for any help that can be provided.
>>> Best regards
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
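The binom.test() comparison suggested by Thierry in the quoted message can be run directly. The sketch below uses only the two calls from his message; the variable names (small, large, width_small, width_large) are introduced here for illustration.

```r
# Same observed proportion (0.1), very different amounts of information
small <- binom.test(c(1, 9))        # 1 success, 9 failures
large <- binom.test(c(1000, 9000))  # 1000 successes, 9000 failures

# Point estimates are identical
unname(small$estimate)
unname(large$estimate)

# Confidence intervals are not: compare their widths
small$conf.int
large$conf.int
width_small <- diff(small$conf.int)
width_large <- diff(large$conf.int)
width_small / width_large
```

Rescaling durations to a finer unit has the same effect as going from the first call to the second: the estimated proportion does not change, but the model is told it has many more trials than were actually observed.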