[R-sig-ME] P-value associated to explanatory from glmer binomial family

Fri May 25 20:45:58 CEST 2018

 What Thierry is explaining is that F tests don't really work for
GLM(M)s. (F tests are based on the uncertainty of the estimate of the
residual variance; typical (binomial & Poisson) GLMs don't estimate a
residual variance at all. There is a theory of "Bartlett corrections",
which are finite-size corrections for GLMs, but they're not widely
used.)

1. pf(Fstat,df,lower.tail=FALSE)  (equivalent to your 1-pf(...)
calculation, but more accurate for small p-values) would be reasonable
*if* the "F statistic" presented by anova() were actually
F-distributed with a known df, but as Thierry said, it's not. The "2"
and "3" given by anova() are the *model* ("numerator") degrees of
freedom, not the *residual* (denominator) df, which are unknown/hard
to compute.

2. 0.058 is indeed the p-value associated with the 'type' variable,
since that's the only difference between the models

3. The df doesn't really exist here.  I *think* you can get to the
equivalent F statistic reported in the anova() from the information
given here, but I'd have to think about it for 5 minutes ... but since
you're not going to run an F test anyway, it doesn't matter too much
...

On Fri, May 25, 2018 at 4:13 AM, Mario Garrido <gaiarrido using gmail.com> wrote:
>  Dear Thierry,
> thanks so much for the clarification. After I run the LRT I get those
> results
>
> gm1 <- glmer(Mycoplasma~ type+ (1|donor.number), family=binomial)
>
> anova(gm1)
>
> Analysis of Variance Table
>
>      Df Sum Sq Mean Sq F value
>
> type  1 3.2124  3.2124  3.2124
>
> gm0<-  glmer(Mycoplasma~ 1+ (1|donor.number), family=binomial)
>
> anova(gm1, gm0)
>
> Data: NULL
>
> Models:
>
> gm0: Mycoplasma ~ 1 + (1 | donor.number)
>
> gm1: Mycoplasma ~ type + (1 | donor.number)
>
>          Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
>
> gm0  2 46.492 49.869 -21.246   42.492
> gm1  3 44.901 49.968 -19.451   38.901 3.5905      1    0.05811 .
>
>
> The F-value associated to  type, my only explanatory variable, is 3.124, as
> anova(gm1) shows above
>
> 1. So which values should I take to calculate the P-value associated to the
> variable type? 2 and 3 as shows  anova(gm1, gm0)
> Is like that then?
>
> 1-pf(3.2124,2,3)
>
> [1] 0.1795865
>
> or
>
> 1-pf(3.2124,3,2)
>
> [1] 0.246378
>
>
> 2.   anova(gm1, gm0) give a P-value associated of 0.058, since I have only
> one explanatory variable, is not this value the defining the significance
> of this variable (the ine that makes the difference between the 2 models)
>
>
> 3. What is in this case the F-value and df provided by anova(gm1)?
>
>
>
> Srry, I am a little confused with the results. Thanks!
>
>
> 2018-05-24 11:55 GMT+03:00 Thierry Onkelinx <thierry.onkelinx using inbo.be>:
>
>> Dear Mario,
>>
>> Calculating the degrees of freedom of a mixed model is not straightforward.
>>
>> A workaround would be to use a likelihoodratio test between two nested
>> models: one with and one without the variable. See the example below.
>>
>> library(lme4)
>> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>               data = cbpp, family = binomial)
>> anova(gm1)
>> gm0 <- glmer(cbind(incidence, size - incidence) ~ (1 | herd),
>>               data = cbpp, family = binomial)
>> anova(gm1, gm0)
>>
>>
>> Best regards,
>>
>>
>> ir. Thierry Onkelinx
>> Statisticus / Statistician
>>
>> Vlaamse Overheid / Government of Flanders
>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
>> FOREST
>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>> thierry.onkelinx using inbo.be
>> Havenlaan 88
>> <https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
>> 1000 Brussel
>> www.inbo.be
>>
>> ////////////////////////////////////////////////////////////
>> ///////////////////////////////
>> To call in the statistician after the experiment is done may be no more
>> than asking him to perform a post-mortem examination: he may be able to say
>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>> ////////////////////////////////////////////////////////////
>> ///////////////////////////////
>>
>> <https://www.inbo.be>
>>
>> 2018-05-23 13:14 GMT+02:00 Mario Garrido <gaiarrido using gmail.com>:
>>
>>> Dear lme4-users,
>>> I am trying to get the P-value associated with a glmer model from the
>>> binomial family.
>>> My model is the following:
>>> glmer(Infection.status~origin+ (1|donationID), family=binomial)->q7H
>>>
>>> where Infection status is a dummy variable with two levels, infected and
>>> uninfected
>>> I tried to get the P-value associated to the the explanatory variable
>>> origin
>>> but I get only the F-value and the degrees of freedom
>>>
>>> (aov <- anova(q7H))
>>> Analysis of Variance Table
>>>          Df Sum Sq Mean Sq F value
>>> origin   2 5.3061  2.6531  2.6531
>>>
>>> I have 2 different questions
>>> 1. Am I doing correctly or am I using an incorrect command?
>>>
>>> 2. with the F-value I get and the df, should I go to test the significance
>>> to a F or Chi-squared table? I guess I should go to the latest since I am
>>> running a binomial test, right?
>>> In case I have to go to an F table, how can I know the numerator and
>>> denominator degrees of freedom?
>>>
>>> Thanks in advance
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models