[R-sig-eco] inference from GLMM fit using glmer

Highland Statistics Ltd highstat at highstat.com
Tue Mar 4 12:17:43 CET 2014





>
> ------------------------------
>
> Message: 2
> Date: Mon, 3 Mar 2014 16:45:58 +0000
> From: "Howe, Eric (MNR)" <eric.howe at ontario.ca>
> To: "r-sig-ecology at r-project.org" <r-sig-ecology at r-project.org>
> Subject: [R-sig-eco] inference from GLMM fit using glmer
> Message-ID:
> 	<338F9863C8568F4DA24A1CF1BB680C1C0FC326 at CTSPIGDCAPMXS42.cihs.ad.gov.on.ca>
> 	
> Content-Type: text/plain
>
> Good day R.sig.ecology members.
>
> We seem to have detected highly significant fixed effects from a GLMM fit using glmer, despite having a fairly small sample size (30 observations, 4 groups in the random factor).  We are hoping that group members can either verify that we are interpreting these results correctly or tell us where we've gone wrong.  We are working in R for Windows version 3.0.2 and using the latest version of lme4.



Eric,
With a sample of this size (30 observations, 2 covariates, a random 
effect with only 4 levels) you should seriously reconsider whether you 
want to apply a GLMM.

There is this general rule that you need about 10-15-20-ish observations 
per parameter. And the 4 levels for the random intercept is also too 
low. What you should do is simulate some data (say with 100 levels, each 
with 50 observations) and then cripple the data so that it matches the 
size of your data. And compare results. You may be in for a shock. See 
Section 4.7 in our 'Beginner's Guide to GAMM' book for an example how to 
do this (comes out in 2 weeks).

And yes...Bob is correct in mentioning overdispersion and negative 
binomial GLMM. He Bob.......since when are you suggesting to do 
transformations? That is not in line with your MEE paper..:-))

Kind regards,

Alain






>
> We sought to simultaneously test for (1) effects of annual variation in natural food availability for black bears (measured using a rank bear food index; bfi) on the number of occurrences of human-bear conflict (oc), and (2) a temporal trend (across years) in oc.  oc and bfi were measured annually in different administrative districts (dist) within the larger region of interest.  Not all districts provided data in all years.
>
> Our data look like this:
>     dist   yr lyr  bfi   oc
> 1   ban 2004   1 2.30  852
> 2  pemb 2004   1 2.70  105
> 3    ps 2004   1 1.10  969
> 4   ban 2005   2 2.05  657
> 5  pemb 2005   2 1.86  269
> 6    ps 2005   2 1.69 1119
> 7   ban 2006   3 2.57  275
> 8   mid 2006   3 2.66  271
> 9  pemb 2006   3 2.17   57
> 10   ps 2006   3 2.08  429
> 11  ban 2007   4 1.75  530
> 12  mid 2007   4 2.17  606
> 13 pemb 2007   4 2.03  189
> 14   ps 2007   4 1.23 1567
> 15  ban 2008   5 2.86  363
> 16  mid 2008   5 3.00  467
> 17 pemb 2008   5 3.13   95
> 18   ps 2008   5 2.32  770
> 19  ban 2009   6 2.19  672
> 20  mid 2009   6 1.47  653
> 21 pemb 2009   6 2.03  471
> 22   ps 2009   6 1.90 1522
> 23  ban 2010   7 2.00  409
> 24  mid 2010   7 2.10  640
> 25 pemb 2010   7 2.10  176
> 26   ps 2010   7 1.90 1218
> 27  ban 2011   8 2.50  466
> 28  mid 2011   8 2.40  462
> 29 pemb 2011   8 2.40  117
> 30   ps 2011   8 2.00  963
>
> We used a GLMM to test for effects of bfi and lyr (a dummy variable representing the # of years since the beginning of the study), with district as the only random effect.  The log-link function improved linearity.
>
> Code for fitting the model, and the summary of the model, appear below:
>
>> oc.sr.bfi.yr = glmer(oc ~ bfi + lyr + (1|dist), data = dat, family = poisson(link = log))
>> summary(oc.sr.bfi.yr)
> Generalized linear mixed model fit by maximum likelihood ['glmerMod']
>   Family: poisson ( log )
> Formula: oc ~ bfi + lyr + (1 | dist)
>     Data: dat
>
>        AIC       BIC    logLik  deviance
> 1960.3134 1965.9182 -976.1567 1952.3134
>
> Random effects:
>   Groups Name        Variance Std.Dev.
>   dist   (Intercept) 0.3006   0.5483
> Number of obs: 30, groups: dist, 4
>
> Fixed effects:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept)  7.040105   0.277215  25.396   <2e-16 ***
> bfi         -0.486053   0.019922 -24.398   <2e-16 ***
> lyr          0.035638   0.003598   9.905   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Correlation of Fixed Effects:
>      (Intr) bfi
> bfi -0.132
> lyr -0.019 -0.295
>
> The results indicate a negative effect of bfi (more conflict when natural foods are scarce, as expected) and a slight increasing trend across years.  However, we were surprised by the reported p-values for the fixed effects, given our relatively small sample size.
>
> We found the "Getting p-values for fitted models" help page for the lme4 package.
>
> We fitted parameter-reduced models for comparison:
>> oc.sr.null = glmer(oc ~ 1 + (1|dist), data = dat, family = poisson(link = log))
>> oc.sr.bfi = glmer(oc ~ bfi + (1|dist), data = dat, family = poisson(link = log))
> AIC and BIC values were greater by 100 when we excluded the effect of year, and were greater by 500 when we also excluded the effect of the bfi.  These results seem to agree with the summary of the more general model, but they also seem a bit extreme given our small sample size.
>
> Model comparisons using PBmodcomp from the pbkrtest package also seem to support the inclusion of both fixed effects.
>> mc.oc.bfi.nobfi = PBmodcomp(oc.sr.bfi, oc.sr.null, nsim = 100)
>> mc.oc.yr.noyr = PBmodcomp(oc.sr.bfi.yr, oc.sr.bfi, nsim = 100)
>> mc.oc.bfi.nobfi
> Parametric bootstrap test; time: 26.13 sec; samples: 100 extremes: 0;
> large : oc ~ bfi + (1 | dist)
> small : oc ~ 1 + (1 | dist)
>           stat df   p.value
> LRT    513.43  1 < 2.2e-16 ***
> PBtest 513.43     0.009901 **
> ---
>
>> mc.oc.yr.noyr
> Parametric bootstrap test; time: 39.83 sec; samples: 100 extremes: 0;
> large : oc ~ bfi + lyr + (1 | dist)
> small : oc ~ bfi + (1 | dist)
>           stat df   p.value
> LRT    98.323  1 < 2.2e-16 ***
> PBtest 98.323     0.009901 **
> ---
>
> We calculated bootstrap confidence intervals using "confint".  They seem to indicate that the estimated effects were unambiguous, but we were unsure whether our sample size was adequate to apply this method.
>
>> confint (oc.sr.bfi.yr, method = c("boot"))
> Computing bootstrap confidence intervals ...
>                            2.5 %      97.5 %
> sd_(Intercept)|dist  0.13871500  0.83969172
> (Intercept)          6.54440740  7.56234905
> bfi                 -0.53389761 -0.44979598
> lyr                  0.02835891  0.04325712
>
> Simple diagnostic plots (residuals vs fitted values and a normal QQ plot of residuals) of the GLMM with effects of both bfi and lyr did not reveal violations of assumptions.  We would be happy to report significant effects of food availability and a significant trend across years, but are concerned that we may not be interpreting these results correctly.  Any help or suggestions would be greatly appreciated.
>
> Sincerely,
> Eric
>
>
> 	[[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 03 Mar 2014 18:01:55 +0100
> From: "Bob O'Hara" <bohara at senckenberg.de>
> To: r-sig-ecology at r-project.org, "Howe, Eric (MNR)"
> 	<eric.howe at ontario.ca>
> Subject: Re: [R-sig-eco] inference from GLMM fit using glmer
> Message-ID: <5314B583.10908 at senckenberg.de>
> Content-Type: text/plain
>
> On 03/03/14 17:45, Howe, Eric (MNR) wrote:
>> Good day R.sig.ecology members.
>>
>> We seem to have detected highly significant fixed effects from a GLMM fit using glmer, despite having a fairly small sample size (30 observations, 4 groups in the random factor).  We are hoping that group members can either verify that we are interpreting these results correctly or tell us where we've gone wrong.  We are working in R for Windows version 3.0.2 and using the latest version of lme4.
> It looks like you're ignoring any overdispersion, so you're assuming
> that the variance equals the mean. For example, a mean of 900 has a
> standard deviation of 30. This is smaller than the range of your data.
> Because of this, the model assumes there is almost no residual error,
> which forces teh effects to have a small error.
>
> With counts this high I would start by simply square root transforming
> the data and analysing that: the variance should be fairly well
> stabilised and the residuals should be pretty much normally distributed.
>
> Alternatively, you could add overdispersion by using either a negative
> binomial distribution, or an extra random effect with a level for each
> observation (this would be equivalent to an Poisson log-normal assumption).
>
> Bob
>
>> We sought to simultaneously test for (1) effects of annual variation in natural food availability for black bears (measured using a rank bear food index; bfi) on the number of occurrences of human-bear conflict (oc), and (2) a temporal trend (across years) in oc.  oc and bfi were measured annually in different administrative districts (dist) within the larger region of interest.  Not all districts provided data in all years.
>>
>> Our data look like this:
>>      dist   yr lyr  bfi   oc
>> 1   ban 2004   1 2.30  852
>> 2  pemb 2004   1 2.70  105
>> 3    ps 2004   1 1.10  969
>> 4   ban 2005   2 2.05  657
>> 5  pemb 2005   2 1.86  269
>> 6    ps 2005   2 1.69 1119
>> 7   ban 2006   3 2.57  275
>> 8   mid 2006   3 2.66  271
>> 9  pemb 2006   3 2.17   57
>> 10   ps 2006   3 2.08  429
>> 11  ban 2007   4 1.75  530
>> 12  mid 2007   4 2.17  606
>> 13 pemb 2007   4 2.03  189
>> 14   ps 2007   4 1.23 1567
>> 15  ban 2008   5 2.86  363
>> 16  mid 2008   5 3.00  467
>> 17 pemb 2008   5 3.13   95
>> 18   ps 2008   5 2.32  770
>> 19  ban 2009   6 2.19  672
>> 20  mid 2009   6 1.47  653
>> 21 pemb 2009   6 2.03  471
>> 22   ps 2009   6 1.90 1522
>> 23  ban 2010   7 2.00  409
>> 24  mid 2010   7 2.10  640
>> 25 pemb 2010   7 2.10  176
>> 26   ps 2010   7 1.90 1218
>> 27  ban 2011   8 2.50  466
>> 28  mid 2011   8 2.40  462
>> 29 pemb 2011   8 2.40  117
>> 30   ps 2011   8 2.00  963
>>
>> We used a GLMM to test for effects of bfi and lyr (a dummy variable representing the # of years since the beginning of the study), with district as the only random effect.  The log-link function improved linearity.
>>
>> Code for fitting the model, and the summary of the model, appear below:
>>
>>> oc.sr.bfi.yr = glmer(oc ~ bfi + lyr + (1|dist), data = dat, family = poisson(link = log))
>>> summary(oc.sr.bfi.yr)
>> Generalized linear mixed model fit by maximum likelihood ['glmerMod']
>>    Family: poisson ( log )
>> Formula: oc ~ bfi + lyr + (1 | dist)
>>      Data: dat
>>
>>         AIC       BIC    logLik  deviance
>> 1960.3134 1965.9182 -976.1567 1952.3134
>>
>> Random effects:
>>    Groups Name        Variance Std.Dev.
>>    dist   (Intercept) 0.3006   0.5483
>> Number of obs: 30, groups: dist, 4
>>
>> Fixed effects:
>>                Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  7.040105   0.277215  25.396   <2e-16 ***
>> bfi         -0.486053   0.019922 -24.398   <2e-16 ***
>> lyr          0.035638   0.003598   9.905   <2e-16 ***
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Correlation of Fixed Effects:
>>       (Intr) bfi
>> bfi -0.132
>> lyr -0.019 -0.295
>>
>> The results indicate a negative effect of bfi (more conflict when natural foods are scarce, as expected) and a slight increasing trend across years.  However, we were surprised by the reported p-values for the fixed effects, given our relatively small sample size.
>>
>> We found the "Getting p-values for fitted models" help page for the lme4 package.
>>
>> We fitted parameter-reduced models for comparison:
>>> oc.sr.null = glmer(oc ~ 1 + (1|dist), data = dat, family = poisson(link = log))
>>> oc.sr.bfi = glmer(oc ~ bfi + (1|dist), data = dat, family = poisson(link = log))
>> AIC and BIC values were greater by 100 when we excluded the effect of year, and were greater by 500 when we also excluded the effect of the bfi.  These results seem to agree with the summary of the more general model, but they also seem a bit extreme given our small sample size.
>>
>> Model comparisons using PBmodcomp from the pbkrtest package also seem to support the inclusion of both fixed effects.
>>> mc.oc.bfi.nobfi = PBmodcomp(oc.sr.bfi, oc.sr.null, nsim = 100)
>>> mc.oc.yr.noyr = PBmodcomp(oc.sr.bfi.yr, oc.sr.bfi, nsim = 100)
>>> mc.oc.bfi.nobfi
>> Parametric bootstrap test; time: 26.13 sec; samples: 100 extremes: 0;
>> large : oc ~ bfi + (1 | dist)
>> small : oc ~ 1 + (1 | dist)
>>            stat df   p.value
>> LRT    513.43  1 < 2.2e-16 ***
>> PBtest 513.43     0.009901 **
>> ---
>>
>>> mc.oc.yr.noyr
>> Parametric bootstrap test; time: 39.83 sec; samples: 100 extremes: 0;
>> large : oc ~ bfi + lyr + (1 | dist)
>> small : oc ~ bfi + (1 | dist)
>>            stat df   p.value
>> LRT    98.323  1 < 2.2e-16 ***
>> PBtest 98.323     0.009901 **
>> ---
>>
>> We calculated bootstrap confidence intervals using "confint".  They seem to indicate that the estimated effects were unambiguous, but we were unsure whether our sample size was adequate to apply this method.
>>
>>> confint (oc.sr.bfi.yr, method = c("boot"))
>> Computing bootstrap confidence intervals ...
>>                             2.5 %      97.5 %
>> sd_(Intercept)|dist  0.13871500  0.83969172
>> (Intercept)          6.54440740  7.56234905
>> bfi                 -0.53389761 -0.44979598
>> lyr                  0.02835891  0.04325712
>>
>> Simple diagnostic plots (residuals vs fitted values and a normal QQ plot of residuals) of the GLMM with effects of both bfi and lyr did not reveal violations of assumptions.  We would be happy to report significant effects of food availability and a significant trend across years, but are concerned that we may not be interpreting these results correctly.  Any help or suggestions would be greatly appreciated.
>>
>> Sincerely,
>> Eric
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>

-- 
Dr. Alain F. Zuur

First author of:
1. Beginner's Guide to GAMM with R (2014).
2. Beginner's Guide to GLM and GLMM with R (2013).
3. Begginner's Guide to GAM with R (2012).
4. Zero Inflated Models and GLMM with R (2012).
5. A Beginner's Guide to R (2009).
6. Mixed effects models and extensions in ecology with R (2009).
7. Analysing Ecological Data (2007).

Highland Statistics Ltd.
9 St Clair Wynd
UK - AB41 6DZ Newburgh
Tel:   0044 1358 788177
Email: highstat at highstat.com
URL:   www.highstat.com



More information about the R-sig-ecology mailing list