[R-sig-ME] multiple nested random factors

Ben Bolker bbolker at gmail.com
Sat Feb 23 01:28:07 CET 2013


Amanda Adams <aadams26 at ...> writes:

> On 22/02/2013 9:05 AM, Ben Bolker wrote:
> > Amanda Adams <aadams26 at ...> writes:
> >
> >> I have been having a heck of a time figuring out how to estimate the
> >> proportion of variance from several random factors. I have count data
> >> on the number of bat calls recorded at 3 sites, on 6 detectors, over 12
> >> nights. Detectors were at 2 heights.
> >> If I understand nested factors correctly, Detectors are nested in Site
> >> and Night is nested in Site.
> >> Site/Detector and Site/Night are random
> >> factors and Height is a fixed factor.

> >    It's still not entirely clear to me from this description how
> > your data are structured.  You have an average of about 249/12 ~ 21
> > observations per night, so I'm going to assume you have 6 detectors
> > *at each site*.  Detector will be nested in site (because it doesn't
> > make any sense to analyze what happens at "detector number 1" unless
> > the detectors are somehow arranged so that the set {d1:site1,
> > d1:site2, d1:site3, ...} has something in common).  You *may* want
> > a night:site interaction (if you have enough data), but in principle
> > you also want a site factor (probably fixed, since there are only
> > three levels) and a night factor.  This would be
> >
> >    ~ height + f.Site + (1|f.Night/f.Site) + (1|f.Site:f.Detector)
> >
> >    It is quite likely that you will find some of these variance
> > components estimated as zero ...
> >    
> Yes, I have 6 detectors at each site.

  OK
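  For what it's worth, here is a minimal base-R sketch of why the
nesting matters. It assumes a hypothetical balanced layout of 3 sites
x 6 detectors x 12 nights in which detector labels 1-6 are reused at
every site (not the real data). In lme4, (1|f.Site:f.Detector) groups
by the site-by-detector combination, and (1|f.Night/f.Site) is
shorthand for (1|f.Night) + (1|f.Night:f.Site). interaction() shows
how many groups each of those terms would see:

```r
## Hypothetical balanced layout (not the real data): 3 sites x 6 detectors
## x 12 nights, with detector labels 1-6 reused at every site.
d <- expand.grid(f.Site     = factor(1:3),
                 f.Detector = factor(1:6),
                 f.Night    = factor(1:12))

## Detector alone gives only 6 groups, so "detector 1" at site 1 and at
## site 2 would be treated as the same unit.
n_det <- nlevels(d$f.Detector)                                          # 6

## Site:Detector, as in (1|f.Site:f.Detector): one group per physical detector.
n_site_det <- nlevels(interaction(d$f.Site, d$f.Detector, drop = TRUE))   # 18

## Night/Site expands to Night + Night:Site: 12 and 36 groups in this layout.
n_night      <- nlevels(d$f.Night)                                      # 12
n_night_site <- nlevels(interaction(d$f.Night, d$f.Site, drop = TRUE))  # 36

c(n_det, n_site_det, n_night, n_night_site)
```

With the real data, the same interaction() counts can be compared
against the "groups:" line that glmer prints in its summary.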

> >
> >> Also, data is overdispersed so I am transforming number of calls as
> >> log(Calls+1).
> >    This makes no sense (sorry).  Poisson models must have a response
> > variable that is a raw count value (integer).  How do you know the
> > data are overdispersed before you fit a model ???  (Although I do see
> > that you have widely varying values in your 'Calls' variable, so
> > you may be right ...)
> >
> >    For various ways of handling overdispersion in GLMMs see
> > http://glmm.wikidot.com/faq
> I had tested for overdispersion with qcc.overdispersion.test in the qcc
> package.  I had tried using an individual-level random effect to capture 
> overdispersion, but was not sure how to interpret the model once that was 
> included.

  Testing the _marginal_ distribution of the data for anything
(normality, overdispersion, etc.) is very rarely a sensible thing
to do.  You need to test for overdispersion in the _residuals_
of your fit.  It's likely, though, that you do need an individual-level
random effect.  Have you read the references in http://glmm.wikidot.com/faq
that discuss individual-level random effects?
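  A minimal sketch of the difference between the marginal and residual
checks. It uses simulated data and an ordinary Poisson glm() rather
than glmer, an assumption made so the example runs with base R alone.
y_pois is exactly Poisson given x, yet its marginal variance/mean
ratio is well above 1 because the mean changes with x. Once x is in
the model, the residual check shows no overdispersion. y_nb adds
negative binomial noise, so it shows what real overdispersion looks
like. The helper name overdisp_ratio is hypothetical: it divides the
sum of squared Pearson residuals by the residual degrees of freedom.

```r
## Hypothetical helper: Pearson chi-square / residual df (about 1 if the
## Poisson assumption holds for the fitted model).
overdisp_ratio <- function(fit) {
  rp    <- residuals(fit, type = "pearson")
  rdf   <- df.residual(fit)
  chisq <- sum(rp^2)
  c(chisq = chisq, ratio = chisq / rdf, rdf = rdf,
    p = pchisq(chisq, df = rdf, lower.tail = FALSE))
}

## Simulated counts whose mean varies with a covariate x.
set.seed(101)
n  <- 500
x  <- runif(n)
mu <- exp(1 + 2 * x)
y_pois <- rpois(n, mu)                  # Poisson given x
y_nb   <- rnbinom(n, mu = mu, size = 1) # extra-Poisson (negative binomial) noise

## Marginal variance/mean ratio: well above 1 even for y_pois.
marg_ratio <- var(y_pois) / mean(y_pois)

## Residual-based check after fitting the model that includes x.
od_pois <- overdisp_ratio(glm(y_pois ~ x, family = poisson))
od_nb   <- overdisp_ratio(glm(y_nb ~ x, family = poisson))
c(marginal = marg_ratio, resid_pois = od_pois[["ratio"]],
  resid_nb = od_nb[["ratio"]])
```

For a glmer fit, the same Pearson-residual idea applies. The residual
degrees of freedom should then also account for the estimated variance
parameters. That is an assumption you should check against your lme4
version.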

> >    I don't know if it's helpful, but Bolker et al. 2009 _Trends
> > in Ecology and Evolution_ might be a citeable source for GLMMs.
> > It doesn't really say anything specific about Poisson variables
> > and why a Poisson model doesn't include a residual variance; for
> > that you should probably cite (after reading!) a basic book
> > on generalized linear models.
> This paper has been very helpful and was the reason I was initially 
> using glmer. Thanks! I will do some more reading.
> >

 [snip snip snip]

> I applied the individual-level random effect, but how do I interpret the 
> proportion of variation from each factor once it is included?
> 
>  > model <- glmer(Calls ~ f.Height + f.Site + (1|f.Site/f.Night) +
> + (1|f.Site:f.Detector), data = data, family=poisson)
>  >
>  > data$ID <- 1:nrow(data)
>  > model1 <- glmer(Calls ~ f.Height + f.Site + (1|f.Night/f.Site) +
> + (1|f.Site:f.Detector) + (1|ID), data = data, family = poisson)
> Number of levels of a grouping factor for the random effects
> is *equal* to n, the number of observations
>  >
>  > anova(model, model1)
> Data: data
> Models:
> model: Calls ~ f.Height + f.Site + (1 | f.Site/f.Night) + (1 | 
> f.Site:f.Detector)
> model1: Calls ~ f.Height + f.Site + (1 | f.Night/f.Site) + (1 | 
> f.Site:f.Detector) +
> model1:     (1 | ID)
>         Df   AIC   BIC   logLik Chisq Chi Df Pr(>Chisq)
> model   8 49163 49191 -24573.4
> model1  9  1615  1647   -798.6 47550      1  < 2.2e-16 ***
> 
>  > model1
> Generalized linear mixed model fit by the Laplace approximation
> Formula: Calls ~ f.Height + f.Site + (1 | f.Night/f.Site) + 
> (1|f.Site:f.Detector) + (1 | ID)
>     Data: data
>    AIC  BIC logLik deviance
>   1615 1647 -798.6     1597
> Random effects:
>   Groups            Name        Variance Std.Dev.
>   ID                (Intercept) 1.07827  1.03840
>   f.Site:f.Night    (Intercept) 1.90958  1.38187
>   f.Site:f.Detector (Intercept) 2.32948  1.52626
>   f.Night           (Intercept) 0.65313  0.80817
> Number of obs: 249, groups: ID, 249; f.Site:f.Night, 47; 
> f.Site:f.Detector, 24; f.Night, 12
> 
> Fixed effects:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  2.59535    0.86051   3.016 0.002561 **
> f.Height2   -0.05362    0.64015  -0.084 0.933245
> f.Site2      1.01975    1.07455   0.949 0.342619
> f.Site3      0.73546    1.08115   0.680 0.496343
> f.Site4      4.15381    1.07196   3.875 0.000107 ***
> 
> Does this mean: Site has a significant effect on bat activity and
> 44% of the variation in bat activity levels can be explained by detector 
> placement within sites
> 36% by an interaction between Site and Night
> 12% by temporal effects (night)
> 20% by individual variation
> Does the individual variation essentially mean the variation not
> explained by temporal and spatial effects?

  Site 4 is significantly different from site 1 (and probably
different from the other sites as well, although that isn't
explicitly tested here).
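  A sketch of how the other pairwise comparisons could be checked. It
uses simulated counts with made-up site means and an ordinary glm(),
assumed here so the example is self-contained. The difference between
two treatment contrasts (both relative to site 1) has variance
V44 + V22 - 2*V24, taken from the coefficient covariance matrix.
Releveling the factor and refitting should give the same answer:

```r
## Hypothetical data: 4 sites, 60 counts each, Poisson with made-up site means.
set.seed(42)
site  <- factor(rep(1:4, each = 60))
calls <- rpois(length(site), lambda = exp(c(1.0, 1.5, 1.3, 2.5)[as.integer(site)]))
fit   <- glm(calls ~ site, family = poisson)

## Site 4 vs site 2 from the treatment contrasts (both relative to site 1).
b   <- coef(fit)
V   <- vcov(fit)
est <- unname(b["site4"] - b["site2"])
se  <- sqrt(V["site4", "site4"] + V["site2", "site2"] - 2 * V["site4", "site2"])
z   <- est / se
p   <- 2 * pnorm(-abs(z))   # unadjusted two-sided Wald p-value

## Cross-check: make site 4 the reference level and refit.
site_r <- relevel(site, ref = "4")
fit_r  <- glm(calls ~ site_r, family = poisson)
c(est = est, se = se, z = z, p = p,
  refit_est = -coef(fit_r)[["site_r2"]],
  refit_se  = sqrt(vcov(fit_r)["site_r2", "site_r2"]))
```

With a glmer fit, fixef() and vcov() would take the place of coef()
and vcov() here. Check the coefficient names your lme4 version prints.
These are unadjusted p-values, so with several pairwise comparisons
some multiplicity correction may be appropriate.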

  It's somewhat harder to do "variance decomposition" in a
GLMM (or a complex/modern LMM) than in classic models.
The 'variance components' would include the four components
listed above as well as the Poisson variance term.  Depending
on how you were thinking about it you might also include the
differences among sites and the difference in height as
'variance components'.  If you look at 'variance' narrowly
enough, then you _could_ state things the way you have above.
I don't know that much about variance partitioning; in GLMMs
it may be a bit of a research topic ...

  It may have come up on this list before, but I can't put
my finger on a thread right now.  Perhaps someone else
can.
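  Under the narrow reading of 'variance' above, the arithmetic is just
each printed component divided by their sum. A minimal sketch using the
variances printed for model1 is below. These four components give
roughly 39% (site:detector), 32% (site:night), 11% (night) and 18%
(ID), and those shares are on the log (link) scale. Adding a term for
the Poisson variance enlarges the denominator. The formula used for
that term here, ln(1 + 1/exp(b0)) with b0 the fixed intercept, is an
assumed lognormal approximation and is not taken from this thread.
Other choices exist.

```r
## Variance components copied from the model1 printout (log scale).
vc <- c(ID                  = 1.07827,
        "f.Site:f.Night"    = 1.90958,
        "f.Site:f.Detector" = 2.32948,
        f.Night             = 0.65313)

## Narrow view: share of the summed random-effect variances.
shares <- vc / sum(vc)
round(100 * shares)   # ID 18, f.Site:f.Night 32, f.Site:f.Detector 39, f.Night 11

## Assumed extra term: lognormal approximation to the Poisson
## observation-level variance on the log scale, ln(1 + 1/lambda),
## evaluated at lambda = exp(intercept) for the reference levels.
b0 <- 2.59535
poisson_var <- log(1 + 1 / exp(b0))
shares_pois <- c(vc, Poisson = poisson_var) / sum(vc, poisson_var)
round(100 * shares_pois, 1)
```

Because the intercept corresponds to a fairly large expected count,
the Poisson term is small here and shifts the shares only slightly.
That would not hold for sparser counts.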

Goldstein H, Browne W, Rasbash J (2002) Partitioning variation in
multilevel models. Understanding Statistics 1: 223-231.

Browne WJ, Subramanian SV, Jones K (2005) Variance partitioning in
multilevel logistic models that exhibit overdispersion. Journal of the
Royal Statistical Society, Series A 168: 599-613.



More information about the R-sig-mixed-models mailing list