[R] GLMM random effects

Thu Aug 19 15:35:06 CEST 2010

> Natasha <natasha.lloyd <at> gmail.com> writes:

  For future reference, you may have better luck *either* on
the r-sig-mixed-models help list (for advice about GLMMs) *or*
on r-sig-ecology (for advice about ecological studies) [it's not
considered polite to cross-post: try looking at the archives of
both lists to decide which is more appropriate].

> Background information:
> Our study design:   We used mark-recapture trapping to obtain density
> estimates and demography of prairie dogs. The location of our study sites
> are nested, we have 6 trapping plots within 3 colonies of prairie dogs.
> Within each colony 3 plots are treatment (given supplemental food) and 3 are
> control sites. Therefore, in total we have 18 study plots (9 control and 9
> treatment).
> Our study spans over two years and within each year we trapped prairie dogs
> twice (Spring 2008, Summer 2008, Spring 2009, Summer 2009) for a total of
> four trapping sessions. Therefore we also have temporal repeated measures in
> our sampling design. From the mark-recapture data we estimated density using
> Program DENSITY for each study plot and for each session (a total of 4
> density estimates for each 18 plots over time). We also did vegetation
> sampling to obtain estimates of the natural food available in each plot,
> which gave us total biomass and % edible vegetation as covariates.
> 
> We are using the lme4 package in program R (2.11.1) for analysis:
> We fitted a GLMM with Laplace restricted maximum likelihood estimation.
> Since our response variable density is a count with overdispersion we used a
> corrected poisson distribution with a log link function. 

  (I'm not sure "corrected" is a technically meaningful term:
do Zuur et al use this?)

> And in order to
> account for the nested and repeated measures in our design we want to
> include Plot nested in Colony as random effects. The explanatory variables
> include Treatment, Session, Total biomass, and/or %Edible food for the fixed
> effects. (Colony, Plot, Treatment, and Session are all factor variables).
> 
> And an example of the R code that we started with is:
> 
> PDmodel_1 <- lmer(Density~1+Treatment+Session+Biomass+(1|Colony/Plot),
> family=quasipoisson, data=DensityPD)

  (You don't need the "1" in this formula: the intercept is implicitly
included.)

> My main question is: 
> Does this accurately take into account our nested and repeated measures
> design? Have we accounted for any possible pseudoreplication with this type
> of analysis?  Do I need to also look at type AR-1 error structures (or can I
> do that in a glmm?) Or is there a simpler way you might suggest?  I can also
> look at the change in density between time intervals instead density at time
> intervals if that makes anything easier to analyze?  I can also split
> Session into season and year if that is helpful too.

  This looks reasonable.  You're likely to have trouble estimating
the among-colony variance, because there are only 3 colonies (try
estimating a variance from 3 samples sometime) -- I wouldn't be
surprised if the estimate of the among-colony variance is zero, unless
the colonies are very different.  You might alternatively try 
Colony as a fixed effect and Colony:Plot as the random effect
(or equivalently make sure your plots have 18 distinct labels rather
than being numbered 1..6 in each plot).

   It's a bit tricky to incorporate autoregressive structures in a GLMM --
if your densities are high (means > 5) you can try this in MASS::glmmPQL.
However, I suspect you're unlikely to be able to resolve AR structure
in this data set -- if you're worried, take the residuals and examine
the correlation between r(t) and r(t+1) [you'll have 54 points -- one for
each plot/session combination, not counting the final session]. If that
correlation is small/non-significant, you're off the hook.

   Quasipoisson fits are a little bit problematic in lme4 -- two alternatives
are (1) fitting a lognormal-Poisson model by adding a random effect with
a distinct level for each observation (2) fitting a negative binomial
model with glmm.admb (glmm.admb only allows a single random effect,
but you could do that if you make Colony fixed as suggested above).

  Also keep in mind the rule of thumb that you need at least 10-20 observations
per parameter you want to estimate -- with 72 observations, you're pushing
it a little (intercept (1) + treatment (1) + session (3) + biomass (1) +
random effects (2)) (it's hard to know exactly how to count random effects,
but this is a crude rule of thumb).  I certainly wouldn't try to make
your model any *more* complicated ...

  good luck