[R-sig-ME] Nested error term and unbalanced design

Mon Feb 25 16:27:13 CET 2013

Baldwin, Jim -FS <jbaldwin at ...> writes:

>  While there is a definite order to family, genus, and species (no
> pun intended), I think that the "nestedness" (if any) would be
> related to how you selected your sampling units rather than the
> fixed effects of family, genus, and species.  (I admit bias in
> rarely if ever considering species as a random effect.)

> Jim

  I think I respectfully disagree ... see below ...

> I am trying to run a model that incorporates both environmental
> variables and taxonomic relationships, and I am unsure if I am 1)
> specifying the error term correctly, and 2) accounting for
> unbalanced data correctly. I would appreciate any guidance you can
> provide.

> As a simplified example, I want to ask if a bird is more likely to
> be carrying ticks based on the habitat it was caught in, and what
> kind of bird it is (my actual model has many more environmental
> variables). We have many related species in multiple genera in
> multiple families, but all in the same order. Species is nested
> within genus, and genus is nested within family. I want to estimate
> a fixed effect for both habitat and species, while accounting for
> the nestedness of the relationships of the birds, and I also want to
> account for the fact that we caught more of certain species than
> others.

> My simplified model looks like this:
> 
> M1 <- lmer(y ~ HABITAT + SPECIES + (1|FAMILY/GENUS/SPECIES),
> family=binomial(link="logit"))
> 
> where y is a column vector of (tick presence, tick absence)
> 
> So my questions are: is this the correct "grammar" for the nested error?
> and does the nested error structure by itself take into account the
>  unbalanced data structure?

   Generally you don't have to worry about lack of balance in
'modern' mixed models unless it's really extreme.
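
  A quick way to judge how extreme the imbalance is would be to
tabulate the data before fitting anything.  The sketch below uses
base R only; the data frame `birds` and its contents are made up for
illustration (one row per captured bird), so substitute your own data.

```r
# Hypothetical toy data: one row per captured bird.
birds <- data.frame(
  SPECIES = c("sp_a", "sp_a", "sp_a", "sp_a", "sp_b", "sp_b", "sp_c"),
  HABITAT = c("forest", "forest", "field", "field", "forest", "field", "forest")
)

# Birds per species: shows how uneven the sampling is.
n_per_species <- table(birds$SPECIES)

# Species-by-habitat counts: a zero cell means that species was never
# caught in that habitat.
cells <- xtabs(~ SPECIES + HABITAT, data = birds)

print(n_per_species)
print(cells)
```

Species with only one or two birds, or with empty habitat cells,
are the ones to look at more closely.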

  I'm having a hard time conceptually with the idea of treating
species as a fixed effect _and_ treating family and genus as
random effects.  You certainly shouldn't have a categorical
predictor (SPECIES) appear as both a random and a fixed effect,
though.

M1 <- glmer(y ~ HABITAT + SPECIES + (1|FAMILY/GENUS),
     family=binomial(link="logit"))

*might* work (I would give it a try and see if the results are sensible).
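
  One way to see why fixed SPECIES effects sit awkwardly next to
random GENUS (or FAMILY) effects is to compare design matrices.  This
is only a sketch on a made-up taxonomy (the `birds` data frame and its
labels are hypothetical), using base R:

```r
# Hypothetical taxonomy: 4 species in 2 genera, 2 birds per species.
birds <- data.frame(
  GENUS   = rep(c("g1", "g1", "g2", "g2"), each = 2),
  SPECIES = rep(c("s1", "s2", "s3", "s4"), each = 2)
)

# Fixed-effect columns for SPECIES (intercept + 3 contrasts).
X_species <- model.matrix(~ SPECIES, data = birds)
# One indicator column per genus, as a random intercept would use.
Z_genus <- model.matrix(~ 0 + GENUS, data = birds)

rank_species <- qr(X_species)$rank
rank_both    <- qr(cbind(X_species, Z_genus))$rank

# Equal ranks mean the genus indicators add no new directions
# beyond what the SPECIES columns already span.
print(c(species = rank_species, species_plus_genus = rank_both))
```

Because every species belongs to exactly one genus, the genus columns
are sums of species columns, so the genus and family variances may be
poorly determined by the data once SPECIES is in the fixed part.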
I would also consider

M2 <- glmer(y ~ HABITAT + (HABITAT|FAMILY/GENUS/SPECIES),
     family=binomial(link="logit"))

if your data set is big enough to support it.  This allows habitat
to have different effects in different families, genera, and species
(see the paper by Schielzeth and Forstmeier on the importance of
including interactions between fixed and random effects, i.e. random
slopes: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2657178/ ).
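
  To try the random-slopes idea before committing real data to it, a
simulated example can help.  Everything below is an assumption made
for illustration: the taxonomy, the sample sizes, the effect sizes,
the 0/1 coding of y (one row per bird), and the names `birds` and
`fit_slopes`.  It assumes a version of lme4 that provides glmer().

```r
library(lme4)  # assumed installed; glmer() fits binomial GLMMs

set.seed(101)

# Hypothetical taxonomy: 3 families x 3 genera x 3 species = 27 species.
tax <- expand.grid(s = 1:3, g = 1:3, f = 1:3)
tax$FAMILY  <- factor(paste0("fam", tax$f))
tax$GENUS   <- factor(paste0("fam", tax$f, "_gen", tax$g))
tax$SPECIES <- factor(paste0("fam", tax$f, "_gen", tax$g, "_sp", tax$s))

# 30 birds per species, alternating between two habitats.
birds <- tax[rep(seq_len(nrow(tax)), each = 30),
             c("FAMILY", "GENUS", "SPECIES")]
birds$HABITAT <- factor(rep(c("field", "forest"), length.out = nrow(birds)))

# Simulated truth on the logit scale: a shared habitat effect plus a
# species-specific intercept and a species-specific habitat slope.
sp_int   <- rnorm(nlevels(birds$SPECIES), sd = 1.0)
sp_slope <- rnorm(nlevels(birds$SPECIES), sd = 0.5)
sp       <- as.integer(birds$SPECIES)
forest   <- as.numeric(birds$HABITAT == "forest")
eta      <- -0.5 + 1.0 * forest + sp_int[sp] + sp_slope[sp] * forest
birds$y  <- rbinom(nrow(birds), size = 1, prob = plogis(eta))

# Random intercepts and habitat slopes at family, genus, and species level.
fit_slopes <- glmer(y ~ HABITAT + (HABITAT | FAMILY/GENUS/SPECIES),
                    data = birds, family = binomial(link = "logit"))

print(fixef(fit_slopes))    # population-level intercept and habitat effect
print(VarCorr(fit_slopes))  # variance components at each taxonomic level
```

On a data set this small the fit may be reported as singular (some
variance components estimated at or near zero), which is itself a
hint about whether the data can support the full structure.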