[R-sig-ME] Nested error term and unbalanced design
Ben Bolker
bbolker at gmail.com
Mon Feb 25 16:27:13 CET 2013
Baldwin, Jim -FS <jbaldwin at ...> writes:
> While there is a definite order to family, genus, and species (no
> pun intended), I think that the "nestedness" (if any) would be
> related to how you selected your sampling units rather than the
> fixed effects of family, genus, and species. (I admit bias in
> rarely if ever considering species as a random effect.)
> Jim
I think I respectfully disagree ... see below ...
> I am trying to run a model that incorporates both environmental
> variables and taxonomic relationships, and I am unsure if I am 1)
> specifying the error term correctly, and 2) accounting for
> unbalanced data correctly. I would appreciate any guidance you can
> provide.
> As a simplified example, I want to ask if a bird is more likely to
> be carrying ticks based on the habitat it was caught in, and what
> kind of bird it is (my actual model has many more environmental
> variables). We have many related species in multiple genera in
> multiple families, but all in the same order. Species is nested
> within genus, and genus is nested within family. I want to estimate
> a fixed effect for both habitat and species, while accounting for
> the nestedness of the relationships of the birds, and I also want to
> account for the fact that we caught more of certain species than
> others.
> My simplified model looks like this:
>
> M1 <- lmer(y ~ HABITAT + SPECIES + (1|FAMILY/GENUS/SPECIES),
> family=binomial(link="logit"))
>
> where y is a column vector of (tick presence, tick absence)
>
> So my questions are: is this the correct "grammar" for the nested error?
> and does the nested error structure by itself take into account the
> unbalanced data structure?
Generally you don't have to worry about lack of balance in
'modern' mixed models unless it's really extreme.
I'm having a little bit of a hard time conceptually with the
idea of having species as a fixed effect _and_ having family
and genus as random effects. In any case, you certainly
shouldn't have a categorical predictor (SPECIES) appear as both
a random and a fixed effect.
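To make the overlap concrete, here is a minimal base-R sketch. It assumes a made-up toy taxonomy (the names F1, G1, sp1, ... are hypothetical, not from the original data) and relies on the fact that (1|FAMILY/GENUS/SPECIES) is shorthand for (1|FAMILY) + (1|FAMILY:GENUS) + (1|FAMILY:GENUS:SPECIES). It only counts grouping levels; it does not fit any model.

```r
# Hypothetical toy taxonomy: 2 families, 3 genera, 5 species
birds <- data.frame(
  FAMILY  = c("F1", "F1", "F1", "F2", "F2"),
  GENUS   = c("G1", "G1", "G2", "G3", "G3"),
  SPECIES = c("sp1", "sp2", "sp3", "sp4", "sp5")
)

# Grouping factors implied by the nested random-effects term
fam        <- factor(birds$FAMILY)
fam_gen    <- interaction(birds$FAMILY, birds$GENUS, drop = TRUE)
fam_gen_sp <- interaction(birds$FAMILY, birds$GENUS, birds$SPECIES,
                          drop = TRUE)

# Compare with the levels of SPECIES used as a fixed effect
n_levels <- c(family        = nlevels(fam),
              genus         = nlevels(fam_gen),
              species       = nlevels(fam_gen_sp),
              species_fixed = nlevels(factor(birds$SPECIES)))
print(n_levels)
```

In this toy example the innermost grouping factor has one level per species, exactly like the fixed SPECIES term, so the same categorical predictor would enter the model twice.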
M1 <- glmer(y ~ HABITAT + SPECIES + (1|FAMILY/GENUS),
            family=binomial(link="logit"))
*might* work (I would give it a try and see if the results are sensible).
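One quick check on why the results need a sanity check, sketched in base R under the assumption of a made-up toy data set (all names hypothetical): when SPECIES is a fixed effect, the genus indicator columns are linear combinations of the species columns. That is a hint, not a proof, that the data may carry little information about the genus and family variances once species is in the fixed part.

```r
# Hypothetical toy data: 5 species in 3 genera, 2 birds per species
taxa <- data.frame(
  GENUS   = c("G1", "G1", "G2", "G3", "G3"),
  SPECIES = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  stringsAsFactors = TRUE
)
birds <- taxa[rep(seq_len(nrow(taxa)), each = 2), ]

X <- model.matrix(~ SPECIES, data = birds)    # fixed-effect design matrix
Z <- model.matrix(~ 0 + GENUS, data = birds)  # genus indicator columns

rank_X  <- qr(X)$rank
rank_XZ <- qr(cbind(X, Z))$rank

# Equal ranks mean the genus columns add nothing beyond the species columns
print(c(rank_X = rank_X, rank_XZ = rank_XZ))
```

If the two ranks match, as they do here, a variance estimate for GENUS (or FAMILY) close to zero would not be surprising.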
I would also consider
M2 <- glmer(y ~ HABITAT + (HABITAT|FAMILY/GENUS/SPECIES),
            family=binomial(link="logit"))
if your data set is big enough to support it. This allows habitat
to have different effects in different families, genera, and
species (see the paper by Schielzeth and Forstmeier on the
importance of including interactions between fixed and random
effects, i.e. random slopes:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2657178/ )
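A rough way to judge "big enough" is to count the variance-covariance parameters the random-slopes term asks for. The sketch below assumes HABITAT is a factor with k levels and an unstructured covariance matrix at each of the three grouping levels; the helper name n_vc_params is hypothetical, not an lme4 function.

```r
# Number of variance-covariance parameters for (HABITAT | FAMILY/GENUS/SPECIES),
# assuming HABITAT is a factor with n_habitat_levels levels.
n_vc_params <- function(n_habitat_levels, n_grouping_levels = 3) {
  # intercept + (k - 1) habitat contrasts = k correlated random terms per group
  k <- n_habitat_levels
  per_level <- k * (k + 1) / 2   # variances plus covariances of a k x k matrix
  n_grouping_levels * per_level
}

# Parameter counts for 2 to 5 habitat types
counts <- sapply(2:5, n_vc_params)
names(counts) <- paste0("k=", 2:5)
print(counts)
```

The count grows quadratically with the number of habitat types, so with more than a few habitats and a modest number of families and genera the full model may well be too much to ask of the data.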