Hello helpful list,
I am using R-2.12.1 for Windows (32/64 bit); lme4_0.999375-37
I am attempting to run a generalized mixed model with proportion data using
lme4.
My prediction is simple: Exclosure trees (from which insectivores were
excluded) have higher % leaf area chewed than control trees (to which
insectivores had access).
The design is a randomized block, repeated measures design in which I have
56 trees, grouped in 28 pairs. The pairs are blocks and trees within pairs
were matched according to height, overall form, distance to water etc.. The
trees in pairs were assigned randomly as one of two treatments: an exclosure
tree in which monofilament netting was placed around the tree in order to
exclude vertebrate predators (birds and bats), and control tree. Thus,
there are 28 exclosure trees, and 28 control trees. Among other measurements
which do not pertain to this question, I measured herbivory on all trees by
randomly selecting 50 leaves from each tree (49 for one tree, n=2799). For
each of 50 leaves, I counted the total number of dots (with a plastic grid)
that encompassed the outline of the leaf (total) and the number of dots that
encompassed the area chewed by herbivores (chew). The response variable in a
glm without repeated measures would be mean % leaf area removed
(chew/total).
My first step was to try a repeated measures ANOVA with %leaf area chewed as
the response variable, and a glm with collapsed data (response = mean of
repeated measures per tree). Residuals (with data either not-transformed or
arcsin(sqrt) transformed) are decidedly not normal with either approach and
residuals increase with increasing fitted values. There are many 0’s.
I would like to maintain the repeated measures structure (instead of
collapsing all data into a mean % leaf area chewed per tree), and so have
ventured into lmer.
My leafchew dataset:
'data.frame': 2799 obs. of 6 variables:
$ pair : Factor w/ 28 levels "LEEV01","LEEV03",..: 1 1 1 1 1 1 1 1 1 1
...
$ treatment: Factor w/ 2 levels "CONT","EXCL": 1 1 1 1 1 1 1 1 1 1 ...
$ treeid : Factor w/ 56 levels "LEEV01CONT","LEEV01EXCL",..: 1 1 1 1 1 1
1 1 1 1 ...
$ leafid : Factor w/ 2799 levels "LEEV01CONT01",..: 1 2 3 4 5 6 7 8 9 10
...
$ newchew : num 0 0 0 0 0 0 0 2.5 2 0 ...
$ newtotal : num 18.5 6.5 13.5 14 6 4 18 40 11 14.5 ...
I am following an example provided on page 590 of Michael Crawley’s (The R
Book), though I realize that lmer has been updated since the book was
written so features like quasibinomial, estimated scale, etc. are no longer
available or considered “good” approaches.
I created a two-column response matrix:
##use ceiling because I counted borderline dots as 0.5 and integers are
required
chewdots<-ceiling(leafchew$newchew
##chewed
leafdots<-ceiling(leafchew$newtotal-damage$newchew)
##not chewed
y<-cbind(chewdots,leafdots)
And suggested the following model:
mod1<-glmer(y~pair + treatment + (1|treeid), binomial(link = logit), data =
damage)
Question 1:
I’ve read much of Piniero and Bates, Bates' lme4 draft book, Zuur et al.
2009, Crawly, Bolker et al. mixed model TREE paper, and have perused this
list for months. I’ve gone back and forth on whether to include “pair” as
random or fixed effect. Since pair is a nuisance factor, is not of interest
in and of itself, and since I intend to describe patterns among black
cottonwood saplings in general and not among only the 28 pairs, there is
good argument for the random effect being: (1|pair/treeid). Using lme, I
can also build heteroskedastic models in which the (high) variability among
pairs is modeled as a random effect using varIdent. But that’s another story
since I cannot do so with lmer and since I cannot use the binomial family
with lme. On the other hand, there seems good reason to use pair as a
“traditional” blocking variable (i.e., test for variability between
treatments after controlling for variability among pairs). In the end, local
advice suggested keeping pair as a fixed effect blocking factor so that’s
what I’ve done. (Also, I find that with some of my other response variables
with fewer repeated or no repeated measures, 1|pair/treeid results in 0.0
variance for pair, or the model fails to converge altogether).
Does this fixed and random effects structure seem reasonable?
Question 2:
The output for the above glmer model is at the end of this post:
How do I interpret the Std. Dev. Values for random effects and how do I
determine overdispersion?
I am suspicious of the “significant” effect of treatment (and I’m well aware
of the ongoing argument about p-values in this forum). I am suspicious
because for one, patterns uncovered with simple data summaries demonstrate
some tendency toward differences between treatments but with high variance,
so p=0.008 for treatment effect in this model (see below) seems to greatly
overstate the case. I also question weather 1% difference is biologically
meaningful:
> tapply(meandam$per.chew, meandam$treatment, mean) ## mean of mean % leaf
chew per tree
CONT EXCL
0.03508112 0.04760478
> tapply(meandam$per.chew, meandam$treatment, sd)
CONT EXCL
0.03691398 0.03174919
Also, though not good models, repeated measures ANOVA and lme models for
this same data and same Fixed/Random structure (but with %chew as response)
resulted in p-values for treatment > 0.14 - >0.4. and from these I concluded
in general that there is little evidence for a treatment effect and that
though these models were based on suckily distributed residuals, p-values
>0.4 in no way hover around the magic 0.05 mark and so it’s highly likely
that there was no treatment effect.
Question 3:
How might I validate this glmer model? Are there correct classification
methods similar to simple logistic regression?
Question 4:
The estimate for treatmentEXCL is 0.328019. I take this to mean that
treatmentCONT at LEEV01= -3.535588 is the baseline estimate, so the effect
size for treatment is 0.328019, where the *value* for treatmentEXCL at
LEEV01 algebraically determined by:
0.328 = EXCLLEEV01 - (-3.535588)
0.328 - 3.535588 = TREATLEEVO1
-3.207588 = EXCLLEEV01
and EXCLLEEV01 > CONTLEEV01
If I were to present an effect size and associated CIs, how do I calculate
the effect size for Treatment regardless of pair (instead of Treatment at
LEEV01)? Or, is this the equivalent to the effect size of treatment when
PAIR is held constant?
Thank you very much for your time.
Sacha Heath, MSc candidate
Wildlife Habitat Ecology Lab
Humboldt State University
The output for the above glmer model is:
Generalized linear mixed model fit by the Laplace approximation
Formula: y ~ pair + treatment + (1 | treeid)
Data: leafchew
AIC BIC logLik deviance
16512 16690 -8226 16452
Random effects:
Groups Name Variance Std.Dev.
treeid (Intercept) 0.19212 0.43831
Number of obs: 2799, groups: treeid, 56
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.535588 0.339854 -10.403 < 2e-16 ***
pairLEEV03 0.077567 0.467269 0.166 0.86816
pairLEEV04 0.643567 0.459567 1.400 0.16140
pairLEEV05 0.016434 0.464618 0.035 0.97178
pairLEEV06 0.808519 0.460579 1.755 0.07918 .
pairLEEV07 -0.025698 0.469756 -0.055 0.95637
pairLEEV08 0.025163 0.466954 0.054 0.95703
pairLEEV09 -0.096518 0.466500 -0.207 0.83609
pairLEEV11 -1.391627 0.483611 -2.878 0.00401 **
pairLEEV13 0.219702 0.462106 0.475 0.63448
pairLEEV14 1.372586 0.458031 2.997 0.00273 **
pairMILL01 -1.062080 0.513157 -2.070 0.03848 *
pairMILL02 0.297700 0.463727 0.642 0.52089
pairMILL04 -0.743668 0.475820 -1.563 0.11807
pairMILL05 -1.503033 0.499875 -3.007 0.00264 **
pairMILL06 -0.315513 0.465287 -0.678 0.49771
pairMILL07 -0.716488 0.481283 -1.489 0.13657
pairMILL08 -0.067715 0.464414 -0.146 0.88407
pairMILL09 -0.046709 0.464254 -0.101 0.91986
pairMILL10 0.392129 0.459155 0.854 0.39309
pairMILL11 0.001304 0.464828 0.003 0.99776
pairMILL12 -1.455049 0.498380 -2.920 0.00351 **
pairMILL14 -0.535257 0.474139 -1.129 0.25894
pairMILL15 0.344271 0.462005 0.745 0.45617
pairMILL16 -0.011718 0.464428 -0.025 0.97987
pairMILL17 0.263631 0.459839 0.573 0.56643
pairMILL19 0.624423 0.459385 1.359 0.17407
pairMILL20 0.670462 0.459213 1.460 0.14428
treatmentEXCL 0.328019 0.124670 2.631 0.00851 **
[[alternative HTML version deleted]]