[R-sig-ME] Comparing results from glmer and glht

Mon Jan 16 09:58:40 CET 2012

Dear Colin,

I don't think your Tukey tests are very informative. Because you are testing main effects how have an interaction with another variable. So you are not testing the overall effect of a variable but the effect when all other variable are 0 (continuous) or at the reference level (factor).

A better way of doing this could bet o create a new variable which is the interaction between wsh and rip and use that in your model instead of wsh * rip. Then you can use glht() with user defined contrasts so that the contrasts test what you want to test.

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Colin Wahl
Verzonden: zondag 15 januari 2012 1:23
Aan: r-sig-mixed-models at r-project.org
Onderwerp: [R-sig-ME] Comparing results from glmer and glht

I will try to make this concise.

Background: I am testing the effects of land use and forested riparian buffers on stream invertebrates and in-stream variables. There are 4 watershed types (defined by 4 types of land use) and two riparian types (forested and non). Percent EPT (relative abundance) was my main response variable. I also measured a variety of in-stream variables like temperature, nutrients, and toxicity. There are 72 observations for invertebrates, and 24 for in-stream variables.

I am curious of how acceptable p values are from pairwise comparisons using glht() from the multcomp package

I used glmer with a binomial error structure and an observation-level random effect (to account for overdispersion), to model invertebrates:

modelEPT<-glmer(EPT ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs), data=ept, family=binomial(link="logit"))

   AIC   BIC logLik deviance
 284.4 309.5 -131.2    262.4
Random effects:
 Groups     Name        Variance Std.Dev.
 obs        (Intercept) 0.30186  0.54942
 stream:rip (Intercept) 0.40229  0.63427
 stream     (Intercept) 0.12788  0.35760
Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)   -4.2906     0.4935   -8.694  < 2e-16 ***
wshd           -2.0557     0.7601  -2.705  0.00684 **
wshf            3.3575     0.6339   5.297  1.18e-07 ***
wshg           3.3923     0.7486    4.531  5.86e-06 ***
ripN             0.1425     0.6323   0.225  0.82165
wshd:ripN     0.3708     0.9682   0.383  0.70170
wshf:ripN    -0.8665     0.8087   -1.071  0.28400
wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
---

Correlation of Fixed Effects:
                 (Intr)  wshd   wshf   wshg   ripN   wshd:N wshf:N
wshd        -0.649
wshf        -0.779  0.505
wshg        -0.659  0.428  0.513
ripN         -0.644  0.418  0.501  0.424
wshd:ripN  0.421 -0.672 -0.327 -0.277 -0.653 wshf:ripN  0.503 -0.327 -0.638 -0.332 -0.782  0.511 wshg:ripN  0.424 -0.275 -0.330 -0.632 -0.659  0.430  0.515

I then used this model to do Tukey's HSD contrasts between watershed types:

summary(glht(modelEPT, linfct=mcp(wsh="Tukey"))) Linear Hypotheses:

                Estimate Std. Error z value Pr(>|z|)
d - c == 0 -2.05573    0.76010  -2.705   0.0341 *
f - c == 0  3.35753    0.63386   5.297   <0.001 ***
g - c == 0  3.39231    0.74862   4.531   <0.001 ***
f - d == 0  5.41326    0.70176   7.714   <0.001 ***
g - d == 0  5.44804    0.80692   6.752   <0.001 ***
g - f == 0  0.03479    0.68931   0.050   1.0000

and riparian types:

                                                          Estimate Std. Error z value Pr(>|z|)
C: Forested vs. Non-Forested == 0         0.1425     0.6323   0.225  0.99999
D: Forested vs. Non-Forested == 0         0.5134     0.7332   0.700  0.98659
F: Forested vs. Non-Forested == 0        -0.7239     0.5042  -1.436  0.69625
G: Forested vs. Non-Forested == 0        -3.0105     0.7225  -4.167  < 0.001 ***

Are these p values accurate? Or is that a personal judgement I have to make based on the clarity of the patterns they reflect?

I've shown these results in my figures and explained them in my results. I've basically explained that though these p values reasonably reflect patterns in my data, effects sizes, and variances, that they are inexact and potentially anti-conservative due to the issues with degrees of freedom in mixed models.

>From what I understand from my research in the last year is that
Douglas Bates and others advocate something of a paradigm shift away from the petagogically reinforced reliance on cryptic p values toward more in depth discussions of effects sizes and variances. The use of MCMC sampling and HPD intervals are suggested, but these are not available for generalized models.

I am interested in publishing these results as an ecologist, not a statistician (pardon the somewhat artificial distinction), and, I am very interested in what kind of a discussion the statisticians and ecologists of the r-sig-mixed-models mailing list would like to see as potential reviewers.

Thank you,

Colin Wahl

M.S. candidate,
Dept. of Biology
Western Washington University
Bellingham, WA

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models