[R-sig-ME] Comparing results from glmer and glht

Tue Jan 17 15:28:00 CET 2012

Hi Colin,

This is what I would do. The example is on a simple lm() but is similar to a mixed model.
I find it easier to define contrasts on models without the intercept.

set.seed(12345)
dataset <- expand.grid(wsh = c("C", "D", "F", "G"), rip = c("Forest", "Non-forest"), id = seq_len(10))
dataset$wshrip <- with(dataset, wsh:rip)
dataset$Y <- runif(4, min = -10, max = 10)[dataset$wsh] + runif(2, min = -2, max = 2)[dataset$rip] + runif(8, min = -1, max = 1)[dataset$wshrip] + rnorm(nrow(dataset))
model <- lm(Y ~ 0 + wshrip, data = dataset)
library(multcomp)
K <- rbind(
  c(1, 1, -1, -1, 0, 0, 0, 0),
  c(0, 0, 1, 1, 0, 0, -1, -1), 
  c(1, -1, 1, -1, 1, -1, 1, -1))
rownames(K) <- c("C - D = 0", "D - G = 0", "Forest - non-forest = 0")
summary(glht(model, K))

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: Colin Wahl [mailto:biowahl at gmail.com] 
Verzonden: maandag 16 januari 2012 21:55
Aan: ONKELINX, Thierry; r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] Comparing results from glmer and glht

Thierry,
Thank you for the suggestion. I think I understand, though creating user defined contrasts will be difficult.

I create a combined variable:
wshrip <-c bind(wsh, rip)

Resulting in a new model
modelEPT<-glmer(EPT ~ wshrip + (1|stream) + (1|stream:rip) + (1|obs), data=ept, family=binomial(link="logit"))

The user defined contrasts I tried previously were:
wsh <- rbind("C vs. D" = c(1,0,0,0,0,0,0,0),
            "C vs. F" = c(0,1,0,0,0,0,0,0),
            "C vs. G" = c(0,0,1,0,0,0,0,0),
            "D vs. F" = c(-1,1,0,0,0,0,0,0),
            "D vs. G" = c(-1,0,1,0,0,0,0,1),
            "F vs. G" = c(0,-1,1,0,0,0,0,0))

This did not give me reasonable results, prompting me to use built-in
contrasts:

summary(glht(modelEPT, linfct=mcp(wsh="Tukey")))

For riparian contrasts I used:
rip <- rbind("C: Forested vs. Non-Forested" = c(0,0,0,0,1,0,0,0),
            "D: Forested vs. Non-Forested" = c(0,0,0,0,1,1,0,0),
            "F: Forested vs. Non-Forested" = c(0,0,0,0,1,0,1,0),
            "G: Forested vs. Non-Forested" = c(0,0,0,0,1,0,0,1))

I dont understand how these contrasts are defined, but based them on contrasts on an almost identical design found here:
http://thebiobucket.blogspot.com/2011/06/glmm-with-custom-multiple-comparisons.html#more

Could you please explain how to use user-defined for the model with a wshrip combined variable? I cant find a clear example of how to do this. The parameter length is different from the original model now that there is a combined variable, correct?

Thank you very much, I'd be lost in the dark without this mailing list.

Colin

On Mon, Jan 16, 2012 at 12:58 AM, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:
> Dear Colin,
>
> I don't think your Tukey tests are very informative. Because you are testing main effects how have an interaction with another variable. So you are not testing the overall effect of a variable but the effect when all other variable are 0 (continuous) or at the reference level (factor).
>
> A better way of doing this could bet o create a new variable which is the interaction between wsh and rip and use that in your model instead of wsh * rip. Then you can use glht() with user defined contrasts so that the contrasts test what you want to test.
>
> Best regards,
>
> Thierry
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature 
> and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality 
> Assurance Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org 
> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Colin Wahl
> Verzonden: zondag 15 januari 2012 1:23
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] Comparing results from glmer and glht
>
> I will try to make this concise.
>
> Background: I am testing the effects of land use and forested riparian buffers on stream invertebrates and in-stream variables. There are 4 watershed types (defined by 4 types of land use) and two riparian types (forested and non). Percent EPT (relative abundance) was my main response variable. I also measured a variety of in-stream variables like temperature, nutrients, and toxicity. There are 72 observations for invertebrates, and 24 for in-stream variables.
>
> I am curious of how acceptable p values are from pairwise comparisons 
> using glht() from the multcomp package
>
> I used glmer with a binomial error structure and an observation-level random effect (to account for overdispersion), to model invertebrates:
>
> modelEPT<-glmer(EPT ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs), 
> data=ept, family=binomial(link="logit"))
>
>   AIC   BIC logLik deviance
>  284.4 309.5 -131.2    262.4
> Random effects:
>  Groups     Name        Variance Std.Dev.
>  obs        (Intercept) 0.30186  0.54942
>  stream:rip (Intercept) 0.40229  0.63427
>  stream     (Intercept) 0.12788  0.35760 Number of obs: 72, groups: 
> obs, 72; stream:rip, 24; stream, 12
>
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)
> (Intercept)   -4.2906     0.4935   -8.694  < 2e-16 *** wshd           
> -2.0557     0.7601  -2.705 0.00684 ** wshf            3.3575     
> 0.6339   5.297  1.18e-07 *** wshg           3.3923     0.7486    4.531  
> 5.86e-06 *** ripN             0.1425     0.6323   0.225  0.82165 
> wshd:ripN     0.3708     0.9682   0.383  0.70170 wshf:ripN    -0.8665     
> 0.8087   -1.071  0.28400 wshg:ripN    -3.1530     0.9601  -3.284  
> 0.00102 **
> ---
>
> Correlation of Fixed Effects:
>                 (Intr)  wshd   wshf   wshg   ripN   wshd:N wshf:N wshd        
> -0.649 wshf        -0.779  0.505 wshg        -0.659  0.428  0.513 ripN         
> -0.644  0.418  0.501  0.424 wshd:ripN  0.421 -0.672 -0.327 -0.277 
> -0.653 wshf:ripN  0.503 -0.327 -0.638 -0.332 -0.782  0.511 wshg:ripN  
> 0.424 -0.275 -0.330 -0.632 -0.659  0.430  0.515
>
>
> I then used this model to do Tukey's HSD contrasts between watershed types:
>
> summary(glht(modelEPT, linfct=mcp(wsh="Tukey"))) Linear Hypotheses:
>
>                Estimate Std. Error z value Pr(>|z|) d - c == 0 
> -2.05573    0.76010  -2.705   0.0341 * f - c == 0  3.35753    0.63386   
> 5.297   <0.001 *** g - c == 0  3.39231    0.74862   4.531   <0.001 *** 
> f - d == 0  5.41326    0.70176   7.714   <0.001 *** g - d == 0  
> 5.44804    0.80692   6.752   <0.001 *** g - f == 0  0.03479    0.68931   
> 0.050   1.0000
>
> and riparian types:
>
>                                                          Estimate Std. 
> Error z value Pr(>|z|)
> C: Forested vs. Non-Forested == 0         0.1425     0.6323   0.225  
> 0.99999
> D: Forested vs. Non-Forested == 0         0.5134     0.7332   0.700  
> 0.98659
> F: Forested vs. Non-Forested == 0        -0.7239     0.5042  -1.436  
> 0.69625
> G: Forested vs. Non-Forested == 0        -3.0105     0.7225  -4.167  < 
> 0.001 ***
>
> Are these p values accurate? Or is that a personal judgement I have to make based on the clarity of the patterns they reflect?
>
> I've shown these results in my figures and explained them in my results. I've basically explained that though these p values reasonably reflect patterns in my data, effects sizes, and variances, that they are inexact and potentially anti-conservative due to the issues with degrees of freedom in mixed models.
>
> >From what I understand from my research in the last year is that
> Douglas Bates and others advocate something of a paradigm shift away from the petagogically reinforced reliance on cryptic p values toward more in depth discussions of effects sizes and variances. The use of MCMC sampling and HPD intervals are suggested, but these are not available for generalized models.
>
> I am interested in publishing these results as an ecologist, not a statistician (pardon the somewhat artificial distinction), and, I am very interested in what kind of a discussion the statisticians and ecologists of the r-sig-mixed-models mailing list would like to see as potential reviewers.
>
> Thank you,
>
> Colin Wahl
>
> M.S. candidate,
> Dept. of Biology
> Western Washington University
> Bellingham, WA
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models