[R-sig-ME] Comparing results from glmer and glht
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Tue Jan 17 15:28:00 CET 2012
Hi Colin,
This is what I would do. The example is on a simple lm() but is similar to a mixed model.
I find it easier to define contrasts on models without the intercept.
set.seed(12345)
dataset <- expand.grid(wsh = c("C", "D", "F", "G"), rip = c("Forest", "Non-forest"), id = seq_len(10))
dataset$wshrip <- with(dataset, wsh:rip)
dataset$Y <- runif(4, min = -10, max = 10)[dataset$wsh] + runif(2, min = -2, max = 2)[dataset$rip] + runif(8, min = -1, max = 1)[dataset$wshrip] + rnorm(nrow(dataset))
model <- lm(Y ~ 0 + wshrip, data = dataset)
library(multcomp)
K <- rbind(
c(1, 1, -1, -1, 0, 0, 0, 0),
c(0, 0, 1, 1, 0, 0, -1, -1),
c(1, -1, 1, -1, 1, -1, 1, -1))
rownames(K) <- c("C - D = 0", "D - G = 0", "Forest - non-forest = 0")
summary(glht(model, K))
Best regards,
Thierry
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: Colin Wahl [mailto:biowahl at gmail.com]
Verzonden: maandag 16 januari 2012 21:55
Aan: ONKELINX, Thierry; r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] Comparing results from glmer and glht
Thierry,
Thank you for the suggestion. I think I understand, though creating user defined contrasts will be difficult.
I create a combined variable:
wshrip <-c bind(wsh, rip)
Resulting in a new model
modelEPT<-glmer(EPT ~ wshrip + (1|stream) + (1|stream:rip) + (1|obs), data=ept, family=binomial(link="logit"))
The user defined contrasts I tried previously were:
wsh <- rbind("C vs. D" = c(1,0,0,0,0,0,0,0),
"C vs. F" = c(0,1,0,0,0,0,0,0),
"C vs. G" = c(0,0,1,0,0,0,0,0),
"D vs. F" = c(-1,1,0,0,0,0,0,0),
"D vs. G" = c(-1,0,1,0,0,0,0,1),
"F vs. G" = c(0,-1,1,0,0,0,0,0))
This did not give me reasonable results, prompting me to use built-in
contrasts:
summary(glht(modelEPT, linfct=mcp(wsh="Tukey")))
For riparian contrasts I used:
rip <- rbind("C: Forested vs. Non-Forested" = c(0,0,0,0,1,0,0,0),
"D: Forested vs. Non-Forested" = c(0,0,0,0,1,1,0,0),
"F: Forested vs. Non-Forested" = c(0,0,0,0,1,0,1,0),
"G: Forested vs. Non-Forested" = c(0,0,0,0,1,0,0,1))
I dont understand how these contrasts are defined, but based them on contrasts on an almost identical design found here:
http://thebiobucket.blogspot.com/2011/06/glmm-with-custom-multiple-comparisons.html#more
Could you please explain how to use user-defined for the model with a wshrip combined variable? I cant find a clear example of how to do this. The parameter length is different from the original model now that there is a combined variable, correct?
Thank you very much, I'd be lost in the dark without this mailing list.
Colin
On Mon, Jan 16, 2012 at 12:58 AM, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:
> Dear Colin,
>
> I don't think your Tukey tests are very informative. Because you are testing main effects how have an interaction with another variable. So you are not testing the overall effect of a variable but the effect when all other variable are 0 (continuous) or at the reference level (factor).
>
> A better way of doing this could bet o create a new variable which is the interaction between wsh and rip and use that in your model instead of wsh * rip. Then you can use glht() with user defined contrasts so that the contrasts test what you want to test.
>
> Best regards,
>
> Thierry
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
> Assurance Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org
> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Colin Wahl
> Verzonden: zondag 15 januari 2012 1:23
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] Comparing results from glmer and glht
>
> I will try to make this concise.
>
> Background: I am testing the effects of land use and forested riparian buffers on stream invertebrates and in-stream variables. There are 4 watershed types (defined by 4 types of land use) and two riparian types (forested and non). Percent EPT (relative abundance) was my main response variable. I also measured a variety of in-stream variables like temperature, nutrients, and toxicity. There are 72 observations for invertebrates, and 24 for in-stream variables.
>
> I am curious of how acceptable p values are from pairwise comparisons
> using glht() from the multcomp package
>
> I used glmer with a binomial error structure and an observation-level random effect (to account for overdispersion), to model invertebrates:
>
> modelEPT<-glmer(EPT ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
> data=ept, family=binomial(link="logit"))
>
> AIC BIC logLik deviance
> 284.4 309.5 -131.2 262.4
> Random effects:
> Groups Name Variance Std.Dev.
> obs (Intercept) 0.30186 0.54942
> stream:rip (Intercept) 0.40229 0.63427
> stream (Intercept) 0.12788 0.35760 Number of obs: 72, groups:
> obs, 72; stream:rip, 24; stream, 12
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -4.2906 0.4935 -8.694 < 2e-16 *** wshd
> -2.0557 0.7601 -2.705 0.00684 ** wshf 3.3575
> 0.6339 5.297 1.18e-07 *** wshg 3.3923 0.7486 4.531
> 5.86e-06 *** ripN 0.1425 0.6323 0.225 0.82165
> wshd:ripN 0.3708 0.9682 0.383 0.70170 wshf:ripN -0.8665
> 0.8087 -1.071 0.28400 wshg:ripN -3.1530 0.9601 -3.284
> 0.00102 **
> ---
>
> Correlation of Fixed Effects:
> (Intr) wshd wshf wshg ripN wshd:N wshf:N wshd
> -0.649 wshf -0.779 0.505 wshg -0.659 0.428 0.513 ripN
> -0.644 0.418 0.501 0.424 wshd:ripN 0.421 -0.672 -0.327 -0.277
> -0.653 wshf:ripN 0.503 -0.327 -0.638 -0.332 -0.782 0.511 wshg:ripN
> 0.424 -0.275 -0.330 -0.632 -0.659 0.430 0.515
>
>
> I then used this model to do Tukey's HSD contrasts between watershed types:
>
> summary(glht(modelEPT, linfct=mcp(wsh="Tukey"))) Linear Hypotheses:
>
> Estimate Std. Error z value Pr(>|z|) d - c == 0
> -2.05573 0.76010 -2.705 0.0341 * f - c == 0 3.35753 0.63386
> 5.297 <0.001 *** g - c == 0 3.39231 0.74862 4.531 <0.001 ***
> f - d == 0 5.41326 0.70176 7.714 <0.001 *** g - d == 0
> 5.44804 0.80692 6.752 <0.001 *** g - f == 0 0.03479 0.68931
> 0.050 1.0000
>
> and riparian types:
>
> Estimate Std.
> Error z value Pr(>|z|)
> C: Forested vs. Non-Forested == 0 0.1425 0.6323 0.225
> 0.99999
> D: Forested vs. Non-Forested == 0 0.5134 0.7332 0.700
> 0.98659
> F: Forested vs. Non-Forested == 0 -0.7239 0.5042 -1.436
> 0.69625
> G: Forested vs. Non-Forested == 0 -3.0105 0.7225 -4.167 <
> 0.001 ***
>
> Are these p values accurate? Or is that a personal judgement I have to make based on the clarity of the patterns they reflect?
>
> I've shown these results in my figures and explained them in my results. I've basically explained that though these p values reasonably reflect patterns in my data, effects sizes, and variances, that they are inexact and potentially anti-conservative due to the issues with degrees of freedom in mixed models.
>
> >From what I understand from my research in the last year is that
> Douglas Bates and others advocate something of a paradigm shift away from the petagogically reinforced reliance on cryptic p values toward more in depth discussions of effects sizes and variances. The use of MCMC sampling and HPD intervals are suggested, but these are not available for generalized models.
>
> I am interested in publishing these results as an ecologist, not a statistician (pardon the somewhat artificial distinction), and, I am very interested in what kind of a discussion the statisticians and ecologists of the r-sig-mixed-models mailing list would like to see as potential reviewers.
>
> Thank you,
>
> Colin Wahl
>
> M.S. candidate,
> Dept. of Biology
> Western Washington University
> Bellingham, WA
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list