[R-sig-ME] Comparing results from glmer and glht

Mon Jan 16 21:14:56 CET 2012

Good. This is the same conclusion I've made with a boxplot/graph of
the EPT data. The pattern is very clear. However, if not using p
values, how do you conclude that the differences are "significant?"
The patterns are less clear with other variables.

For example, nutrients, are clearly higher in cultivated streams, but
doing mcmc sampling and using HPD intervals to calculate 95%
confidence intervals results in huge conf. intervals. I've discussed
this with Thierry on this list before, where he asserted that with an
n of 24, the model lacks the power to find significance. However, a
child could look at the graph and tell you that nutrients are way
higher in cultivated streams. Also, the variance/st. deviation is zero
for the random stream variable. Does that mean it should be excluded?
Here are the results of the Nitrate/nitrite model, where nitrates are
higher in non-forested reaches in cultivated streams:

Linear mixed model fit by REML
Formula: LN ~ wsh * rip + (1 | stream)
   Data: all24
   AIC   BIC logLik deviance REMLdev
 43.89 55.24 -11.94    14.12   23.89
Random effects:
 Groups   Name        Variance Std.Dev.
 stream   (Intercept) 0.00000  0.00000
 Residual             0.16589  0.40729
Number of obs: 23, groups: stream, 12

Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.5726     0.2351   2.435
wshD         -0.1040     0.3326  -0.313
wshF         -0.2562     0.3326  -0.770
wshG         -0.2758     0.3718  -0.742
ripN          0.8442     0.3326   2.538
wshD:ripN    -0.6730     0.4703  -1.431
wshF:ripN    -0.7653     0.4554  -1.681
wshG:ripN    -1.1080     0.5258  -2.107

Correlation of Fixed Effects:
          (Intr) wshD   wshF   wshG   ripN   wshD:N wshF:N
wshD      -0.707
wshF      -0.707  0.500
wshG      -0.632  0.447  0.447
ripN      -0.707  0.500  0.500  0.447
wshD:ripN  0.500 -0.707 -0.354 -0.316 -0.707
wshF:ripN  0.516 -0.365 -0.730 -0.327 -0.730  0.516
wshG:ripN  0.447 -0.316 -0.316 -0.707 -0.632  0.447  0.462

On Mon, Jan 16, 2012 at 3:06 AM, i white <i.m.s.white at ed.ac.uk> wrote:
> Colin,
>
> A simple graph of means for your eight combinations of 4 watershed types and
> two riparian types shows there are significant differences between watershed
> types c, d and f, and that for those types of watershed, riparian type makes
> no difference to the response. For watershed g, response is similar to
> watershed f for forested streams, but significantly lower at non-forested
> streams (if I have interpreted your factor labels correctly). These
> conclusions are based on inspection of a very simple graph with say a white
> dot for forested stream, black dot for unforested stream, and different
> locations along x axis for the 4 watershed types. Somewhere on the graph
> there needs to be a bar whose length represents a rough estimate of the
> average standard error of a mean (or difference between two means). Note
> that neither p-values nor multiple comparisons have been mentioned.
>
> Colin Wahl wrote:
>>
>> I will try to make this concise.
>>
>> Background: I am testing the effects of land use and forested riparian
>> buffers on stream invertebrates and in-stream variables. There are 4
>> watershed types (defined by 4 types of land use) and two riparian
>> types (forested and non). Percent EPT (relative abundance) was my main
>> response variable. I also measured a variety of in-stream variables
>> like temperature, nutrients, and toxicity. There are 72 observations
>> for invertebrates, and 24 for in-stream variables.
>>
>> I am curious of how acceptable p values are from pairwise comparisons
>> using glht() from the multcomp package
>>
>> I used glmer with a binomial error structure and an observation-level
>> random effect (to account for overdispersion), to model invertebrates:
>>
>> modelEPT<-glmer(EPT ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
>> data=ept, family=binomial(link="logit"))
>>
>>   AIC   BIC logLik deviance
>>  284.4 309.5 -131.2    262.4
>> Random effects:
>>  Groups     Name        Variance Std.Dev.
>>  obs        (Intercept) 0.30186  0.54942
>>  stream:rip (Intercept) 0.40229  0.63427
>>  stream     (Intercept) 0.12788  0.35760
>> Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>>
>> Fixed effects:
>>                 Estimate Std. Error z value Pr(>|z|)
>> (Intercept)   -4.2906     0.4935   -8.694  < 2e-16 ***
>> wshd           -2.0557     0.7601  -2.705 0.00684 **
>> wshf            3.3575     0.6339   5.297  1.18e-07 ***
>> wshg           3.3923     0.7486    4.531  5.86e-06 ***
>> ripN             0.1425     0.6323   0.225  0.82165
>> wshd:ripN     0.3708     0.9682   0.383  0.70170
>> wshf:ripN    -0.8665     0.8087   -1.071  0.28400
>> wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>> ---
>>
>> Correlation of Fixed Effects:
>>                 (Intr)  wshd   wshf   wshg   ripN   wshd:N wshf:N
>> wshd        -0.649
>> wshf        -0.779  0.505
>> wshg        -0.659  0.428  0.513
>> ripN         -0.644  0.418  0.501  0.424
>> wshd:ripN  0.421 -0.672 -0.327 -0.277 -0.653
>> wshf:ripN  0.503 -0.327 -0.638 -0.332 -0.782  0.511
>> wshg:ripN  0.424 -0.275 -0.330 -0.632 -0.659  0.430  0.515
>>
>>
>> I then used this model to do Tukey's HSD contrasts between watershed
>> types:
>>
>> summary(glht(modelEPT, linfct=mcp(wsh="Tukey")))
>> Linear Hypotheses:
>>
>>                Estimate Std. Error z value Pr(>|z|)
>> d - c == 0 -2.05573    0.76010  -2.705   0.0341 *
>> f - c == 0  3.35753    0.63386   5.297   <0.001 ***
>> g - c == 0  3.39231    0.74862   4.531   <0.001 ***
>> f - d == 0  5.41326    0.70176   7.714   <0.001 ***
>> g - d == 0  5.44804    0.80692   6.752   <0.001 ***
>> g - f == 0  0.03479    0.68931   0.050   1.0000
>>
>> and riparian types:
>>
>>                                                          Estimate
>> Std. Error z value Pr(>|z|)
>> C: Forested vs. Non-Forested == 0         0.1425     0.6323   0.225
>>  0.99999
>> D: Forested vs. Non-Forested == 0         0.5134     0.7332   0.700
>>  0.98659
>> F: Forested vs. Non-Forested == 0        -0.7239     0.5042  -1.436
>>  0.69625
>> G: Forested vs. Non-Forested == 0        -3.0105     0.7225  -4.167  <
>> 0.001 ***
>>
>> Are these p values accurate? Or is that a personal judgement I have to
>> make based on the clarity of the patterns they reflect?
>>
>> I've shown these results in my figures and explained them in my
>> results. I've basically explained that though these p values
>> reasonably reflect patterns in my data, effects sizes, and variances,
>> that they are inexact and potentially anti-conservative due to the
>> issues with degrees of freedom in mixed models.
>>
>>> From what I understand from my research in the last year is that
>>
>> Douglas Bates and others advocate something of a paradigm shift away
>> from the petagogically reinforced reliance on cryptic p values toward
>> more in depth discussions of effects sizes and variances. The use of
>> MCMC sampling and HPD intervals are suggested, but these are not
>> available for generalized models.
>>
>> I am interested in publishing these results as an ecologist, not a
>> statistician (pardon the somewhat artificial distinction), and, I am
>> very interested in what kind of a discussion the statisticians and
>> ecologists of the r-sig-mixed-models mailing list would like to see as
>> potential reviewers.
>>
>> Thank you,
>>
>> Colin Wahl
>>
>> M.S. candidate,
>> Dept. of Biology
>> Western Washington University
>> Bellingham, WA
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>