[R-sig-ME] [R] dispcrepancy between aov F test and tukey contrasts results with mixed effects model

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Mar 18 20:15:03 CET 2009


lbaril at montana.edu wrote:
> Unforunately, the data are highly unbalanced.  I have 2 to 3 sites per
> stand, but observations within a stand range from 4 to 16 so my variances
> in each site will not be similar and so using site averages would not work
> in this case, right? Note that at the stand scale, I have roughly equal
> observations.  Often in ecology balanced designs are not possible
> (although if I'd known this would be an issue I would have made more of an
> effort towards balance), so it seems as though there ought to be a
> solution or others who have experienced this.  I have spent a fair bit of
> time researching this issue, but it seems like all the advice points to
> what I tried originally - which obviously did not work correctly given my
> data.  Is there potentially another solution?


It's not necessarily as bad as you think. The relative sizes of the 
variance components also factor in. If the residual variance is small 
enough, it doesn't really matter how many observations you are 
averaging. In the present case, the two variance components are roughly 
equal, so the relative variances of averages is about (1+1/4):(1+1/16) = 
1.176 between the extreme cases. Ideally, you would weight the averages 
according to their variance, but the efficiency lost by not doing so is 
going to be small. And don't forget that whatever you do, you rely on 
assuming a normal distribution of the site levels, which is  really no 
verifiable at that sample size.

Also, there is some merit to just considering the difference in 
replication as part of the random variation of the averages (are there 
*systematically* more replicates on some stands?).




> 
> 
>> lbaril at montana.edu wrote:
>>> Thanks Peter for the advice and quick response.  I just want to clarify
> what you suggest.  I should average values within a site then do a
> one-way
>>> anova to test for differnces between sites based on the 2 to 3 new samples
>>> per stand type -- and not use random effects for site?  Or, because
> I've
>>> reduced the data I've 'corrected' the problem with the glht multiple
> comparisons and can use the p-values from that summary if I include
> site
>>> as a random effect?   Thanks again for your advice.
>> This is tricky to say in a few lines, but the basic idea of a random
> effects model is that the site averages vary more than they should
> according to within-site variability. In the balanced case (equal number
> of observations per site), it turns out that the mixed-effects analysis
> is _equivalent_ to modeling the site averages. This is not ignoring the
> random effects of site; rather, it is coalescing it with the residual
> since the variance of a site average is v_site + 1/k v_res where k is
> the number of within-site observations.
>> In the unbalanced case it is not strictly correct to analyze averages,
> because thy have different variances. However, the differences can be
> slight (when the k's are similar or v_site dominates in the above
> formula).
>> A side effect of looking at averages is that you are fitting a plain lm
> model rather than lme and that glht in that case knows how to handle the
> degrees of freedom adjustment. (Assuming that the averages are normally
> distributed, which is as always dubious; but presumably, it is better
> than not correcting at all.)
>>
>>>> lbaril at montana.edu wrote:
>>>>> Hello,
>>>>> I have some conflicting output from an aov summary and tukey
> contrasts
>>> with a mixed effects model I was hoping someone could clarify.  I am
> comparing the abundance of a species across three willow stand types.
> Since I have 2 or 3 sites within a habitat I have included site as a
> random effect in the lme model.  My confusion is that the F test given
> by
>>>>> aov(model) indicates there is no difference between habitats, but the
>>> tukey contrasts using the multcomp package shows that one pair of habits
>>>>> is significantly different from each other.  Why is there a
>>> discrepancy?
>>>>> Have I specified my model correctly?  I included the code and output
>>> below.  Thank you.
>>>> Looks like glht() is ignoring degrees of freedom. So what it does is
>>> wrong but it is not easy to do it right (whatever "right" is in these
> cases). If I understand correctly, what you have is that "stand" is
> strictly coarser than "site", presumably the stands representing each
> 2,
>>> 2, and 3 sites, with a varying number of replications within each site.
> Since the between-site variation is considered random, you end up with
> a
>>> comparison of stands based on essentially only 7 pieces of information.
> (The latter statement requires some qualification, but let's not go
> there
>>> to day.)
>>>> If you have roughly equal replications within each site, I'd be strongly
>>> tempted to reduce the analysis to a simple 1-way ANOVA of the site
> averages.
>>>>>> co.lme=lme(coye~stand,data=t,random=~1|site)
>>>>>> summary (co.lme)
>>>>> Linear mixed-effects model fit by REML
>>>>>  Data: R
>>>>>        AIC      BIC    logLik
>>>>>   53.76606 64.56047 -21.88303
>>>>> Random effects:
>>>>>  Formula: ~1 | site
>>>>>         (Intercept)  Residual
>>>>> StdDev:   0.3122146 0.2944667
>>>>> Fixed effects: coye ~ stand
>>>>>                  Value Std.Error DF    t-value p-value
>>>>> (Intercept)  0.4936837 0.2305072 60  2.1417277  0.0363
>>>>> stand2       0.4853222 0.3003745  4  1.6157240  0.1815
>>>>> stand3      -0.3159230 0.3251201  4 -0.9717117  0.3862
>>>>>  Correlation:
>>>>>        (Intr) stand2
>>>>> stand2 -0.767
>>>>> stand3 -0.709  0.544
>>>>> Standardized Within-Group Residuals:
>>>>>        Min         Q1        Med         Q3        Max
>>>>> -2.4545673 -0.5495609 -0.3148274  0.7527378  2.5151476
>>>>> Number of Observations: 67
>>>>> Number of Groups: 7
>>>>>> anova(co.lme)
>>>>>             numDF denDF   F-value p-value
>>>>> (Intercept)     1    60 23.552098  <.0001
>>>>> stand           2     4  3.738199  0.1215
>>>>>> summary(glht(co.lme,linfct=mcp(stand="Tukey")))
>>>>>          Simultaneous Tests for General Linear Hypotheses
>>>>> Multiple Comparisons of Means: Tukey Contrasts
>>>>> Fit: lme.formula(fixed = coye ~ stand, data = R, random = ~1 | site)
>>> Linear Hypotheses:
>>>>>            Estimate Std. Error z value Pr(>|z|)
>>>>> 2 - 1 == 0   0.4853     0.3004   1.616   0.2385
>>>>> 3 - 1 == 0  -0.3159     0.3251  -0.972   0.5943
>>>>> 3 - 2 == 0  -0.8012     0.2994  -2.676   0.0202 *
>>>>> ---
>>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Adjusted p values reported -- single-step method)
>>>>> Lisa Baril
>>>>> Masters Candidate
>>>>> Department of Ecology
>>>>> Montana State University - Bozeman
>>>>> 406.994.2670
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> --
>>>>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>>>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45)
>>> 35327918
>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> Lisa Baril
>>> Masters Candidate
>>> Department of Ecology
>>> Montana State University - Bozeman
>>> 406.994.2670
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45)
> 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>
>>
>>
> 
> 
> Lisa Baril
> Masters Candidate
> Department of Ecology
> Montana State University - Bozeman
> 406.994.2670
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907




More information about the R-sig-mixed-models mailing list