[R-sig-ME] using mixed model vs just using plot means
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Wed Feb 23 16:12:29 CET 2011
Dear Derek,
Your problem is due to the small sample size. Therefore I would not trust the results you get from either lme() nor aov().
Take a look at the example below.
Best regards,
Thierry
library(nlme)
TE <- c(0, 10)
sdPlot <- 1
sdNoise <- 10
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(3)), Plant = factor(seq_len(2)))
dataset$Plot <- dataset$Treatment:dataset$Plot
tmp <- replicate(1000, {
dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))
#The p-value of the lme model is always equal to or greater than the p-value of the aov model.
#Let's increase the number of plots to ten. Notice that the difference in p-value is much smaller
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(2)))
dataset$Plot <- dataset$Treatment:dataset$Plot
tmp <- replicate(1000, {
dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))
#Increasing both the number of plots and the number of plants per plot eliminates the difference in p-value.
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(20)))
dataset$Plot <- dataset$Treatment:dataset$Plot
tmp <- replicate(1000, {
dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org
> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens
> Derek Sonderegger
> Verzonden: woensdag 23 februari 2011 0:46
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] using mixed model vs just using plot means
>
> I am trying to fully understand the trade-offs between the
> two analyses and if one method is superior in the case of
> only two observations per random effect.
>
> Full experimental set-up: I have 6 field plots and three of
> them receive a treatment (applied to the plot) and three
> serve as controls. In each plot, I have measurements from 2
> individual plants. Since this is a desert ecosystem, the
> variance within a plot is quite large compared the the
> variance from plot to plot.
>
> The 'old-school' analysis method would be to calculate plot
> average for each plot and do an ANOVA using those plot means
> as data and accept that my inference is only at the plot
> level. If I go with a fancy mixed model approach, my
> inference is on the level of an individual plant, and I get
> an estimate of within plot variability.
>
> But the rub is this: the plot means approach gives me a
> fairly significant treatment effect (p = .014) while the
> mixed model approach gives me a non-significant result
> (p=.34). My feeling is that trying to estimate the random
> effect (in addition to the treatment effect) is taking a
> large amount of our statistical power in a relatively data
> poor problem. I did a quick simulation study and it appears
> that in the case of high within plot variability and a small
> number of observations per plot, the mixed model approach has
> substantially reduced power compared to the plot means
>
> >From a scientific point of view, we actually don't mind making
> >inferences at
> the plot level as we are primarily concerned with landscape
> effects of the treatment.
>
> At this point I think that the plot means approach is the
> best choice, but I want to make sure that I'm not missing anything.
>
> Derek Sonderegger
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
More information about the R-sig-mixed-models
mailing list