# [R-sig-ME] using mixed model vs just using plot means

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Wed Feb 23 16:12:29 CET 2011

```Dear Derek,

Your problem is due to the small sample size. Therefore I would not trust the results you get from either lme() nor aov().
Take a look at the example below.

Best regards,

Thierry

library(nlme)
TE <- c(0, 10)
sdPlot <- 1
sdNoise <- 10
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(3)), Plant = factor(seq_len(2)))
dataset\$Plot <- dataset\$Treatment:dataset\$Plot

tmp <- replicate(1000, {
dataset\$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset\$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))

#The p-value of the lme model is always equal to or greater than the p-value of the aov model.

#Let's increase the number of plots to ten. Notice that the difference in p-value is much smaller

dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(2)))
dataset\$Plot <- dataset\$Treatment:dataset\$Plot

tmp <- replicate(1000, {
dataset\$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset\$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))

#Increasing both the number of plots and the number of plants per plot eliminates the difference in p-value.

dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(20)))
dataset\$Plot <- dataset\$Treatment:dataset\$Plot

tmp <- replicate(1000, {
dataset\$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset\$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org
> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens
> Derek Sonderegger
> Verzonden: woensdag 23 februari 2011 0:46
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] using mixed model vs just using plot means
>
> I am trying to fully understand the trade-offs between the
> two analyses and if one method is superior in the case of
> only two observations per random effect.
>
> Full experimental set-up: I have 6 field plots and three of
> them receive a treatment (applied to the plot) and three
> serve as controls.  In each plot, I have measurements from 2
> individual plants.  Since this is a desert ecosystem, the
> variance within a plot is quite large compared the the
> variance from plot to plot.
>
> The 'old-school' analysis method would be to calculate plot
> average for each plot and do an ANOVA using those plot means
> as data and accept that my inference is only at the plot
> level.  If I go with a fancy mixed model approach, my
> inference is on the level of an individual plant, and I get
> an estimate of within plot variability.
>
> But the rub is this:  the plot means approach gives me a
> fairly significant treatment effect (p = .014) while the
> mixed model approach gives me a non-significant result
> (p=.34).  My feeling is that trying to estimate the random
> effect (in addition to the treatment effect) is taking a
> large amount of our statistical power in a relatively data
> poor problem.  I did a quick simulation study and it appears
> that in the case of high within plot variability and a small
> number of observations per plot, the mixed model approach has
> substantially reduced power compared to the plot means
>
> >From a scientific point of view, we actually don't mind making
> >inferences at
> the plot level as we are primarily concerned with landscape
> effects of the treatment.
>
> At this point I think that the plot means approach is the
> best choice, but I want to make sure that I'm not missing anything.
>
> Derek Sonderegger
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

```