[R-sig-ME] using mixed model vs just using plot means

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Wed Feb 23 16:12:29 CET 2011


Dear Derek,

Your problem is due to the small sample size. Therefore I would not trust the results you get from either lme() nor aov().
Take a look at the example below.

Best regards,

Thierry

library(nlme)
TE <- c(0, 10)
sdPlot <- 1
sdNoise <- 10
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(3)), Plant = factor(seq_len(2)))
dataset$Plot <- dataset$Treatment:dataset$Plot

tmp <- replicate(1000, {
	dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
	c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
	anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))

#The p-value of the lme model is always equal to or greater than the p-value of the aov model.

#Let's increase the number of plots to ten. Notice that the difference in p-value is much smaller

dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(2)))
dataset$Plot <- dataset$Treatment:dataset$Plot

tmp <- replicate(1000, {
	dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
	c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
	anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))

#Increasing both the number of plots and the number of plants per plot eliminates the difference in p-value.

dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(20)))
dataset$Plot <- dataset$Treatment:dataset$Plot

tmp <- replicate(1000, {
	dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
	c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
	anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
  

> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org 
> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens 
> Derek Sonderegger
> Verzonden: woensdag 23 februari 2011 0:46
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] using mixed model vs just using plot means
> 
> I am trying to fully understand the trade-offs between the 
> two analyses and if one method is superior in the case of 
> only two observations per random effect.
> 
> Full experimental set-up: I have 6 field plots and three of 
> them receive a treatment (applied to the plot) and three 
> serve as controls.  In each plot, I have measurements from 2 
> individual plants.  Since this is a desert ecosystem, the 
> variance within a plot is quite large compared the the 
> variance from plot to plot.
> 
> The 'old-school' analysis method would be to calculate plot 
> average for each plot and do an ANOVA using those plot means 
> as data and accept that my inference is only at the plot 
> level.  If I go with a fancy mixed model approach, my 
> inference is on the level of an individual plant, and I get 
> an estimate of within plot variability.
> 
> But the rub is this:  the plot means approach gives me a 
> fairly significant treatment effect (p = .014) while the 
> mixed model approach gives me a non-significant result 
> (p=.34).  My feeling is that trying to estimate the random 
> effect (in addition to the treatment effect) is taking a 
> large amount of our statistical power in a relatively data 
> poor problem.  I did a quick simulation study and it appears 
> that in the case of high within plot variability and a small 
> number of observations per plot, the mixed model approach has 
> substantially reduced power compared to the plot means
> 
> >From a scientific point of view, we actually don't mind making 
> >inferences at
> the plot level as we are primarily concerned with landscape 
> effects of the treatment.
> 
> At this point I think that the plot means approach is the 
> best choice, but I want to make sure that I'm not missing anything.
> 
> Derek Sonderegger
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 



More information about the R-sig-mixed-models mailing list