[R] Unbalanced data in split-plot analysis with aov()
Samuel Knapp
samuel.knapp at tum.de
Wed Oct 11 00:07:11 CEST 2017
Dear all,
I'm analysing a split-plot experiment in which one or two values are
sometimes missing. I noticed that as soon as the data are slightly
unbalanced, the subplot treatment also appears in the main-plot stratum,
where it is tested against the main-plot error term.
I replicated this with the Oats dataset from Yates (1935), contained in
the nlme package, where Variety is the main-plot factor and nitro the
subplot factor.
> # Oats dataset (Yates 1935) from the nlme package
> require(nlme); data <- get(data(Oats))
> data$nitro <- factor(data$nitro)
> data$Block <- as.factor(as.character(data$Block))
> nrow(data) # 6 Blocks * 4 N-levels * 3 Varieties = 72 obs -> orthogonal and balanced
[1] 72
> # split-plot anova
> summary(aov(yield ~ Block+Variety*nitro + Error(Block/Variety),data))

Error: Block
      Df Sum Sq Mean Sq
Block  5  15875    3175

Error: Block:Variety
          Df Sum Sq Mean Sq F value Pr(>F)
Variety    2   1786   893.2   1.485  0.272
Residuals 10   6013   601.3

Error: Within
              Df Sum Sq Mean Sq F value   Pr(>F)
nitro          3  20020    6674  37.686 2.46e-12 ***
Variety:nitro  6    322      54   0.303    0.932
Residuals     45   7969     177
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> reduceddata <- data[-5,] # delete one observation
> summary(aov(yield ~ Block+Variety*nitro + Error(Block/Variety),reduceddata))

Error: Block
      Df Sum Sq Mean Sq
Block  5  16070    3214

Error: Block:Variety
          Df Sum Sq Mean Sq F value Pr(>F)
Variety    2   1820   910.0   1.373  0.302
nitro      1      1     1.0   0.001  0.970
Residuals  9   5964   662.7

Error: Within
              Df Sum Sq Mean Sq F value   Pr(>F)
nitro          3  19766    6589   36.88 4.49e-12 ***
Variety:nitro  6    333      55    0.31    0.928
Residuals     44   7860     179
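
A quick way to confirm where the imbalance sits is to count the observations per main plot. This is only a sketch: it repeats the preparation above on a plain data frame, and the names `oats`, `reduced`, `balanced_full`, `balanced_reduced` and `plot_counts` are introduced here for illustration. `replications()` returns a vector for a balanced design and a list otherwise, which is the balance check suggested on its help page.

```r
library(nlme)  # provides the Oats data

# Same preparation as in the transcript, on a plain data frame
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)
oats$Block <- factor(as.character(oats$Block))
reduced <- oats[-5, ]  # delete one observation

# replications() gives a vector for balanced data, a list otherwise
balanced_full    <- !is.list(replications(~ Block + Variety * nitro, data = oats))
balanced_reduced <- !is.list(replications(~ Block + Variety * nitro, data = reduced))

# Observations per main plot (Block x Variety); the incomplete plot shows 3
plot_counts <- with(reduced, table(Block, Variety))
print(plot_counts)
```
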
Although I fully understand that mixed models should be preferred for
non-orthogonal/unbalanced data, I think it is still valid to use the
aov() approach when only one or two data points are missing. Also, I
don't really understand why the structure of the ANOVA table changes so
suddenly. Is there an argument I could supply to change this behaviour?
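
For comparison, the mixed-model alternative mentioned above could look roughly like the following sketch. It assumes random intercepts for blocks and for main plots (Variety within Block), fitted with `lme()` from nlme. The object names `oats`, `reduced`, `fit_lme` and `lme_tab` are made up here, and this is one possible specification, not necessarily the recommended one.

```r
library(nlme)

# Same preparation as in the transcript, on a plain data frame
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)
oats$Block <- factor(as.character(oats$Block))
reduced <- oats[-5, ]  # delete one observation

# Random intercepts for Block and for main plots (Variety within Block)
fit_lme <- lme(yield ~ Variety * nitro,
               random = ~ 1 | Block/Variety,
               data = reduced)

# Sequential F tests; each fixed effect appears once, with its own denDF
lme_tab <- anova(fit_lme)
print(lme_tab)
```

In this formulation each fixed effect gets a single row in the table, so nitro is not split across two error strata as in the aov() output.
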
When I do the same with lm() followed by anova(), and calculate the
F-value for Variety by hand, the results are still quite robust.
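
The hand calculation could be sketched as follows. It assumes the main-plot error is represented by the Block:Variety term in the lm() fit and that the sequential (Type I) table from anova() is used. `variety_F` is a hypothetical helper written for this illustration, and the exact formula behind the results mentioned above may have differed.

```r
library(nlme)  # only needed for the Oats data

# Same preparation as in the transcript, on a plain data frame
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)
oats$Block <- factor(as.character(oats$Block))
reduced <- oats[-5, ]  # delete one observation

# Hypothetical helper: F test of Variety against the Block:Variety mean square
variety_F <- function(d) {
  fit <- lm(yield ~ Block + Variety + nitro + Block:Variety + Variety:nitro,
            data = d)
  tab <- anova(fit)  # sequential (Type I) sums of squares, in the order written
  f_val <- tab["Variety", "Mean Sq"] / tab["Block:Variety", "Mean Sq"]
  df1 <- tab["Variety", "Df"]
  df2 <- tab["Block:Variety", "Df"]
  c(F = f_val, df1 = df1, df2 = df2,
    p = pf(f_val, df1, df2, lower.tail = FALSE))
}

variety_F(oats)     # balanced data
variety_F(reduced)  # one observation removed
```

With unbalanced data the sequential sums of squares depend on the order of the terms, so Variety is here not adjusted for nitro.
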
Best regards,
Sam
--
Samuel Knapp
Lehrstuhl für Pflanzenernährung
Technische Universität München
(Chair of Plant Nutrition
Technical University of Munich)
Emil-Ramann-Strasse 2
D-85354 Freising
Tel. +49 8161 71-3578
samuel.knapp at tum.de
www.researchgate.net/profile/Samuel_Knapp