[R] Unbalanced data in split-plot analysis with aov()

Wed Oct 11 00:07:11 CEST 2017

Dear all,

I'm analysing a split-plot experiment in which one or two values are 
sometimes missing. I realized that if the data are even slightly 
unbalanced, the subplot treatment effect also appears in the main-plot 
stratum, where it is tested against the main-plot error term.

I replicated this with the Oats dataset from Yates (1935), contained in 
the nlme package, where Variety is the main-plot factor and nitro is the 
subplot factor.

 > # Oats dataset (Yates 1935) from the nlme package
 > require(nlme); data <- get(data(Oats))
 > data$nitro <- factor(data$nitro); data$Block <- as.factor(as.character(data$Block))
 > nrow(data) # 6 Blocks * 4 N-levels * 3 Varieties = 72 obs -> orthogonal and balanced
[1] 72
 > # split-plot anova
 > summary(aov(yield ~ Block+Variety*nitro + Error(Block/Variety),data))

Error: Block
       Df Sum Sq Mean Sq
Block  5  15875    3175

Error: Block:Variety
           Df Sum Sq Mean Sq F value Pr(>F)
Variety    2   1786   893.2   1.485  0.272
Residuals 10   6013   601.3

Error: Within
               Df Sum Sq Mean Sq F value   Pr(>F)
nitro          3  20020    6674  37.686 2.46e-12 ***
Variety:nitro  6    322      54   0.303    0.932
Residuals     45   7969     177
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 > reduceddata <- data[-5,] # delete one observation
 > summary(aov(yield ~ Block+Variety*nitro + Error(Block/Variety),reduceddata))

Error: Block
       Df Sum Sq Mean Sq
Block  5  16070    3214

Error: Block:Variety
           Df Sum Sq Mean Sq F value Pr(>F)
Variety    2   1820   910.0   1.373  0.302
nitro      1      1     1.0   0.001  0.970
Residuals  9   5964   662.7

Error: Within
               Df Sum Sq Mean Sq F value   Pr(>F)
nitro          3  19766    6589   36.88 4.49e-12 ***
Variety:nitro  6    333      55    0.31    0.928
Residuals     44   7860     179
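
To make the loss of balance explicit, here is a minimal sketch using replications() from the stats package, whose help page gives !is.list(replications(formula, data)) as a test for balance. The object name oats is only a renamed copy of the data used above, and the one-sided formula is my choice; nothing here changes what aov() does.

```r
library(nlme)  # provides the Oats dataset

# Rebuild the data as in the transcript above (oats is a renamed copy)
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)
oats$Block <- factor(as.character(oats$Block))
reduceddata <- oats[-5, ]  # delete one observation, as above

# replications() returns a plain vector for balanced data and a list otherwise
balanced_full    <- !is.list(replications(~ Block + Variety * nitro, data = oats))
balanced_reduced <- !is.list(replications(~ Block + Variety * nitro, data = reduceddata))
print(c(full = balanced_full, reduced = balanced_reduced))

# Observations per main plot (Block x Variety): one plot is now short by one
plot_counts <- table(reduceddata$Block, reduceddata$Variety)
print(plot_counts)
```

With one subplot missing, nitro is no longer orthogonal to the Block:Variety stratum, which is consistent with the extra 1-df nitro line in that stratum.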

Although I fully understand that mixed models should be preferred for 
non-orthogonal/unbalanced data, I think it is still valid to use the 
aov() approach when only one or two data points are missing. Also, I 
don't really understand why the structure of the ANOVA table changes so 
suddenly. Is there any argument I could supply to change this behaviour?
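
For comparison, the mixed-model alternative mentioned above might look like the following sketch with lme() from nlme. The random-effects specification (random intercepts for Block and for Variety within Block) is my assumption of the closest analogue to Error(Block/Variety); Block is treated as random here rather than fixed, so this is not an exact re-expression of the aov() call.

```r
library(nlme)  # provides lme() and the Oats dataset

# Rebuild the reduced data set (oats is a renamed copy of the data above)
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)
oats$Block <- factor(as.character(oats$Block))
reduceddata <- oats[-5, ]  # one observation deleted

# Split-plot as a mixed model: main plots (Variety) nested within Block
fit_lme <- lme(yield ~ Variety * nitro,
               random = ~ 1 | Block/Variety,
               data = reduceddata)

# Sequential F-tests; the denDF column shows which stratum each term is tested in
aov_lme <- anova(fit_lme)
print(aov_lme)
```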

When I do the same with lm() followed by anova(), and calculate the 
F-value for Variety by hand, the resulting values are still quite robust.
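
A sketch of that hand calculation, under the assumption that "by hand" means dividing the Variety mean square by the Block:Variety mean square from the sequential anova() table. The object names (oats, fit_lm, F_variety and so on) are mine. Note that with unbalanced data the sequential sums of squares depend on the order of the terms.

```r
library(nlme)  # provides the Oats dataset

# Rebuild the reduced data set (oats is a renamed copy of the data above)
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)
oats$Block <- factor(as.character(oats$Block))
reduceddata <- oats[-5, ]  # one observation deleted

# Fit main-plot error (Block:Variety) as an ordinary model term
fit_lm <- lm(yield ~ Block + Variety * nitro + Block:Variety, data = reduceddata)
tab <- anova(fit_lm)  # sequential (Type I) table
print(tab)

# Test Variety against the main-plot error instead of the residual
F_variety  <- tab["Variety", "Mean Sq"] / tab["Block:Variety", "Mean Sq"]
df_variety <- tab["Variety", "Df"]
df_error   <- tab["Block:Variety", "Df"]
p_variety  <- pf(F_variety, df_variety, df_error, lower.tail = FALSE)

print(c(F = F_variety, num_df = df_variety, den_df = df_error, p = p_variety))
```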

Best regards,

Sam

-- 
Samuel Knapp

Lehrstuhl für Pflanzenernährung
Technische Universität München
(Chair of Plant Nutrition
Technical University of Munich)

Emil-Ramann-Strasse 2
D-85354 Freising

Tel. +49 8161 71-3578
samuel.knapp at tum.de
www.researchgate.net/profile/Samuel_Knapp
