[R] ANOVA 1 too few degrees of freedom

Thu May 5 20:55:03 CEST 2011

>>> Rovinpiper <david.j.meehan at gmail.com> 04/05/2011 22:43 >>>
>So this seems to indicate that I have what I want. I have two
>respiration data points at each plot on each day.

Yes; if you had only Plot+Day you'd have a completely balanced full
factorial ... for Plot and Day.

But I think I now see an answer to your puzzlement. 

Your original ANOVA table was 
                                                              Df  Sum
Sq    Mean Sq    F value           Pr(>F) 

Combined.Trt                                           1   52.80      
52.805     2.0186e+30  < 2.2e-16 *** 
as.factor(Combined.Plot)                         10  677.69    67.769  
  2.5907e+30  < 2.2e-16 *** 
as.factor(Combined.Day)                         16  2817.47  176.092  
6.7317e+30  < 2.2e-16 *** 
Combined.Trt:as.factor(Combined.Day)   16   47.82      2.989      
1.1426e+29  < 2.2e-16 *** 
as.factor(Combined.Day):Combined.Plot 160  611.21   3.820 1.4604e+29 <
2.2e-16 *** 

Residuals                                                 204    0.00  
0.000 

Now, your Day:Plot combination has 17*12=204 combinations. And you are
showing 204 residual DOF, which is exactly right for two observations
per group and 204 groups. That's exactly right for the model
Day+Plot+Day:Plot.

But you have Trt in the model as well. Where are the Treatment DoF
coming from? 
Clearly, the treatment DoF has not come out of the residual term. That
means the treatment DoF must have come from somewhere else. Since you've
lost 2 instead of 1 DoF from Plot and kept your n[Day]-1 degrees of
freedom for Day, I would guess that Plot is nested in Trt. If that is
the case, you'd 'obviously' expect n[Plot]-2 dof for Plot. And indeed i
could replicate your ANOVA table DoF with 

Trt<-gl(2,204)
Plot<-gl(12,34) #Note that Plot is NOT fully crossed with Trt; each Trt
has only 6 Plots
Day <- gl(17,2,408)
y<-rnorm(408)
anova(lm(y~Trt+Plot+Day+Trt:Day+Day:Plot))

So that would be one data structure that would explain your DoF.

But if that is indeed the structure, there are other concerns that you
may need to look out for. 
First, compare the two models
anova(lm(y~Trt+Plot+Day+Trt:Day+Day:Plot))
#and
anova(lm(y~Trt+Plot+Day+Day:Plot+Trt:Day))

Same terms in the model specification, but different DoF and Trt:Day
has completely vanished from the second ANOVA table. Something is
clearly either aliased, unbalanced or incorrectly specified. In my
presumed version of events, because the Plot is nested in Trt, Day:Plot
is also nested in Trt. To get _consistent_ results independent of model
order you would need to reflect that additional nesting in the model:

anova(lm(y~Trt+Plot+Day+Trt:Day:Plot+Trt:Day))
#vs
anova(lm(y~Trt+Plot+Day+Trt:Day+Trt:Day:Plot))

and this time, the two models give identical results for all rows in
the table. Somewhat reassuringly 
anova(lm(y~Day+Trt/Plot+Trt:Day+Trt:Day:Plot))
and even more reassuringly car's Anova does too.

But there's another more fundamental thing that may suggest this is
still not quite the right thing to do. If Plot is nested in Trt, it
matters whether plot is random or fixed when asking about the treatment
effect. I'd guess Plot is random. If plot is random and nested in Trt it
would not be appropriate to simply compare any calculated Trt MS with
the residual MS. Rather, one would compare the Trt MS with the Trt:Plot
interaction. Or perhaps resort to a mixed effects model using lme (if
Plot is the only random term) or lmer if Plot and Day are both random. 

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}