[R] anova subhypotheses

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Thu Mar 6 12:30:01 CET 2003


"rob foxall (IFR)" <rob.foxall at BBSRC.AC.UK> writes:

> Hello all,
> 
>             A really noddy question for you all: I'm trying without success to do some subhypothesis testing. Using simple anova model, with a toy dataset from a book. I have four factors A,B,C,D, and wish to test mu_C = mu_D. This is what I have tried:
> 
>  
> 
> > contrasts(infants$group,how.many=1) <- c(0,0,1,-1)
> 
> > contrasts(infants$group)
> 
>   [,1]
> 
> A    0
> 
> B    0
> 
> C    1
> 
> D   -1
> 
> > fit <- aov(age~group,data=infants)
> 
> > summary(fit)
> 
>             Df Sum Sq Mean Sq F value Pr(>F)
> 
> group        1  0.740   0.740  0.2693 0.6092
> 
> Residuals   21 57.727   2.749               
> 
>  
> 
> Now I know from the book, hand calculations and SPSS that for "group", Sum Sq = Mean Sq = 1.12, not 0.740. Also from the standard anova:
> 
>  
> 
> > contrasts(infants$group) <- "contr.treatment"
> 
> > fit <- aov(age~group,data=infants)
> 
> > summary(fit)
> 
>             Df Sum Sq Mean Sq F value Pr(>F)
> 
> group        3 14.778   4.926  2.1422 0.1285
> 
> Residuals   19 43.690   2.299         
> 
>  
> 
> So I'd like to have the (correct) Mean Sq value divided by 2.299 and
> not 2.749, with 19 and not 21 df. Any advice on how to correctly use
> contrasts for subhypothesis testing, including where to find it in
> the manuals, would be much appreciated.

(The pervasive Zelazo dataset, eh? That's not a toy, it's a Science
paper. With methodological errors, even.) 

You're not comparing the same models. With a contrast specification
like that you're describing a model where group A and B are equal to
the intercept and C is as much above the intercept as D is below.

What you need to do is to compare two models, one in which C and D are
equal (and A and B are arbitrary) and one in which they are not. E.g.,
I have

> gr
 [1] active  active  active  active  active  active  passive passive passive
[10] passive passive passive none    none    none    none    none    none
[19] ctr.8w  ctr.8w  ctr.8w  ctr.8w  ctr.8w
Levels: active passive none ctr.8w

> gr2 <- gr ; levels(gr2) <- c("active","passive","ctr/non","ctr/non")

# (note that you need to be very careful that levels are specified in
#  correct order there...)

> model1 <- lm(age~gr)
> model2 <- lm(age~gr2)
> anova(model2,model1)
Analysis of Variance Table

Model 1: age ~ gr2
Model 2: age ~ gr
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     20 44.812
2     19 43.690  1     1.123 0.4883 0.4931

Or, a little more sneaky, use the fact that the anova table for a
model is cumulative (Type I, if you want to speak SAS-ish):

> anova(lm(age~gr2+gr))
Analysis of Variance Table

Response: age
          Df Sum Sq Mean Sq F value Pr(>F)
gr2        2 13.655   6.827  2.9692 0.0755 .
gr         1  1.123   1.123  0.4883 0.4931
Residuals 19 43.690   2.299
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



More information about the R-help mailing list