# [R] anova subhypotheses

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Thu Mar 6 12:30:01 CET 2003

```"rob foxall (IFR)" <rob.foxall at BBSRC.AC.UK> writes:

> Hello all,
>
>             A really noddy question for you all: I'm trying without success to do some subhypothesis testing. Using simple anova model, with a toy dataset from a book. I have four factors A,B,C,D, and wish to test mu_C = mu_D. This is what I have tried:
>
>
>
> > contrasts(infants\$group,how.many=1) <- c(0,0,1,-1)
>
> > contrasts(infants\$group)
>
>   [,1]
>
> A    0
>
> B    0
>
> C    1
>
> D   -1
>
> > fit <- aov(age~group,data=infants)
>
> > summary(fit)
>
>             Df Sum Sq Mean Sq F value Pr(>F)
>
> group        1  0.740   0.740  0.2693 0.6092
>
> Residuals   21 57.727   2.749
>
>
>
> Now I know from the book, hand calculations and SPSS that for "group", Sum Sq = Mean Sq = 1.12, not 0.740. Also from the standard anova:
>
>
>
> > contrasts(infants\$group) <- "contr.treatment"
>
> > fit <- aov(age~group,data=infants)
>
> > summary(fit)
>
>             Df Sum Sq Mean Sq F value Pr(>F)
>
> group        3 14.778   4.926  2.1422 0.1285
>
> Residuals   19 43.690   2.299
>
>
>
> So I'd like to have the (correct) Mean Sq value divided by 2.299 and
> not 2.749, with 19 and not 21 df. Any advice on how to correctly use
> contrasts for subhypothesis testing, including where to find it in
> the manuals, would be much appreciated.

(The pervasive Zelazo dataset, eh? That's not a toy, it's a Science
paper. With methodological errors, even.)

You're not comparing the same models. With a contrast specification
like that you're describing a model where group A and B are equal to
the intercept and C is as much above the intercept as D is below.

What you need to do is to compare two models, one in which C and D are
equal (and A and B are arbitrary) and one in which they are not. E.g.,
I have

> gr
[1] active  active  active  active  active  active  passive passive passive
[10] passive passive passive none    none    none    none    none    none
[19] ctr.8w  ctr.8w  ctr.8w  ctr.8w  ctr.8w
Levels: active passive none ctr.8w

> gr2 <- gr ; levels(gr2) <- c("active","passive","ctr/non","ctr/non")

# (note that you need to be very careful that levels are specified in
#  correct order there...)

> model1 <- lm(age~gr)
> model2 <- lm(age~gr2)
> anova(model2,model1)
Analysis of Variance Table

Model 1: age ~ gr2
Model 2: age ~ gr
Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     20 44.812
2     19 43.690  1     1.123 0.4883 0.4931

Or, a little more sneaky, use the fact that the anova table for a
model is cumulative (Type I, if you want to speak SAS-ish):

> anova(lm(age~gr2+gr))
Analysis of Variance Table

Response: age
Df Sum Sq Mean Sq F value Pr(>F)
gr2        2 13.655   6.827  2.9692 0.0755 .
gr         1  1.123   1.123  0.4883 0.4931
Residuals 19 43.690   2.299
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

--
O__  ---- Peter Dalgaard             Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
(*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

```