[R] Anova and unbalanced designs

Sun Jan 25 02:08:42 CET 2009

Dear Peter and Nils,

In my initial message, I stated misleadingly that the contrast coding didn't
matter for the "type-III" tests here since there is just one
between-subjects factor, but that's not right: The between type-III SS is
correct using contr.treatment(), but the within SS is not. As is generally
the case, to get reasonable type-III tests (i.e., tests of reasonable
hypotheses), it's necessary to have contrasts that are orthogonal in the
row-basis of the design, such as contr.sum(), contr.helmert(), or
contr.poly(). The "type-II" tests, however, are insensitive to the contrast
parametrization. Anova() always uses an orthogonal parametrization for the
within-subjects design.
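
As a rough illustration of the difference between these codings (a minimal sketch for a two-level factor, using only the base-R contrast functions named above; the object names ct, cs, and ch are just for this example): the single column of contr.sum() or contr.helmert() sums to zero, so it is orthogonal to the constant, while the contr.treatment() column is not.

```r
# Contrast matrices for a two-level factor, as in this example's
# between and within factors.
ct <- contr.treatment(2)  # 0/1 dummy coding (R's default for unordered factors)
cs <- contr.sum(2)        # +1/-1 deviation coding
ch <- contr.helmert(2)    # -1/+1 Helmert coding

# A contrast column is orthogonal to the constant (intercept) column
# exactly when its entries sum to zero.
colSums(ct)  # nonzero: not orthogonal to the constant
colSums(cs)  # zero
colSums(ch)  # zero
```

With more than two levels the same check applies column by column, and contr.poly() also passes it up to floating-point rounding.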

The general advice in ?Anova is, "Be very careful in formulating the model
for type-III tests, or the hypotheses tested will not make sense."

Thanks, Peter, for pointing this out.

John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: January-24-09 6:31 PM
> To: Nils Skotara
> Cc: John Fox; r-help at r-project.org; 'Michael Friendly'
> Subject: Re: [R] Anova and unbalanced designs
> 
> Nils Skotara wrote:
> > Dear John,
> >
> > thank you again! You replicated the type III result I got in SPSS! When I
> > calculate Anova() type II:
> >
> > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> >
> >                     SS num Df Error SS den Df      F  Pr(>F)
> > between         4.8000      1   9.0000      8 4.2667 0.07273 .
> > within          0.2000      1  10.6667      8 0.1500 0.70864
> > between:within  2.1333      1  10.6667      8 1.6000 0.24150
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > I see the exact same values as you had written.
> > However, and now I am really lost, type III (I did not change anything else)
> > leads to the following:
> >
> > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> >
> >                               SS num Df Error SS den Df       F    Pr(>F)
> > (Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
> > between                    4.800      1    9.000      8  4.2667   0.07273 .
> > as.factor(within)          2.000      1   10.667      8  1.5000   0.25551
> > between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > How is this possible?
> 
> This looks like a contrast parametrization issue: If we look at the
> per-group mean within-differences and their SE, we get
> 
>  > summary(lm(within1-within2~between - 1))
> ..
> Coefficients:
>           Estimate Std. Error t value Pr(>|t|)
> between1  -1.0000     0.8165  -1.225    0.256
> between2   0.3333     0.6667   0.500    0.631
> ..
>  > table(between)
> between
> 1 2
> 4 6
> 
> Now, the type II F test is based on weighting the two means as you would
> after testing for no interaction
> 
>  > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> [1] 0.1500205
> 
> and type III is to weight them as if there had been equal counts
> 
>  > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> [1] 0.400022
> 
> However, the result above corresponds to looking at group1 only
> 
>  > (-1)^2/(0.8165^2)
> [1] 1.499987
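The three calculations above differ only in how the two group means are weighted, which can be made explicit with a small helper. This is a sketch, not part of the original exchange: contrast_F is a hypothetical name, and the estimates, standard errors, and group sizes are copied by hand from the quoted output rather than recomputed from the data.

```r
# Per-group mean within-differences, their standard errors, and the
# group sizes, copied from the quoted lm() summary and table(between).
est <- c(-1.0000, 0.3333)
se  <- c(0.8165, 0.6667)
n   <- c(4, 6)

# F statistic (1 numerator df) for a weighted combination of the two
# group means: (sum of w * est)^2 / (sum of w^2 * se^2).
# Rescaling w by a constant leaves the result unchanged.
contrast_F <- function(w) sum(w * est)^2 / sum(w^2 * se^2)

contrast_F(n)        # weights proportional to group sizes ("type II")
contrast_F(c(1, 1))  # equal weights ("type III" with orthogonal contrasts)
contrast_F(c(1, 0))  # group 1 only (the "within" test under contr.treatment)
```

Up to rounding in the copied inputs, the three calls should reproduce the F values 0.15, 0.40, and 1.50 reported in this thread.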
> 
> It helps if you choose orthogonal contrast parametrizations:
> 
>  > options(contrasts=c("contr.sum","contr.helmert"))
>  > betweenanova <- lm(values ~ between)
>  > Anova(betweenanova, idata=with, idesign= ~as.factor(within), type = "III" )
> 
> Type III Repeated Measures MANOVA Tests: Pillai test statistic
>                            Df test stat approx F num Df den Df    Pr(>F)
> (Intercept)                1     0.963  209.067      1      8 5.121e-07 ***
> between                    1     0.348    4.267      1      8   0.07273 .
> as.factor(within)          1     0.048    0.400      1      8   0.54474
> between:as.factor(within)  1     0.167    1.600      1      8   0.24150
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> 
> 
> 
> --
>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907