[R] Type-I v/s Type-III Sum-Of-Squares in ANOVA
jfox at mcmaster.ca
Tue Mar 2 03:19:01 CET 2010
I always hesitate to address this question because it seems to generate much
more heat than light. I know that you've been told that the question has
been asked and answered on the list before, but simply pointing to
unilluminating answers isn't helpful, I believe. It's also hard to answer
the question briefly, because a careful explanation takes more space than is
reasonable on an email list, so I'll just make a few general points:
(1) What's important in formulating tests is the set of hypotheses being tested.
These should be sensible and of interest. Take a crossed two-way ANOVA, for
example, for factors A and B. In that context, hypotheses should address the
pattern of cell means.
(2) The null hypothesis of no interaction is that the profiles of cell means
for A (say) across the levels of B are parallel. If there is no interaction,
then testing equality of any weighted average of the profiles of cell means
across the levels of B will test the null hypothesis of no A main effects.
In that case, the most powerful test for the A main effect is the type-II
test: for A after B, ignoring the AB interaction; (and continuing) for B after
A, ignoring AB; and for AB after A and B. These tests are independent of the
contrasts chosen to represent A and B, which is generally the case when one
restricts consideration to models that conform to the principle of marginality.
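To make the model comparisons in (2) concrete, here is a minimal sketch in plain Python (standard library only, not R). It computes type-II sums of squares as differences in residual sums of squares between nested least-squares fits. The unbalanced 2 x 2 data set and the helper names (`solve`, `rss`, `columns`, `type2`) are hypothetical and exist only for illustration. In R itself, the Anova function in the car package does this. The sketch fits each model under a treatment-style coding and under a sum-to-zero coding, to show that the type-II results do not depend on the contrasts.

```python
# Minimal sketch (hypothetical data and helper names): type-II sums of squares
# for a crossed two-way ANOVA, computed as differences in residual sums of
# squares (RSS) between nested least-squares fits.


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(cols, y):
    """RSS of the least-squares regression of y on the given model columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    fitted = [sum(bj * col[k] for bj, col in zip(beta, cols)) for k in range(len(y))]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


# Hypothetical unbalanced 2 x 2 layout: (level of A, level of B, response).
DATA = [("a1", "b1", 4), ("a1", "b1", 5), ("a1", "b1", 6),
        ("a1", "b2", 7), ("a1", "b2", 9),
        ("a2", "b1", 5), ("a2", "b1", 7),
        ("a2", "b2", 10), ("a2", "b2", 12), ("a2", "b2", 13), ("a2", "b2", 13)]
y = [float(row[2]) for row in DATA]


def columns(coding):
    """Intercept, A, B and A:B columns under treatment or sum-to-zero coding."""
    if coding == "treatment":  # mimics R's contr.treatment: level 1 is baseline
        a = [1.0 if row[0] == "a2" else 0.0 for row in DATA]
        b = [1.0 if row[1] == "b2" else 0.0 for row in DATA]
    else:  # mimics R's contr.sum for 2 levels: level 1 -> +1, level 2 -> -1
        a = [1.0 if row[0] == "a1" else -1.0 for row in DATA]
        b = [1.0 if row[1] == "b1" else -1.0 for row in DATA]
    ab = [u * v for u, v in zip(a, b)]
    return [1.0] * len(DATA), a, b, ab


def type2(coding):
    one, a, b, ab = columns(coding)
    rss_additive = rss([one, a, b], y)
    return {
        "A": rss([one, b], y) - rss_additive,          # A after B, ignoring AB
        "B": rss([one, a], y) - rss_additive,          # B after A, ignoring AB
        "AB": rss_additive - rss([one, a, b, ab], y),  # AB after A and B
    }


t2_treatment, t2_sum = type2("treatment"), type2("sum")
for term in ("A", "B", "AB"):
    print(term, round(t2_treatment[term], 4), round(t2_sum[term], 4))
```

Both codings give the same three values. The nested models being compared span the same column spaces whichever contrasts are used, so the RSS differences cannot change.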
(3) If there is interaction, then what one means by main effects is
ambiguous, but one possibility is to formulate the main effects for A in
terms of the marginal means for A averaged across the levels of B. The null
hypothesis of no A main effects is then that the A marginal means are all
equal. This is the type-III test: for A after B and AB; (and continuing) for
B after A and AB; and for AB after A and B. Because the tests for the main
effects violate the principle of marginality, computing them properly
requires contrasts that are orthogonal in the row-basis of the model (in
R, e.g., contr.sum or contr.helmert, but not the default contr.treatment).
The type-III tests have the attraction that they also test for main effects
if the interactions are absent, though they are not maximally powerful in
that circumstance. There's also a serious question about whether one would
be interested in main effects defined as averages over the levels of the
other factor when interactions are present. If, however, interactions are
present, then the type-II tests for A and B are not tests of main effects in
a reasonable interpretation of that term.
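Here is a similar sketch for the type-III tests in (3). It again uses standard-library Python, the same hypothetical data and hypothetical helper names. Each term is dropped from the full model containing A, B and AB. Under the sum-to-zero coding, the type-III sum of squares for A should equal the sum of squares for the hypothesis that the unweighted A marginal means are equal; the code checks this directly from the cell means. Under the treatment coding, dropping the A column instead tests A at the baseline level of B, so the answer changes with the contrasts. In R, this is the situation one gets into with Anova(..., type = "III") from the car package.

```python
# Minimal sketch (hypothetical data and helper names): type-III sums of squares
# depend on the contrasts; only the sum-to-zero coding tests equal marginal means.


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(cols, y):
    """RSS of the least-squares regression of y on the given model columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bj * col[k] for bj, col in zip(beta, cols)) for k in range(len(y))]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


# Hypothetical unbalanced 2 x 2 layout: (level of A, level of B, response).
DATA = [("a1", "b1", 4), ("a1", "b1", 5), ("a1", "b1", 6),
        ("a1", "b2", 7), ("a1", "b2", 9),
        ("a2", "b1", 5), ("a2", "b1", 7),
        ("a2", "b2", 10), ("a2", "b2", 12), ("a2", "b2", 13), ("a2", "b2", 13)]
y = [float(row[2]) for row in DATA]


def columns(coding):
    """Intercept, A, B and A:B columns under treatment or sum-to-zero coding."""
    if coding == "treatment":  # mimics R's contr.treatment
        a = [1.0 if row[0] == "a2" else 0.0 for row in DATA]
        b = [1.0 if row[1] == "b2" else 0.0 for row in DATA]
    else:  # mimics R's contr.sum for 2 levels
        a = [1.0 if row[0] == "a1" else -1.0 for row in DATA]
        b = [1.0 if row[1] == "b1" else -1.0 for row in DATA]
    ab = [u * v for u, v in zip(a, b)]
    return [1.0] * len(DATA), a, b, ab


def type3(coding):
    one, a, b, ab = columns(coding)
    rss_full = rss([one, a, b, ab], y)
    return {
        "A": rss([one, b, ab], y) - rss_full,  # A after B and AB
        "B": rss([one, a, ab], y) - rss_full,  # B after A and AB
        "AB": rss([one, a, b], y) - rss_full,  # AB after A and B
    }


t3_treatment, t3_sum = type3("treatment"), type3("sum")

# Direct check of H0 "unweighted A marginal means are equal", i.e. the
# cell-means contrast mu11 + mu12 - mu21 - mu22 = 0.
cells = {}
for a_level, b_level, value in DATA:
    cells.setdefault((a_level, b_level), []).append(value)
weights = {("a1", "b1"): 1, ("a1", "b2"): 1, ("a2", "b1"): -1, ("a2", "b2"): -1}
estimate = sum(w * sum(cells[k]) / len(cells[k]) for k, w in weights.items())
ss_marginal = estimate ** 2 / sum(w * w / len(cells[k]) for k, w in weights.items())

print("type-III A, treatment coding:", round(t3_treatment["A"], 4))
print("type-III A, sum coding:      ", round(t3_sum["A"], 4))
print("equal-marginal-means SS:     ", round(ss_marginal, 4))
```

The AB line is the same under both codings, because it compares the same two model spaces. Only the main-effect lines move when the contrasts change.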
(4) The type-I tests are sequential: for A ignoring B and their interaction;
for B after A and ignoring the interaction; and for AB after A and B. These
tests compare models that conform to marginality and thus are independent of
the contrasts selected to represent A and B. If A and B are related (as
happens when the cell counts are unequal), then the type-I test for A does not
test the A main effect in any reasonable interpretation of that term, i.e., as
the partial relationship between the response and A conditional on B.
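Point (4) can be checked with the same machinery. The standard-library Python sketch below uses hypothetical data and helper names, reusing the unbalanced 2 x 2 layout, and computes sequential sums of squares in both orders. This presumably mirrors what R's anova() reports for y ~ A*B versus y ~ B*A, but the code is an illustration, not R's implementation. With unequal cell counts, the sum of squares for A changes depending on whether A enters first (ignoring B) or second (after B). In the second case it coincides with the type-II sum of squares for A.

```python
# Minimal sketch (hypothetical data and helper names): sequential (type-I)
# sums of squares, and how they depend on the order of A and B when the
# cell counts are unequal.


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(cols, y):
    """RSS of the least-squares regression of y on the given model columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bj * col[k] for bj, col in zip(beta, cols)) for k in range(len(y))]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


# Hypothetical unbalanced 2 x 2 layout: (level of A, level of B, response).
DATA = [("a1", "b1", 4), ("a1", "b1", 5), ("a1", "b1", 6),
        ("a1", "b2", 7), ("a1", "b2", 9),
        ("a2", "b1", 5), ("a2", "b1", 7),
        ("a2", "b2", 10), ("a2", "b2", 12), ("a2", "b2", 13), ("a2", "b2", 13)]
y = [float(row[2]) for row in DATA]

# 0/1 (treatment-style) columns; any coding gives the same type-I results,
# because each step compares models that respect marginality.
one = [1.0] * len(DATA)
a = [1.0 if row[0] == "a2" else 0.0 for row in DATA]
b = [1.0 if row[1] == "b2" else 0.0 for row in DATA]
ab = [u * v for u, v in zip(a, b)]

rss_mean = rss([one], y)  # intercept only
rss_a = rss([one, a], y)
rss_b = rss([one, b], y)
rss_additive = rss([one, a, b], y)
rss_full = rss([one, a, b, ab], y)

# A first: A ignoring B and AB; B after A; AB after A and B.
type1_a_first = {"A": rss_mean - rss_a, "B": rss_a - rss_additive,
                 "AB": rss_additive - rss_full}
# B first: B ignoring A and AB; A after B; AB after A and B.
type1_b_first = {"B": rss_mean - rss_b, "A": rss_b - rss_additive,
                 "AB": rss_additive - rss_full}

for name, table in (("A first", type1_a_first), ("B first", type1_b_first)):
    print(name, {term: round(ss, 4) for term, ss in table.items()})
```

In either order the three sequential sums of squares add up to the total sum of squares about the mean, less the residual sum of squares. That is why the type-I decomposition looks tidy even when its test for A is not a test of the A main effect.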
(5) Other, similar issues arise in models with factors and covariates.
Type-III tests in software such as SAS and SPSS typically do not handle these
reasonably; for example, they test for differences across the levels of a
factor A when a covariate X is 0.
As you suggest, the anova function in R produces type-I tests. The Anova
function in the car package produces type-II tests by default, and type-III
tests optionally. If you select the latter, then you must be careful to use,
say, contr.sum and not contr.treatment to encode the factors.
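The advice about contr.sum versus contr.treatment comes down to how the contrast matrices are built. The standard-library Python sketch below reproduces the layout of the matrices that R's contr.treatment, contr.sum and contr.helmert return for k levels. The Python function names are hypothetical stand-ins, not R calls. The sum and Helmert columns add to zero across the levels of the factor, whereas each treatment column adds to one. Roughly speaking, this zero-sum property is what makes the columns for different terms orthogonal in the row basis of the model, which the type-III tests need.

```python
# Minimal sketch: k x (k - 1) contrast matrices laid out like R's
# contr.treatment(k), contr.sum(k) and contr.helmert(k) (rows = factor levels).


def contr_treatment(k):
    """Level 1 is the baseline; column j is the indicator of level j + 2."""
    return [[1 if i == j + 1 else 0 for j in range(k - 1)] for i in range(k)]


def contr_sum(k):
    """Identity for levels 1..k-1; the last level is coded -1 in every column."""
    return [[-1] * (k - 1) if i == k - 1 else [1 if i == j else 0 for j in range(k - 1)]
            for i in range(k)]


def contr_helmert(k):
    """Column j contrasts level j + 2 with the average of levels 1..j + 1."""
    return [[-1 if i <= j else (j + 1 if i == j + 1 else 0) for j in range(k - 1)]
            for i in range(k)]


def column_sums(m):
    """Sum of each column over the factor levels."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]


for name, fn in (("treatment", contr_treatment), ("sum", contr_sum),
                 ("helmert", contr_helmert)):
    m = fn(3)
    print(name, m, "column sums:", column_sums(m))
```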
I know that there are objections to the use of the terms type-I, -II and
-III, but I find these an innocuous shorthand once the issues distinguishing
the tests are understood.
My preference is for type-II tests, which are hard to screw up because they
conform to the principle of marginality and are maximally powerful in the
context in which they are interesting.
I hope this helps,
Senator William McMaster Professor of Social Statistics
Department of Sociology
Hamilton, Ontario, Canada
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Ravi Kulkarni
> Sent: March-01-10 10:13 AM
> To: r-help at r-project.org
> Subject: [R] Type-I v/s Type-III Sum-Of-Squares in ANOVA
> I believe the aov() function in R uses a "Type-I sum-of-squares" by
> default as against "Type-III".
> This is relevant for me because I am trying to understand ANOVA in R
> from my knowledge of ANOVA in SPSS. I can only reproduce the results of an
> ANOVA done using R through SPSS if I specify that SPSS uses a Type-I
> sum-of-squares. (And yes, I know that when the sample sizes of all groups
> are equal, Type-I and Type-III produce the same answers.)
> My questions: 1) exactly what is the difference between the two types of
> sum-of-squares?
> 2) how can I make R use a Type-III s-o-s? Should I?
> R must have some reason for using Type-I as default rather than Type-III.
> (Given a choice, believe R!)
> A reference (stats book, URL...) would be helpful...