[R] Type II and III sum of square in Anova (R, car package)

John Fox jfox at mcmaster.ca
Tue Aug 29 00:07:37 CEST 2006


Dear Amasco,

Again, I'll answer briefly (since the written source that I previously
mentioned has an extensive discussion):

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Amasco 
> Miralisus
> Sent: Monday, August 28, 2006 2:21 PM
> To: r-help at stat.math.ethz.ch
> Cc: John Fox; Prof Brian Ripley; Mark Lyman
> Subject: Re: [R] Type II and III sum of square in Anova (R, car package)
> 
> Hello,
> 
> First of all, I would like to thank everybody who answered my 
> question. Every post has added something to my knowledge of the topic.
> I now know why Type III SS are so questionable.
> 
> As I understood from the R FAQ, there is disagreement among
> statisticians about which SS to use
> (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).
> However, most commercial statistical packages use Type III as the
> default (with orthogonal contrasts), as does STATISTICA, from which I
> am currently trying to migrate to R. This was probably done for the
> convenience of end users who are not very experienced in theoretical
> statistics.
> 

Note that the contrasts are orthogonal only in the row basis of the model
matrix; with unbalanced data, they are not orthogonal in the model matrix
itself.
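
To make this concrete, here is a small illustration on a made-up
unbalanced two-way layout (the data frame, factor names, and values are
invented for the example, not real data):

## hypothetical unbalanced two-way layout
set.seed(1)
mydata <- data.frame(
    factor1  = factor(rep(c("a1", "a2"), times = c(4, 6))),
    factor2  = factor(c("b1", "b1", "b2", "b2", "b1", "b2", "b2", "b2", "b2", "b2")),
    response = rnorm(10))

## with sum-to-zero contrasts the hypothesis (row-basis) contrasts are
## orthogonal, but the corresponding model-matrix columns are not:
X <- model.matrix(~ factor1 * factor2, data = mydata,
                  contrasts.arg = list(factor1 = "contr.sum", factor2 = "contr.sum"))
round(crossprod(X), 2)  # nonzero off-diagonal entries: columns not orthogonal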

> I am aware that the same result could be produced using the standard
> anova() function with Type I "sequential" SS, supplemented by the
> drop1() function, but this approach will look quite complicated to
> people without any substantial background in statistics, such as
> non-math students. I would prefer an easier way, possibly more
> universal, though probably also more "for dummies" :) If I am not
> mistaken, the car package by John Fox, with its nice Anova()
> function, is a reasonable alternative for anyone who wishes to
> perform a quick statistical analysis without being afraid of messing
> something up in the model fitting. Of course, orthogonal contrasts
> have to be specified (for example, contr.sum) in the case of Type
> III SS.
> 
> Therefore, I would like to reformulate my questions to make them
> easier for you to answer:
> 
> 1. The first question relates to the answer by Professor Brian
> Ripley: did I understand correctly from the recommended paper (Bill
> Venables' 'exegeses' paper) that there is not much sense in testing
> main effects if the interaction is significant?
> 

Many are of this opinion. I would put it a bit differently: Properly
formulated, tests of main effects in the presence of interactions make sense
(i.e., have a straightforward interpretation in terms of population marginal
means) but probably are not of interest.
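
In case it helps, here is what the "population marginal means" look like
in sample form for the made-up data sketched above (everything here is
hypothetical and purely illustrative):

## fit the full two-way model with sum-to-zero contrasts
mod <- aov(response ~ factor1 * factor2, data = mydata,
           contrasts = list(factor1 = contr.sum, factor2 = contr.sum))

## cell means, and the unweighted marginal means for factor1 that a
## properly formulated "type-III" main-effect test compares
cell.means <- tapply(mydata$response, mydata[c("factor1", "factor2")], mean)
rowMeans(cell.means)                           # cell means averaged equally over factor2
tapply(mydata$response, mydata$factor1, mean)  # raw group means differ when unbalanced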

> 2. If I understood the post by John Fox correctly, I could safely use
> the Anova(., type="III") function from car for ANOVA analyses in R,
> both for balanced and unbalanced designs? Of course, provided the
> model was fitted with orthogonal contrasts. Something like below:
> 
> mod <- aov(response ~ factor1 * factor2, data=mydata,
>            contrasts=list(factor1=contr.sum, factor2=contr.sum))
> Anova(mod, type="III")
> 

Yes (or you could reset the contrasts option), but why do you appear to
prefer the "type-III" tests to the "type-II" tests?

> It was also said in most of your posts that the decision about which
> type of SS to use has to be made on the basis of the hypotheses we
> want to test. Therefore, let's assume that I would like to test the
> significance of both factors, and if either of them is significant, I
> plan to use post-hoc tests to explore the difference(s) between the
> levels of the significant factor(s).
> 

Your statement is too vague to imply what kind of tests you should use. I
think that people are almost always interested in "main effects" when
interactions to which they are marginal are negligible. In this situation,
both "type-II" and "type-III" tests are appropriate, and "type-II" tests
would usually be more powerful.
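
Concretely, with the made-up fit from above:

Anova(mod, type = "II")   # invariant to the choice of contrasts
Anova(mod, type = "III")  # requires sum-to-zero (or other orthogonal) contrasts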

Regards,
John

> Thank you in advance, Amasco
> 
> On 8/27/06, John Fox <jfox at mcmaster.ca> wrote:
> > Dear Amasco,
> >
> > A complete explanation of the issues that you raise is awkward in an
> > email, so I'll address your questions briefly. Section 8.2 of my
> > text, Applied Regression Analysis, Linear Models, and Related
> > Methods (Sage, 1997) has a detailed discussion.
> >
> > (1) In balanced designs, so-called "Type I," "II," and "III" sums of
> > squares are identical. If the STATISTICA manual says that Type II
> > tests are only appropriate in balanced designs, then that doesn't
> > make a whole lot of sense (unless one believes that Type-II tests
> > are nonsense, which is not the case).
> >
> > (2) One should concentrate not directly on different "types" of sums
> > of squares, but on the hypotheses to be tested. Sums of squares and
> > F-tests should follow from the hypotheses. Type-II and Type-III
> > tests (if the latter are properly formulated) test hypotheses that
> > are reasonably construed as tests of main effects and interactions
> > in unbalanced designs. In unbalanced designs, Type-I sums of squares
> > usually test hypotheses of interest only by accident.
> >
> > (3) Type-II sums of squares are constructed obeying the principle of
> > marginality, so the kinds of contrasts employed to represent factors
> > are irrelevant to the sums of squares produced. You get the same
> > answer for any full set of contrasts for each factor. In general,
> > the hypotheses tested assume that terms to which a particular term
> > is marginal are zero. So, for example, in a three-way ANOVA with
> > factors A, B, and C, the Type-II test for the AB interaction assumes
> > that the ABC interaction is absent, and the test for the A main
> > effect assumes that the ABC, AB, and AC interactions are absent (but
> > not necessarily the BC interaction, since the A main effect is not
> > marginal to this term). A general justification is that we're
> > usually not interested, e.g., in a main effect that's marginal to a
> > nonzero interaction.
> >
> > (4) Type-III tests do not assume that terms higher-order to the term
> > in question are zero. For example, in a two-way design with factors
> > A and B, the type-III test for the A main effect tests whether the
> > population marginal means at the levels of A (i.e., averaged across
> > the levels of B) are the same. One can test this hypothesis whether
> > or not A and B interact, since the marginal means can be formed
> > whether or not the profiles of means for A within levels of B are
> > parallel. Whether the hypothesis is of interest in the presence of
> > interaction is another matter, however. To compute Type-III tests
> > using incremental F-tests, one needs contrasts that are orthogonal
> > in the row-basis of the model matrix. In R, this means, e.g., using
> > contr.sum, contr.helmert, or contr.poly (all of which will give you
> > the same SS), but not contr.treatment. Failing to be careful here
> > will result in testing hypotheses that are not reasonably construed,
> > e.g., as hypotheses concerning main effects.
> >
> > (5) The same considerations apply to linear models that include 
> > quantitative predictors -- e.g., ANCOVA. Most software will not 
> > automatically produce sensible Type-III tests, however.
> >
> > I hope this helps,
> >  John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch 
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Amasco 
> > > Miralisus
> > > Sent: Saturday, August 26, 2006 5:07 PM
> > > To: r-help at stat.math.ethz.ch
> > > Subject: [R] Type II and III sum of square in Anova (R, car package)
> > >
> > > Hello everybody,
> > >
> > > I have some questions about ANOVA in general and about ANOVA in R
> > > in particular. I am not a statistician, so I would appreciate it
> > > if you could answer them in a simple way.
> > >
> > > 1. First of all, a more general question. The standard anova()
> > > function for lm() or aov() models in R implements Type I
> > > (sequential) sums of squares, which are not well suited for
> > > unbalanced ANOVA. Therefore it is better to use the Anova()
> > > function from the car package, which was written by John Fox to
> > > use Type II and Type III sums of squares. Did I get that right?
> > >
> > > 2. Now a more specific question. Type II sums of squares are
> > > also not well suited for unbalanced ANOVA designs (as stated in
> > > the STATISTICA help), so is the general rule of thumb to use the
> > > Anova() function with Type II SS only for balanced ANOVA and the
> > > Anova() function with Type III SS for unbalanced ANOVA?
> > > Is this a correct interpretation?
> > >
> > > 3. I have found a post by John Fox in which he wrote that Type
> > > III SS could be misleading when certain contrasts are used. What
> > > is this about? Could you please advise when it is appropriate to
> > > use Type II SS and when Type III SS? I do not use contrasts for
> > > comparisons, just a general ANOVA with subsequent Tukey post-hoc
> > > comparisons.
> > >
> > > Thank you in advance,
> > > Amasco
> > >