[R] Anova - adjusted or sequential sums of squares?

Wed Apr 20 16:06:32 CEST 2005

michael watson (IAH-C) wrote:
> Hi
> 
> I am performing an analysis of variance with two factors, each with two
> levels.  I have differing numbers of observations in each of the four
> combinations, but all four combinations *are* present (2 of the factor
> combinations have 3 observations, 1 has 4 and 1 has 5)
> 
> I have used both anova(aov(...)) and anova(lm(...)) in R and it gave the
> same result - as expected.  I then plugged this into minitab, performed
> what minitab called a General Linear Model (I have to use this in
> minitab as I have an unbalanced data set) and got a different result.
> After a little mining this is because minitab, by default, uses the type
> III adjusted SS.  Sure enough, if I changed minitab to use the type I
> sequential SS, I get exactly the same results as aov() and lm() in R.  
> 
> So which should I use?  Type I sequential SS or Type III adjusted SS?
> Minitab help tells me that I would "usually" want to use type III
> adjusted SS, as  type I sequential "sums of squares can differ when your
> design is unbalanced" - which mine is.  The R functions I am using are
> clearly using the type I sequential SS.

Install the fortunes package and try
 > fortune("Venables")

I'm really curious to know why the "two types" of sum of squares are called
"Type I" and "Type III"! This is a very common misconception, particularly
among SAS users who have been fed this nonsense quite often for all their
professional lives. Fortunately the reality is much simpler. There is, by
any sensible reckoning, only ONE type of sum of squares, and it always
represents an improvement sum of squares of the outer (or alternative)
model over the inner (or null hypothesis) model. What the SAS highly
dubious classification of sums of squares does is to encourage users to
concentrate on the null hypothesis model and to forget about the
alternative. This is always a very bad idea and not surprisingly it can
lead to nonsensical tests, as in the test it provides for main effects
"even in the presence of interactions", something which beggars
definition, let alone belief.
    -- Bill Venables
       R-help (November 2000)

In the words of the master, "there is ... only one type of sum of 
squares", which is the one that R reports.  The others are awkward 
fictions created for times when one could only afford to fit one or two 
linear models per week and therefore wanted the output to give results 
for all possible tests one could conceive, even if the models being 
tested didn't make sense.
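
To make the "improvement of the outer model over the inner model" reading
concrete, here is a minimal sketch on made-up data with the same cell
counts as in the question (3, 3, 4 and 5). The data frame d, the factor
names A and B, and the response y are all hypothetical; only lm(), anova()
and drop1() from base R are used.

```r
# Hypothetical data: two 2-level factors with cell counts 3, 3, 4 and 5.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
)
d$y <- rnorm(nrow(d)) + (d$A == "a2") + 0.5 * (d$B == "b2")

# Sequential table: each line is the improvement from adding that term
# to the model that contains the terms listed above it.
seqAB <- anova(lm(y ~ A * B, data = d))
print(seqAB)

# Same model with the factors entered in the other order.
seqBA <- anova(lm(y ~ B * A, data = d))
print(seqBA)

# The same sums of squares written as explicit comparisons of nested models.
m0    <- lm(y ~ 1,     data = d)
mA    <- lm(y ~ A,     data = d)
mAB   <- lm(y ~ A + B, data = d)
mFull <- lm(y ~ A * B, data = d)
cmp <- anova(m0, mA, mAB, mFull)
print(cmp)

# Each main effect adjusted for the other, within the additive model.
print(drop1(mAB, test = "F"))
```

Because the cell counts are unequal, the main-effect lines of the two
sequential tables need not agree, while the interaction line is the same in
both. Every line is still a comparison between two explicit models, so the
question to ask is which pair of models you actually want to compare.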