[R] Anova - adjusted or sequential sums of squares?

Wed Apr 20 16:37:50 CEST 2005

I guess the real problem is this:

As I have a different number of observations in each of the groups, the
results *change* depending on which order I specify the factors in the
model.  This unnerves me.  With a completely balanced design, this
doesn't happen: the results are the same no matter which order I
specify the factors in.
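A small sketch of the order dependence described above. It assumes a made-up 2 x 2 layout that reuses the cell sizes from the original question (3, 3, 4 and 5 observations, assigned to cells arbitrarily) and a simulated response. The helper names make_data() and ss_for_A() are hypothetical; only base R's lm() and anova() are used.

```r
# Hypothetical data generator: factor A (a1, a2) crossed with B (b1, b2),
# 'sizes' gives the number of observations in cells a1b1, a1b2, a2b1, a2b2.
# The response is simulated (B has an effect, A does not).
make_data <- function(sizes) {
  A <- factor(rep(c("a1", "a1", "a2", "a2"), times = sizes))
  B <- factor(rep(c("b1", "b2", "b1", "b2"), times = sizes))
  y <- 10 + 2 * (B == "b2") + rnorm(length(A))
  data.frame(y = y, A = A, B = B)
}

# Sequential SS for A when A is fitted first, and when it is fitted after B.
ss_for_A <- function(d) {
  first <- anova(lm(y ~ A + B, data = d))["A", "Sum Sq"]
  last  <- anova(lm(y ~ B + A, data = d))["A", "Sum Sq"]
  c(first = first, after_B = last)
}

set.seed(1)
unbalanced <- make_data(c(5, 3, 3, 4))  # cell sizes from the question
balanced   <- make_data(c(4, 4, 4, 4))  # equal replication

print(ss_for_A(unbalanced))  # the two values generally differ
print(ss_for_A(balanced))    # the two values agree (up to rounding)
```

With equal replication the two factors are orthogonal, so adjusting A for B changes nothing; with unequal cell sizes they are not, and each ordering asks a different question of the data.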

That is the reason I have been given for using the so-called type III
adjusted sums of squares...

Mick

-----Original Message-----
From: Douglas Bates [mailto:bates at stat.wisc.edu] 
Sent: 20 April 2005 15:07
To: michael watson (IAH-C)
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Anova - adjusted or sequential sums of squares?

michael watson (IAH-C) wrote:
> Hi
> 
> I am performing an analysis of variance with two factors, each with 
> two levels.  I have differing numbers of observations in each of the 
> four combinations, but all four combinations *are* present (2 of the 
> factor combinations have 3 observations, 1 has 4 and 1 has 5).
> 
> I have used both anova(aov(...)) and anova(lm(...)) in R and they gave
> the same result, as expected.  I then plugged this into Minitab,
> performed what Minitab calls a General Linear Model (I have to use
> this in Minitab as I have an unbalanced data set) and got a different
> result.  After a little digging, it turns out this is because Minitab,
> by default, uses the type III adjusted SS.  Sure enough, if I change
> Minitab to use the type I sequential SS, I get exactly the same
> results as aov() and lm() in R.
> 
> So which should I use?  Type I sequential SS or Type III adjusted SS?
> Minitab help tells me that I would "usually" want to use type III
> adjusted SS, as type I sequential "sums of squares can differ when
> your design is unbalanced", which mine is.  The R functions I am
> using are clearly using the type I sequential SS.
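For comparison, a sketch of how the "adjusted" (type III) sums of squares can be reproduced in base R as explicit model comparisons. This is an assumption-laden illustration, not Minitab's own code: it uses simulated data with the question's cell sizes (assigned to cells arbitrarily), switches both factors to sum-to-zero contrasts ("contr.sum"), and lets drop1() remove each term, main effects included, from the full A * B model. With the default treatment contrasts the main-effect rows would test something different. Note that dropping a main effect while keeping A:B is exactly the kind of comparison Bill Venables objects to in the quote below.

```r
# Simulated unbalanced 2 x 2 data (hypothetical cell sizes 5, 3, 3, 4).
set.seed(2)
sizes <- c(5, 3, 3, 4)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = sizes))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = sizes))
y <- 10 + 1 * (A == "a2") + 2 * (B == "b2") + rnorm(sum(sizes))
d <- data.frame(y = y, A = A, B = B)

# Full model with sum-to-zero contrasts. Assumption: for a two-factor
# design with every cell filled, dropping one term's column(s) from this
# fit gives the SAS/Minitab "type III" comparison for that term.
full <- lm(y ~ A * B, data = d,
           contrasts = list(A = "contr.sum", B = "contr.sum"))

# Each row compares the full model with the full model minus one term.
type3 <- drop1(full, scope = . ~ ., test = "F")
print(type3)

# R's default sequential table for the same fit, for comparison.
type1 <- anova(full)
print(type1)
```

Only the interaction row is guaranteed to agree between the two tables, because it is the last term in both comparisons.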

Install the fortunes package and try
 > library(fortunes)
 > fortune("Venables")

I'm really curious to know why the "two types" of sum of squares are
called "Type I" and "Type III"! This is a very common misconception,
particularly among SAS users who have been fed this nonsense quite often
for all their professional lives. Fortunately the reality is much
simpler. There is, by any sensible reckoning, only ONE type of sum of
squares, and it always represents an improvement sum of squares of the
outer (or alternative) model over the inner (or null hypothesis) model.
What the SAS highly dubious classification of sums of squares does is to
encourage users to concentrate on the null hypothesis model and to
forget about the alternative. This is always a very bad idea and not
surprisingly it can lead to nonsensical tests, as in the test it
provides for main effects "even in the presence of interactions",
something which beggars definition, let alone belief.
    -- Bill Venables
       R-help (November 2000)

In the words of the master, "there is ... only one type of sum of 
squares", which is the one that R reports.  The others are awkward 
fictions created for times when one could only afford to fit one or two 
linear models per week and therefore wanted the output to give results 
for all possible tests one could conceive, even if the models being 
tested didn't make sense.
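To make the "only one type of sum of squares" point concrete, here is a sketch, again on simulated data with hypothetical cell sizes, showing that each line of R's sequential anova() table is simply the drop in residual sum of squares from an inner model to the next outer model. anova() reports the same numbers when it is handed the nested fits explicitly, which keeps both the null and the alternative model in view.

```r
# Simulated unbalanced 2 x 2 data (hypothetical cell sizes 5, 3, 3, 4).
set.seed(3)
sizes <- c(5, 3, 3, 4)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = sizes))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = sizes))
y <- 10 + 2 * (B == "b2") + rnorm(sum(sizes))
d <- data.frame(y = y, A = A, B = B)

# A nested sequence of models, each adding one term to the previous one.
m0 <- lm(y ~ 1, data = d)
m1 <- lm(y ~ A, data = d)
m2 <- lm(y ~ A + B, data = d)
m3 <- lm(y ~ A * B, data = d)

# Improvement SS = RSS(inner) - RSS(outer); deviance() gives an lm's RSS.
improvement <- c(
  A     = deviance(m0) - deviance(m1),
  B     = deviance(m1) - deviance(m2),
  "A:B" = deviance(m2) - deviance(m3)
)
print(improvement)

# R's sequential table for the largest model reports the same numbers ...
seq_table <- anova(m3)
print(seq_table)

# ... as does an explicit comparison of the nested fits.
cmp <- anova(m0, m1, m2, m3)
print(cmp)
```

Reordering the formula just changes which sequence of inner and outer models is being compared, which is why the sequential table changes with the order of the factors in an unbalanced design.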