[R] Anova - adjusted or sequential sums of squares?

Thomas Lumley tlumley at u.washington.edu
Wed Apr 20 17:39:31 CEST 2005

On Wed, 20 Apr 2005, michael watson (IAH-C) wrote:

> I guess the real problem is this:
> As I have a different number of observations in each of the groups, the
> results *change* depending on which order I specify the factors in the
> model.  This unnerves me.  With a completely balanced design, this
> doesn't happen - the results are the same no matter which order I
> specify the factors.
> It's this reason that I have been given for using the so-called type III
> adjusted sums of squares...

This is one of many examples of an attempt to provide a mathematical 
answer to something that isn't a mathematical question.

As people have already pointed out, in any practical testing situation you 
have two models you want to compare.  If you are working in an interactive 
statistical environment, or even in a modern batch-mode system, you can 
fit the two models and compare them.  If you want to compare two other 
models, you can fit them and compare them.
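
As a minimal sketch of "fit the two models and compare them", assuming a 
simulated unbalanced data set (the data frame d, the variables y, a, b and 
all the numbers are made up for illustration, not taken from the original 
question):

```r
# Simulated unbalanced two-factor data; every name and value is illustrative.
set.seed(1)
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 8))),
  b = factor(c(rep(c("b1", "b2"), times = c(9, 3)),
               rep(c("b1", "b2"), times = c(2, 6))))
)
d$y <- rnorm(nrow(d)) + ifelse(d$a == "a2", 1, 0) + ifelse(d$b == "b2", 0.5, 0)

small <- lm(y ~ a, data = d)       # model without b
large <- lm(y ~ a + b, data = d)   # model with b
cmp <- anova(small, large)         # F test for b, given that a is in the model
print(cmp)
```

The question being asked is visible in the call itself: does adding b 
improve on a model that already contains a?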

However, in the Bad Old Days this was inconvenient (or so I'm told).  If 
you had half a dozen tests, and one of the models was the same in each 
test, it was a substantial saving of time and effort to fit this model 
just once.

This led to a system where you specify a model and a set of tests: e.g., I'm 
going to fit y~a+b+c+d and I want to test (some of) y~a vs y~a+b, y~a+b vs 
y~a+b+c and so on. Or, I want to test (some of) y~a+b+c vs y~a+b+c+d, 
y~a+b+d vs y~a+b+c+d and so on. This gives the "Types" of sums of squares, 
which are ways of specifying sets of tests. You could pick the "Type" so 
that the total number of linear models you had to fit was minimized. As 
these are merely a computational optimization, they don't have to make any 
real sense. Unfortunately, as with many optimizations, they have gained a 
life of their own.
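
To see the two sets of tests side by side, here is a rough sketch on 
simulated data (again, d, y, a and b are hypothetical). In base R, anova() 
on a single lm fit gives the sequential table, and drop1() tests each term 
against the full model. Each sum of squares is just the drop in residual 
sum of squares between two nested fits:

```r
# Simulated unbalanced data; names and values are illustrative only.
set.seed(2)
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 8))),
  b = factor(c(rep(c("b1", "b2"), times = c(9, 3)),
               rep(c("b1", "b2"), times = c(2, 6))))
)
d$y <- rnorm(nrow(d)) + ifelse(d$a == "a2", 1, 0) + ifelse(d$b == "b2", 0.5, 0)

full <- lm(y ~ a + b, data = d)

# Set 1, sequential: y~1 vs y~a, then y~a vs y~a+b
seq_tab <- anova(full)

# Set 2, each term dropped from the full model: y~b vs y~a+b, y~a vs y~a+b
marg_tab <- drop1(full, test = "F")

# The same sums of squares obtained from explicit pairs of models
ss_a_first <- anova(lm(y ~ 1, data = d), lm(y ~ a, data = d))[2, "Sum of Sq"]
ss_a_last  <- anova(lm(y ~ b, data = d), full)[2, "Sum of Sq"]

print(seq_tab)
print(marg_tab)
print(c(a_first = ss_a_first, a_last = ss_a_last))
```

Note that the sequential table takes its F denominator from the full 
model's residual mean square, so its F statistics need not equal those from 
the explicit two-model comparison even though the sums of squares agree.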

The "Type III" sums of squares are the same regardless of order, but this 
is a bad property, not a good one. The question you are asking when 
you test "for" a term X really does depend on what other terms are in the 
model, so order really does matter.  However, since you can do anything 
just by specifying two models and comparing them, you don't actually need 
to worry about any of this.
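
A small sketch of why the order matters, assuming simulated unbalanced data 
(d, y, a and b are made-up names). The line for a in each sequential table 
answers a different question, and each one matches an explicit comparison 
of two models:

```r
# Simulated unbalanced data; names and values are illustrative only.
set.seed(3)
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 8))),
  b = factor(c(rep(c("b1", "b2"), times = c(9, 3)),
               rep(c("b1", "b2"), times = c(2, 6))))
)
d$y <- rnorm(nrow(d)) + ifelse(d$a == "a2", 1, 0) + ifelse(d$b == "b2", 0.5, 0)

ab <- anova(lm(y ~ a + b, data = d))   # a entered first
ba <- anova(lm(y ~ b + a, data = d))   # a entered after b

# "a" in ab: does a help compared with no predictors at all?
a_alone <- anova(lm(y ~ 1, data = d), lm(y ~ a, data = d))

# "a" in ba: does a help once b is already in the model?
a_given_b <- anova(lm(y ~ b, data = d), lm(y ~ b + a, data = d))

print(c(ab = ab["a", "Sum Sq"], explicit = a_alone[2, "Sum of Sq"]))
print(c(ba = ba["a", "Sum Sq"], explicit = a_given_b[2, "Sum of Sq"]))
```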

