[R] Anova - adjusted or sequential sums of squares?
Thomas Lumley
tlumley at u.washington.edu
Wed Apr 20 17:39:31 CEST 2005
On Wed, 20 Apr 2005, michael watson (IAH-C) wrote:
> I guess the real problem is this:
>
> As I have a different number of observations in each of the groups, the
> results *change* depending on which order I specify the factors in the
> model. This unnerves me. With a completely balanced design, this
> doesn't happen - the results are the same no matter which order I
> specify the factors.
>
> It's this reason that I have been given for using the so-called type III
> adjusted sums of squares...
>
This is one of many examples of an attempt to provide a mathematical
answer to something that isn't a mathematical question.
As people have already pointed out, in any practical testing situation you
have two models you want to compare. If you are working in an interactive
statistical environment, or even in a modern batch-mode system, you can
fit the two models and compare them. If you want to compare two other
models, you can fit them and compare them.
However, in the Bad Old Days this was inconvenient (or so I'm told). If
you had half a dozen tests, and one of the models was the same in each
test, it was a substantial saving of time and effort to fit this model
just once.
This led to a system where you specify a model and a set of tests: e.g., I'm
going to fit y~a+b+c+d and I want to test (some of) y~a vs y~a+b, y~a+b vs
y~a+b+c and so on. Or, I want to test (some of) y~a+b+c vs y~a+b+c+d,
y~a+b+d vs y~a+b+c+d and so on. This gives the "Types" of sums of squares,
which are ways of specifying sets of tests. You could pick the "Type" so
that the total number of linear models you had to fit was minimized. As
these are merely a computational optimization, they don't have to make any
real sense. Unfortunately, as with many optimizations, they have gained a
life of their own.
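
To make the bookkeeping concrete, here is a rough sketch of the two sets of tests described above, written as lists of model pairs. It is in Python only so that it is self-contained; the function names (formula, sequential_tests, drop_one_tests, models_to_fit) are made up for this illustration and are not part of R or any package. The formulas are plain strings and nothing is actually fitted.

```python
def formula(terms):
    """Write a list of terms as an R-style formula; no terms means intercept only."""
    return "y~" + ("+".join(terms) if terms else "1")


def sequential_tests(terms):
    """Each term is added to the terms listed before it: y~a vs y~a+b, and so on."""
    return [(formula(terms[:i]), formula(terms[:i + 1]))
            for i in range(1, len(terms))]


def drop_one_tests(terms):
    """Each term is dropped in turn from the full model: y~a+b+c vs y~a+b+c+d, ..."""
    full = formula(terms)
    return [(formula(terms[:i] + terms[i + 1:]), full)
            for i in reversed(range(len(terms)))]


def models_to_fit(tests):
    """Distinct models appearing anywhere in a set of tests."""
    return sorted({model for pair in tests for model in pair})


terms = ["a", "b", "c", "d"]
for name, tests in [("sequential", sequential_tests(terms)),
                    ("drop-one", drop_one_tests(terms))]:
    print(name)
    for smaller, larger in tests:
        print("  ", smaller, "vs", larger)
    print("   distinct models to fit:", len(models_to_fit(tests)))
```

Within each set the models are shared between tests, so there are fewer distinct models to fit than if every pair were fitted separately, which is the saving described above.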
The "Type III" sums of squares are the same regardless of order, but this
is a bad property, not a good one. The question you are asking when
you test "for" a term X really does depend on what other terms are in the
model, so order really does matter. However, since you can do anything
just by specifying two models and comparing them, you don't actually need
to worry about any of this.
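
As a small illustration of "specify two models and compare them", the sketch below computes the sum of squares for a term a in two ways: comparing y~1 with y~a, and comparing y~b with y~a+b. It is written in Python with exact fractions purely to be self-contained; the data are invented, the helper names (rss, ss_for_a) are hypothetical, and the least-squares solver is a bare-bones one that assumes the design columns are linearly independent. In R you would fit the models and compare them directly.

```python
from fractions import Fraction


def rss(columns, y):
    """Residual sum of squares from least squares of y on the given columns."""
    n, p = len(y), len(columns)
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = [[sum(Fraction(ci[k]) * cj[k] for k in range(n)) for cj in columns]
           + [sum(Fraction(ci[k]) * y[k] for k in range(n))]
           for ci in columns]
    # Gauss-Jordan elimination (assumes full column rank).
    for i in range(p):
        pivot_row = next(r for r in range(i, p) if aug[r][i] != 0)
        aug[i], aug[pivot_row] = aug[pivot_row], aug[i]
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]
        for r in range(p):
            if r != i:
                factor = aug[r][i]
                aug[r] = [vr - factor * vi for vr, vi in zip(aug[r], aug[i])]
    beta = [row[p] for row in aug]
    fitted = [sum(beta[j] * columns[j][k] for j in range(p)) for k in range(n)]
    return sum((y[k] - fitted[k]) ** 2 for k in range(n))


def ss_for_a(a, b, y):
    """Sum of squares for a: (y~1 vs y~a, y~b vs y~a+b)."""
    one = [1] * len(y)
    a_alone = rss([one], y) - rss([one, a], y)
    a_given_b = rss([one, b], y) - rss([one, a, b], y)
    return a_alone, a_given_b


y = [1, 2, 3, 5, 4, 6, 7, 8]           # invented responses
a = [0, 0, 0, 0, 1, 1, 1, 1]           # two-level factor, coded 0/1
b_unbalanced = [0, 0, 0, 1, 0, 1, 1, 1]  # cell counts 3, 1, 1, 3
b_balanced = [0, 0, 1, 1, 0, 0, 1, 1]    # cell counts 2, 2, 2, 2

print("unbalanced:", ss_for_a(a, b_unbalanced, y))
print("balanced:  ", ss_for_a(a, b_balanced, y))
```

With the unbalanced layout the two comparisons give different sums of squares for a, because they answer different questions; with the balanced layout they coincide, which is why the order appears not to matter there.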
-thomas