# [R] type III Sum Sq in ANOVA table - Howto?

Liaw, Andy andy_liaw at merck.com
Fri Mar 7 02:08:23 CET 2003

```
> From: Rolf Turner [mailto:rolf at math.unb.ca]
>
> Andy Liaw wrote:
>
> >  (i.e., what hypotheses you really want to test, and test only those).
> >  The model hierarchy says that a model should not have an interaction
> >  term involving a factor whose main effect is not present in the model.
> >  Seen in this light, the hypothesis you're trying to test involves a
> >  non-sensical model.
>
> Not really.  The hypothesis being tested by Type III sums of squares
> may be suspected of not being of ``central interest'', but it is NOT
> (as is commonly believed) ``non-sensical''.
>
> Let us think about the 2-way ANOVA case, where one can actually
> understand what is going on.  Let the population ***cell means*** be
> mu_ij (i = 1, ..., m, j = 1, ..., n) and forget about the confusing
>
> Testing for the significance of the ``row factor'' by Type III
> sums of squares (with interaction in the model of course) tests
>
> 	H_0: mu_{1.}-bar = mu_{2.}-bar = ... = mu_{m.}-bar
>
> I.e. that the means of the population cell means, over columns, are
> all equal.  I.e. that ``when rows are averaged over columns'' there
> is no row effect.
>
> This could, at least conceivably, be of interest.  Note that the
> average is not a weighted average, saying that all columns are
> equally important.  If all columns are NOT equally important (e.g.
> if an item randomly drawn from the population is more likely to
> ``come from'' column 1 than from column 2 etc.) then this hypothesis
> is less likely to be of interest.
>
> But it isn't nonsensical.
>
> It is true, however, that most of the time when people test things
> using Type III sums of squares they don't understand what they are
> really testing.  But then (said he cynically) people don't understand
> what the hell they are really testing in most situations, not just
> in the context of Type III sums of squares.
>
> 				cheers,
>
> 					Rolf Turner

I'm sorry, but I still don't see the sense of this argument.  By including the
interaction term in the model, isn't it implied that the cells have
different means, and the structure isn't a simple row + column?  Assuming
that is the case, what's the sense of "averaging" over columns (or rows)?
I can perhaps understand the utility of such a "test" in an exploratory
setting, but fail to see how this can be a valid test in a more rigorous
sense.  Maybe I'm stuck too deep in the rut...

Cheers,
Andy

```
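
To make the two positions in the exchange above concrete, here is a minimal sketch in plain Python. The cell means and column weights are invented for illustration only; they do not come from the thread. It computes the unweighted row averages of the cell means, which is what the quoted H_0 compares, alongside a weighted version for the case where columns are not equally important.

```python
# Hypothetical 2x2 table of population cell means mu[i][j]
# (rows i, columns j), chosen so that a strong interaction is present.
mu = [
    [10.0, 20.0],
    [20.0, 10.0],
]

# Hypothetical column weights, e.g. the share of the population that
# falls in each column. These are an assumption for illustration.
col_weights = [0.75, 0.25]


def unweighted_row_means(cell_means):
    """Average each row over columns, treating all columns as equal.

    These are the quantities compared by the hypothesis
    H_0: mu_{1.}-bar = ... = mu_{m.}-bar quoted above.
    """
    return [sum(row) / len(row) for row in cell_means]


def weighted_row_means(cell_means, weights):
    """Average each row over columns using the given column weights."""
    return [sum(m * w for m, w in zip(row, weights)) for row in cell_means]


unweighted = unweighted_row_means(mu)
weighted = weighted_row_means(mu, col_weights)

print("unweighted row means:", unweighted)  # [15.0, 15.0]
print("weighted row means:  ", weighted)    # [12.5, 17.5]
```

In this made-up table the unweighted row means coincide, so the quoted H_0 holds even though no two cells in a column share a mean. That is a well-defined statement about the population, which is Rolf's point, yet it says nothing about the row effect within any single column, which is the concern Andy raises. With unequal column weights the row averages differ, which illustrates why the unweighted hypothesis may be of less interest when columns are not equally important.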