[R] Marginal (type II) SS for powers of continuous variables in a linear model?

Prof Brian D Ripley ripley at stats.ox.ac.uk
Tue Aug 12 14:01:01 CEST 2003


On Tue, 12 Aug 2003, Bjørn-Helge Mevik wrote:

> Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>
> > drop1 is the part of R that does type II sum of squares, and it works in
> > your example.  So does Anova in the current car:
>
> I'm sorry, I should have included an example to clarify what I meant
> (or point out my misunderstandings :-).  I'll do that below, but first
> a comment:
>
> > And in summary.aov() those *are* marginal SS, as balance is assumed
> > for aov models. (That is not to say the software does not work otherwise,
> > but the interpretability depends on balance.)
>
> Maybe I've misunderstood, but in the documentation for aov, it says
> (under Details):
>      This provides a wrapper to `lm' for fitting linear models to
>      balanced or unbalanced experimental designs.
>
> Also, is this example (lm(y~x+I(x^2), Df)) really balanced?  I think

No, and I did not use summary.aov on it!

> of balance as the property that there is an equal number of
> observations for every combination of the factors.  With x and x^2,
> this doesn't happen.  For instance, x=1 and x^2=1 occurs once, but x=1
> and x^2=4 never occurs (naturally).  Or have I misunderstood something?

Yes. summary.aov(split=) and model.tables and the like are designed for
balanced data.  They may or may not work in the unbalanced case.  The
comment you quote is for aov(), not those other functions.
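
For an unbalanced lm fit the marginal (Type II) tests come from drop1()
or from Anova() in the car package mentioned above, not from summary.aov.
A minimal sketch, assuming a fitted lm object called `fit':

  drop1(fit, test = "F")   # F test for dropping each term, adjusted for all the others
  library(car)             # Anova() is in the car package
  Anova(fit)               # Type II sums of squares by default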

> Now, the example:
>
> > Df2 <- expand.grid (A=factor(1:2), B=factor(1:2), x=1:5)
> > Df2$y <- codes(Df2$A) + 2*codes(Df2$B) + 0.05*codes(Df2$A)*codes(Df2$B) +
> +   Df2$x + 0.1*Df2$x^2 + 0.1*(0:4)
> > Df2 <- Df2[-1,]    # Remove one observation to make it unbalanced

codes is deprecated!
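
A sketch of the same construction without codes(), using as.integer(),
which for these factors (levels 1:2) gives the same numeric codes:

  Df2 <- expand.grid(A = factor(1:2), B = factor(1:2), x = 1:5)
  Df2$y <- as.integer(Df2$A) + 2*as.integer(Df2$B) +
           0.05*as.integer(Df2$A)*as.integer(Df2$B) +
           Df2$x + 0.1*Df2$x^2 + 0.1*(0:4)
  Df2 <- Df2[-1, ]   # drop one observation to make the design unbalanced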

> > ABx2.lm <- lm(y~A*B + x + I(x^2), data=Df2)
>
> The SSs I call marginal are R(A | B, x, x^2), R(B | A, x, x^2),
> R(A:B | A, B, x, x^2), R(x | A, B, A:B) and R(x^2 | A, B, A:B, x).

That's not what most other people call marginal, though.

> (Here, for instance, R(x | A, B, A:B) means the reduction of SSE due
> to including x in a model when A, B and A:B (and the mean) are already
> in the model. I've omitted the mean from the notation.)

> > anova(ABx2.lm)
> Analysis of Variance Table
>
> Response: y
>           Df Sum Sq Mean Sq   F value    Pr(>F)
> A          1  1.737   1.737   66.5700 1.801e-06 ***
> B          1 13.647  13.647  523.0292 6.953e-12 ***
> x          1 93.677  93.677 3590.1703 < 2.2e-16 ***
> I(x^2)     1  0.583   0.583   22.3302 0.0003966 ***
> A:B        1  0.011   0.011    0.4238 0.5263772
> Residuals 13  0.339   0.026
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> This gives SSs of the form R(A), R(B | A), R(x | A, B), etc.  (If the
> design had been balanced (in A, B and x), this would have been the
> same as the marginal SSs above.)
>
> > drop1(ABx2.lm)
> Single term deletions
>
> Model:
> y ~ A * B + x + I(x^2)
>        Df Sum of Sq     RSS     AIC
> <none>                0.339 -64.486
> x       1     1.188   1.527 -37.901
> I(x^2)  1     0.592   0.931 -47.294
> A:B     1     0.011   0.350 -65.877
>
> This gives the SSs R(x | A, B, A:B, x^2), R(x^2 | A, B, A:B, x) and
> R(A:B | A, B, x, x^2).  The SS for x is not marginal as defined
> above.

But that *is* how `marginal' is usually defined.  Why should I(x^2) be
regarded as subservient to x?  It is just another function of x.  Suppose
we have x and log(x)?
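
If what is wanted really is R(x | A, B, A:B), that is, x adjusted for the
factors but not for I(x^2), it can be had by comparing nested fits
explicitly.  A sketch, assuming Df2 and ABx2.lm as above:

  fit0 <- lm(y ~ A*B, data = Df2)       # without any function of x
  fit1 <- lm(y ~ A*B + x, data = Df2)   # x added, I(x^2) left out
  anova(fit0, fit1)                     # the Sum of Sq entry is R(x | A, B, A:B)

drop1() instead reports R(x | A, B, A:B, x^2): x adjusted for every other
term in the formula, including the other function of x.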

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
