[R] Marginal (type II) SS for powers of continuous variables in a linear model?
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Tue Aug 12 14:01:01 CEST 2003
On Tue, 12 Aug 2003, Bjørn-Helge Mevik wrote:
> Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>
> > drop1 is the part of R that does type II sum of squares, and it works in
> > your example. So does Anova in the current car:
>
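For concreteness, a minimal sketch of the sort of calls I had in mind
(assuming a data frame Df with columns y and x, and that the car package
is installed):

  fit <- lm(y ~ x + I(x^2), data = Df)
  drop1(fit, test = "F")   # each term adjusted for all the other terms
  library(car)
  Anova(fit)               # type II is the default type in car's Anova()
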
> I'm sorry, I should have included an example to clarify what I meant
> (or point out my misunderstandings :-). I'll do that below, but first
> a comment:
>
> > And in summary.aov() those *are* marginal SS, as balance is assumed
> > for aov models. (That is not to say the software does not work otherwise,
> > but the interpretability depends on balance.)
>
> Maybe I've misunderstood, but in the documentation for aov, it says
> (under Details):
> This provides a wrapper to `lm' for fitting linear models to
> balanced or unbalanced experimental designs.
>
> Also, is this example (lm(y~x+I(x^2), Df)) really balanced? I think
No, and I did not use summary.aov on it!
> of balance as the property that there is an equal number of
> observations for every combination of the factors. With x and x^2,
> this doesn't happen. For instance, x=1 and x^2=1 occurs once, but x=1
> and x^2=4 never occurs (naturally). Or have I misunderstood something?
Yes. summary.aov(split=) and model.tables and the like are designed for
balanced data. They may or may not work in the unbalanced case. The
comment you quote is for aov(), not those other functions.
> Now, the example:
>
> > Df2 <- expand.grid (A=factor(1:2), B=factor(1:2), x=1:5)
> > Df2$y <- codes(Df2$A) + 2*codes(Df2$B) + 0.05*codes(Df2$A)*codes(Df2$B) +
> + Df2$x + 0.1*Df2$x^2 + 0.1*(0:4)
> > Df2 <- Df2[-1,] # Remove one observation to make it unbalanced
codes is deprecated!
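A rough equivalent without codes() -- as.integer() gives the same integer
codes for these factors -- would be something like:

  Df2 <- expand.grid(A = factor(1:2), B = factor(1:2), x = 1:5)
  Df2$y <- as.integer(Df2$A) + 2*as.integer(Df2$B) +
           0.05*as.integer(Df2$A)*as.integer(Df2$B) +
           Df2$x + 0.1*Df2$x^2 + 0.1*(0:4)   # same construction as above
  Df2 <- Df2[-1, ]   # drop one row to make the design unbalanced
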
> > ABx2.lm <- lm(y~A*B + x + I(x^2), data=Df2)
>
> The SSs I call marginal are R(A | B, x, x^2), R(B | A, x, x^2),
> R(A:B | A, B, x, x^2), R(x | A, B, A:B) and R(x^2 | A, B, A:B, x).
That's not what most other people call marginal, though.
> (Here, for instance, R(x | A, B, A:B) means the reduction of SSE due
> to including x in a model when A, B and A:B (and the mean) are already
> in the model. I've omitted the mean from the notation.)
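For what it is worth, such a reduction can be computed directly from two
nested fits, e.g. (a sketch using the Df2 above; the fit names are made up):

  fit0 <- lm(y ~ A*B,     data = Df2)   # A, B, A:B (and the mean) only
  fit1 <- lm(y ~ A*B + x, data = Df2)   # the same model with x added
  anova(fit0, fit1)                     # 'Sum of Sq' is R(x | A, B, A:B)
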
> > anova(ABx2.lm)
> Analysis of Variance Table
>
> Response: y
> Df Sum Sq Mean Sq F value Pr(>F)
> A 1 1.737 1.737 66.5700 1.801e-06 ***
> B 1 13.647 13.647 523.0292 6.953e-12 ***
> x 1 93.677 93.677 3590.1703 < 2.2e-16 ***
> I(x^2) 1 0.583 0.583 22.3302 0.0003966 ***
> A:B 1 0.011 0.011 0.4238 0.5263772
> Residuals 13 0.339 0.026
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> This gives SSs of the form R(A), R(B | A), R(x | A, B) etc. (If the
> design had been balanced (in A, B and x), this would have been the
> same as the marginal SSs above.)
>
> > drop1(ABx2.lm)
> Single term deletions
>
> Model:
> y ~ A * B + x + I(x^2)
> Df Sum of Sq RSS AIC
> <none> 0.339 -64.486
> x 1 1.188 1.527 -37.901
> I(x^2) 1 0.592 0.931 -47.294
> A:B 1 0.011 0.350 -65.877
>
> This gives the SSs R(x | A, B, A:B, x^2), R(x^2 | A, B, A:B, x) and
> R(A:B | A, B, x, x^2). The SS for x is not marginal as defined
> above.
But that *is* how `marginal' is usually defined. Why should I(x^2) be
regarded as subservient to x? It is just another function of x. Suppose
we have x and log(x)?
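A quick sketch of that last point (re-using Df2, where x > 0 so log(x) is
defined):

  fit2 <- lm(y ~ A*B + x + log(x), data = Df2)
  drop1(fit2, test = "F")   # x and log(x) are each adjusted for the other;
                            # neither is treated as subservient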
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595