[R] Marginal (type II) SS for powers of continuous variables in a linear model?
Bjørn-Helge Mevik
bhx2 at mevik.net
Tue Aug 12 13:14:07 CEST 2003
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
> drop1 is the part of R that does type II sum of squares, and it works in
> your example. So does Anova in the current car:
I'm sorry, I should have included an example to clarify what I meant
(or point out my misunderstandings :-). I'll do that below, but first
a comment:
> And in summary.aov() those *are* marginal SS, as balance is assumed
> for aov models. (That is not to say the software does not work otherwise,
> but the interpretability depends on balance.)
Maybe I've misunderstood, but in the documentation for aov, it says
(under Details):
This provides a wrapper to `lm' for fitting linear models to
balanced or unbalanced experimental designs.
Also, is this example (lm(y~x+I(x^2), Df)) really balanced? I think
of balance as the property that there is an equal number of
observations for every combination of the factors. With x and x^2,
this doesn't happen. For instance, x=1 and x^2=1 occurs once, but x=1
and x^2=4 never occurs (naturally). Or have I misunderstood something?
Now, the example:
> Df2 <- expand.grid (A=factor(1:2), B=factor(1:2), x=1:5)
> Df2$y <- codes(Df2$A) + 2*codes(Df2$B) + 0.05*codes(Df2$A)*codes(Df2$B) +
+ Df2$x + 0.1*Df2$x^2 + 0.1*(0:4)
> Df2 <- Df2[-1,] # Remove one observation to make it unbalanced
> ABx2.lm <- lm(y~A*B + x + I(x^2), data=Df2)
The SSs I call marginal are R(A | B, x, x^2), R(B | A, x, x^2),
R(A:B | A, B, x, x^2), R(x | A, B, A:B) and R(x^2 | A, B, A:B, x).
(Here, for instance, R(x | A, B, A:B) means the reduction of SSE due
to including x in a model when A, B and A:B (and the mean) are already
in the model. I've omitted the mean from the notation.)
> anova(ABx2.lm)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1.737 1.737 66.5700 1.801e-06 ***
B 1 13.647 13.647 523.0292 6.953e-12 ***
x 1 93.677 93.677 3590.1703 < 2.2e-16 ***
I(x^2) 1 0.583 0.583 22.3302 0.0003966 ***
A:B 1 0.011 0.011 0.4238 0.5263772
Residuals 13 0.339 0.026
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
This gives SSs on the form R(A), R(B | A), R(x | A, B) etc. (If the
design had been balanced (in A, B and x), this would have been the
same as the marginal SSs above.)
> drop1(ABx2.lm)
Single term deletions
Model:
y ~ A * B + x + I(x^2)
Df Sum of Sq RSS AIC
<none> 0.339 -64.486
x 1 1.188 1.527 -37.901
I(x^2) 1 0.592 0.931 -47.294
A:B 1 0.011 0.350 -65.877
This gives the SSs R(x | A, B, A:B, x^2), R(x^2 | A, B, A:B, x) and
R(A:B | A, B, x, x^2). The SS for x is not marginal as defined
above.
> library (car)
> Anova(ABx2.lm)
Anova Table (Type II tests)
Response: y
Sum Sq Df F value Pr(>F)
A 5.1806 1 198.5470 2.979e-09 ***
B 19.6610 1 753.5074 6.778e-13 ***
x 1.1879 1 45.5245 1.368e-05 ***
I(x^2) 0.5922 1 22.6970 0.0003699 ***
A:B 0.0111 1 0.4238 0.5263772
Residuals 0.3392 13
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
This gives marginal SSs for A, B, x^2 and A:B, but as with drop1, the
SS for x is R(x | A, B, A:B, x^2).
The only way I've figured out to give the `correct' SS for x, i.e.,
R(x | A, B, A:B), is:
> AB.lm <- lm(y~A*B, data=Df2)
> ABx.lm <- lm(y~A*B + x, data=Df2)
> anova (AB.lm, ABx.lm, ABx2.lm)
Analysis of Variance Table
Model 1: y ~ A * B
Model 2: y ~ A * B + x
Model 3: y ~ A * B + x + I(x^2)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 15 93.760
2 14 0.931 1 92.829 3557.651 < 2.2e-16 ***
3 13 0.339 1 0.592 22.697 0.0003699 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(The ABx2.lm is included to give the same error term to test against
as in the ANOVAs above.)
The baseline of all this is that I think it would be nice if a
function like Anova in the car package returned R(x | A, B, A:B)
instead of R(x | A, B, A:B, x^2) as SS for x in a model such as the
above.
(I hope I've made myself clearer, and not insulted anyone by
oversimplifying. :-)
--
Bjørn-Helge Mevik
More information about the R-help
mailing list