[R] Marginal (type II) SS for powers of continuous variables in a linear model?

Tue Aug 12 13:14:07 CEST 2003

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

> drop1 is the part of R that does type II sum of squares, and it works in 
> your example.  So does Anova in the current car:

I'm sorry, I should have included an example to clarify what I meant
(or point out my misunderstandings :-).  I'll do that below, but first
a comment:

> And in summary.aov() those *are* marginal SS, as balance is assumed
> for aov models. (That is not to say the software does not work otherwise, 
> but the interpretability depends on balance.)

Maybe I've misunderstood, but in the documentation for aov, it says
(under Details):
     This provides a wrapper to `lm' for fitting linear models to
     balanced or unbalanced experimental designs.

Also, is this example (lm(y~x+I(x^2), Df)) really balanced?  I think
of balance as the property that there is an equal number of
observations for every combination of the factors.  With x and x^2,
this doesn't happen.  For instance, x=1 and x^2=1 occurs once, but x=1
and x^2=4 never occurs (naturally).  Or have I misunderstood something?

Now, the example:

> Df2 <- expand.grid (A=factor(1:2), B=factor(1:2), x=1:5)
> Df2$y <- codes(Df2$A) + 2*codes(Df2$B) + 0.05*codes(Df2$A)*codes(Df2$B) +
+   Df2$x + 0.1*Df2$x^2 + 0.1*(0:4)
> Df2 <- Df2[-1,]    # Remove one observation to make it unbalanced

> ABx2.lm <- lm(y~A*B + x + I(x^2), data=Df2)

The SSs I call marginal are R(A | B, x, x^2), R(B | A, x, x^2),
R(A:B | A, B, x, x^2), R(x | A, B, A:B) and R(x^2 | A, B, A:B, x).

(Here, for instance, R(x | A, B, A:B) means the reduction of SSE due
to including x in a model when A, B and A:B (and the mean) are already
in the model. I've omitted the mean from the notation.)

> anova(ABx2.lm)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq   F value    Pr(>F)    
A          1  1.737   1.737   66.5700 1.801e-06 ***
B          1 13.647  13.647  523.0292 6.953e-12 ***
x          1 93.677  93.677 3590.1703 < 2.2e-16 ***
I(x^2)     1  0.583   0.583   22.3302 0.0003966 ***
A:B        1  0.011   0.011    0.4238 0.5263772    
Residuals 13  0.339   0.026                        
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

This gives SSs on the form R(A), R(B | A), R(x | A, B) etc.  (If the
design had been balanced (in A, B and x), this would have been the
same as the marginal SSs above.)

> drop1(ABx2.lm)
Single term deletions

Model:
y ~ A * B + x + I(x^2)
       Df Sum of Sq     RSS     AIC
<none>                0.339 -64.486
x       1     1.188   1.527 -37.901
I(x^2)  1     0.592   0.931 -47.294
A:B     1     0.011   0.350 -65.877

This gives the SSs R(x | A, B, A:B, x^2), R(x^2 | A, B, A:B, x) and
R(A:B | A, B, x, x^2).  The SS for x is not marginal as defined
above.

> library (car)
> Anova(ABx2.lm)
Anova Table (Type II tests)

Response: y
           Sum Sq Df  F value    Pr(>F)    
A          5.1806  1 198.5470 2.979e-09 ***
B         19.6610  1 753.5074 6.778e-13 ***
x          1.1879  1  45.5245 1.368e-05 ***
I(x^2)     0.5922  1  22.6970 0.0003699 ***
A:B        0.0111  1   0.4238 0.5263772    
Residuals  0.3392 13                       
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

This gives marginal SSs for A, B, x^2 and A:B, but as with drop1, the
SS for x is R(x | A, B, A:B, x^2).

The only way I've figured out to give the `correct' SS for x, i.e.,
R(x | A, B, A:B), is: 

> AB.lm <- lm(y~A*B, data=Df2)
> ABx.lm <- lm(y~A*B + x, data=Df2)
> anova (AB.lm, ABx.lm, ABx2.lm)
Analysis of Variance Table

Model 1: y ~ A * B
Model 2: y ~ A * B + x
Model 3: y ~ A * B + x + I(x^2)
  Res.Df    RSS Df Sum of Sq        F    Pr(>F)    
1     15 93.760                                    
2     14  0.931  1    92.829 3557.651 < 2.2e-16 ***
3     13  0.339  1     0.592   22.697 0.0003699 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

(The ABx2.lm is included to give the same error term to test against
as in the ANOVAs above.)

The baseline of all this is that I think it would be nice if a
function like Anova in the car package returned R(x | A, B, A:B)
instead of R(x | A, B, A:B, x^2) as SS for x in a model such as the
above.

(I hope I've made myself clearer, and not insulted anyone by
oversimplifying. :-)

-- 
Bjørn-Helge Mevik