# [R] Odd anova(lm()) order phenomenon, looking for an explanation

Berwin A Turlach berwin at maths.uwa.edu.au
Fri Mar 31 11:31:00 CEST 2006

```
G'day Andrew,

>>>>> "AR" == Andrew Robinson <A.Robinson at ms.unimelb.edu.au> writes:

AR> I would have expected any term to explain less Sum Sq if
AR> listed second than if listed first.  Is my intuition awry?
Yes.  :-)

I would not expect that *any* term explains less Sum Sq if listed
second; if that were so, life and (linear) modelling would be simple.
The problem with multiple regression is that a covariate might look
unimportant if used first (i.e. has a small Sum Sq associated with it
in the anova table), but once we first correct for the other
regressors, this covariate suddenly becomes important (i.e. has a
large Sum Sq associated with it in the anova table).
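
# A minimal sketch of the effect described above, on simulated data.
# Everything here is an assumption made for illustration (the seed, the
# sample size, the near-collinear design and the names a12, a21, ss);
# none of it comes from Andrew's data.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # x2 is x1 plus a little noise
y  <- x1 - x2 + rnorm(n, sd = 0.05)  # only the difference x1 - x2 matters
a12 <- anova(lm(y ~ x1 + x2))        # x1 listed first, x2 second
a21 <- anova(lm(y ~ x2 + x1))        # x2 listed first, x1 second
# Sequential Sum Sq of each term when listed first vs. listed second
ss <- rbind(first  = c(x1 = a12["x1", "Sum Sq"], x2 = a21["x2", "Sum Sq"]),
            second = c(x1 = a21["x1", "Sum Sq"], x2 = a12["x2", "Sum Sq"]))
print(ss)
# On its own, each regressor is almost unrelated to y, so its Sum Sq is
# small when listed first; once the other one is corrected for, the
# leftover part of the regressor carries the signal.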

What surprised me was that you observed this phenomenon with respect
to both regressors.  If only one had displayed this behaviour, I would
have readily explained it as above, but that both display it I found
surprising too.

AR> Does anyone have any modelling insight to help me interpret
AR> what I'm seeing?
Don't know if the following example, which shows the same behaviour,
is of any help:

> n <- 100
> x1 <- runif(n, -1,1)
> x2 <- runif(n, -1,1)
> y <- x1*x1*x2 + rnorm(n, sd=0.05)
> y <- y - mean(y)
> anova(lm(y~x1+x2))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x1         1 0.0055  0.0055   0.1485 0.7008
x2         1 5.0071  5.0071 134.3499 <2e-16 ***
Residuals 97 3.6151  0.0373
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(lm(y~x2+x1))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x2         1 4.9930  4.9930 133.9723 <2e-16 ***
x1         1 0.0196  0.0196   0.5261   0.47
Residuals 97 3.6151  0.0373
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Cheers,

Berwin
```