[R] ANOVA

Thu Jun 29 14:59:24 CEST 2000

> Date: Thu, 29 Jun 2000 14:22:24 +0000
> From: Lilla Di Scala <lilla at dimat.unipv.it>

> I have a problem regarding the anova() output. When I apply it to a
> single regression model, I do not understand how the values
> corresponding to the F statistics are obtained by the software. I
> believe that they are computed using differences between residual sums
> of squares of sequential models obtained from the fitted one (removing
> one independent variable at a time). I have tried to do the computations
> by hand, but my figures do not match R's.

That's essentially correct, but only if you mean removing one variable
at a time cumulatively, that is, without putting each dropped variable
back before removing the next.  Here is an example:

> data(stackloss)
> fm <- lm(stack.loss ~ ., data=stackloss)
> anova(fm)
Analysis of Variance Table

Response: stack.loss
           Df  Sum Sq Mean Sq  F value    Pr(>F)
Air.Flow    1 1750.12 1750.12 166.3707 3.309e-10
Water.Temp  1  130.32  130.32  12.3886  0.002629
Acid.Conc.  1    9.97    9.97   0.9473  0.344046
Residuals  17  178.83   10.52                   
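
As a rough check of how the F column above is formed, the sketch below
recomputes it from the table's own columns: each term's Mean Sq divided
by the Residuals Mean Sq, with the p-value taken from the F distribution.
The names tab, is_term, resid_msq, F_by_hand and p_by_hand are
illustrative only and are not part of the original session.

```r
# Refit the full model and keep the sequential table as an object
fm  <- lm(stack.loss ~ ., data = stackloss)
tab <- anova(fm)

# Rows for the model terms (everything except the Residuals row)
is_term   <- rownames(tab) != "Residuals"
resid_msq <- tab["Residuals", "Mean Sq"]

# F value: term Mean Sq over the residual Mean Sq of the full model
F_by_hand <- tab[is_term, "Mean Sq"] / resid_msq

# Upper-tail p-value on (term Df, residual Df) degrees of freedom
p_by_hand <- pf(F_by_hand, tab[is_term, "Df"], tab["Residuals", "Df"],
                lower.tail = FALSE)

print(cbind(F_by_hand, p_by_hand))
```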

> fm1 <- update(fm, . ~ . - Acid.Conc.)
> anova(fm1, fm)
Analysis of Variance Table

Model 1: stack.loss ~ Air.Flow + Water.Temp
Model 2: stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.
  Res.Df Res.Sum Sq Df  Sum Sq F value Pr(>F)
1     18    188.795                          
2     17    178.830  1   9.965  0.9473 0.3440

> fm2 <- update(fm1, . ~ . - Water.Temp)
> fm3 <- update(fm2, . ~ . - Air.Flow)
> anova(fm2, fm1)
Analysis of Variance Table

Model 1: stack.loss ~ Air.Flow
Model 2: stack.loss ~ Air.Flow + Water.Temp
  Res.Df Res.Sum Sq Df Sum Sq F value   Pr(>F)
1     19     319.12                           
2     18     188.80  1 130.32  12.425 0.002419
> anova(fm3, fm2)
Analysis of Variance Table

Model 1: stack.loss ~ 1
Model 2: stack.loss ~ Air.Flow
  Res.Df Res.Sum Sq Df  Sum Sq F value    Pr(>F)
1     20    2069.24                             
2     19     319.12  1 1750.12  104.20 3.774e-09

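Note that the SSqs are all the same, but the sequential table compares
each of them to the residual MSq of the full model, not to that of the
next-larger model (as there is no test that terms which have already
been dropped are not significant).  That is why the F values for
Air.Flow and Water.Temp differ between the two approaches.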
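
The two sets of F values can be reproduced by hand from the residual
sums of squares alone.  This is a minimal sketch, assuming the same
models as in the session above; rss, ss, F_seq and F_pair are
illustrative names, and deviance() is used because it returns the
residual sum of squares for an lm fit.

```r
fm  <- lm(stack.loss ~ ., data = stackloss)
fm1 <- update(fm,  . ~ . - Acid.Conc.)
fm2 <- update(fm1, . ~ . - Water.Temp)
fm3 <- update(fm2, . ~ . - Air.Flow)

# Residual sums of squares, smallest model first
rss <- sapply(list(fm3, fm2, fm1, fm), deviance)

# Sequential SSqs for Air.Flow, Water.Temp, Acid.Conc. (each on 1 df)
ss <- -diff(rss)

# Sequential table: every SSq over the residual MSq of the full model
F_seq <- ss / (deviance(fm) / df.residual(fm))

# Pairwise comparisons: SSq over the residual MSq of the larger model
# in each pair (fm2, then fm1, then fm)
F_pair <- ss / (rss[-1] / sapply(list(fm2, fm1, fm), df.residual))

print(rbind(F_seq, F_pair))
```

Only the last column agrees (0.9473 in both outputs above), since for
Acid.Conc. the larger model of the pair is the full model itself.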

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
