[R] Difference between summary.lm() and summary.aov()

Wed Dec 17 02:32:33 CET 2003

At 01:52 PM 12/16/2003 -0800, Alexander Sirotkin [at Yahoo] wrote:
>Thanks a lot to everybody. Two more questions, if you
>don't mind :
>
>How does anova() treat non-categorical variables, such as
>severity in my case? I was under the impression that
>ANOVA is defined for categorical variables only.

The term ANOVA is commonly used in two related but distinct senses: a 
linear model in which the predictors are factors (categorical variables), 
and a table with sums of squares, associated df, and F-tests for various 
terms in a linear model, which need not consist only of factors. The 
anova() function computes the latter, and generalizes it, e.g., to an 
analysis-of-deviance table for a generalized linear model.
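
To illustrate the second sense, here is a minimal sketch on simulated data. 
The data frame d, its columns (cost, physician, severity, visits), and all 
of the numbers are made up for illustration; they are not Alexander's data. 
anova() gives a sequential table for a linear model with one factor and one 
numeric predictor, and an analysis-of-deviance table for a Poisson glm:

```r
# Simulated data (hypothetical; not the data discussed in this thread)
set.seed(1)
n <- 60
d <- data.frame(
  physician = factor(rep(c("A", "B", "C"), each = n / 3)),
  severity  = sample(1:5, n, replace = TRUE)
)
d$cost <- 1000 + 500 * (d$physician == "C") + 700 * d$severity +
  rnorm(n, sd = 600)

# Linear model with a 3-level factor and a numeric predictor
fit <- lm(cost ~ physician + severity, data = d)
tab <- anova(fit)          # sequential (Type I) sums of squares
print(tab)

# severity has 1 df and enters last, so its sequential F is the
# square of its t statistic from summary(fit)
t_sev <- coef(summary(fit))["severity", "t value"]
print(all.equal(tab["severity", "F value"], t_sev^2))

# The same function applied to a glm gives an analysis-of-deviance table
d$visits <- rpois(n, lambda = exp(0.2 + 0.3 * d$severity))
gfit <- glm(visits ~ physician + severity, family = poisson, data = d)
dev_tab <- anova(gfit, test = "Chisq")
print(dev_tab)
```

The numeric predictor simply contributes a 1-df line to the table, while the 
three-level factor contributes a 2-df line.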

>I read about drop1() and I understand that it performs
>an F-test for nested models, correct me if I'm wrong. It
>is unclear to me, however, how it manages to do this
>F-test for interactions?

Actually, tests for the highest-order terms in the model are more 
straightforward than those for lower-order terms. The drop1() function does 
just that -- that is, drops a high-order term from the model and (for a 
linear model) computes the change in the residual sum of squares.
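
As a rough sketch of what that amounts to, again on simulated data (the 
data frame d, its columns, and the coefficients are hypothetical), the F 
statistic that drop1() reports for the interaction can be reproduced by 
hand from the residual sums of squares of the full and reduced models:

```r
# Simulated data (hypothetical; not the data discussed in this thread)
set.seed(2)
n <- 90
d <- data.frame(
  physician = factor(rep(c("A", "B", "C"), each = n / 3)),
  severity  = sample(1:5, n, replace = TRUE)
)
d$cost <- 1000 + 700 * d$severity +
  300 * (d$physician == "C") * d$severity + rnorm(n, sd = 600)

full    <- lm(cost ~ physician * severity, data = d)
reduced <- update(full, . ~ . - physician:severity)

# Only the interaction can be dropped without violating marginality
d1 <- drop1(full, test = "F")
print(d1)

# The same F statistic from the change in the residual sum of squares
rss_full <- deviance(full)      # for an lm, deviance() is the RSS
rss_red  <- deviance(reduced)
df_num   <- df.residual(reduced) - df.residual(full)
F_hand   <- ((rss_red - rss_full) / df_num) /
  (rss_full / df.residual(full))
print(F_hand)

# In the main-effects model each term is tested given the other
print(drop1(reduced, test = "F"))
```

The last call is the one relevant to the original question: with no 
interaction in the model, drop1() gives the 2-df F-test for physician 
adjusted for severity, regardless of the order of the terms in the formula.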

I hope that this helps,
  John

>Thanks a lot.
>
>--- Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> > "Alexander Sirotkin [at Yahoo]"
> > <alex_s_42 at yahoo.com> writes:
> >
> > > John,
> > >
> > > What you are saying is that any conclusion I can make
> > > from summary.aov (for instance, to answer a question
> > > if physician is a significant variable) will not be
> > > correct ?
> >
> > Summary.aov is for summarizing aov objects, so you're lucky to get
> > something that is sensible at all. You should use anova() to get
> > analysis of variance tables. These are sequential so that you can use
> > them (give or take some quibbles about the residual variance) for
> > reducing the model from the "bottom up". I.e. if you place "physician"
> > last, you get the F test for whether that variable is significant.
> > However, a more convenient way of getting that result is to use
> > drop1(). Even then there's no simple relation to the two
> > t-tests, except that the F test tests the hypothesis that *both*
> > coefficients are zero, where the t-tests do so individually.
> >
> >
> > > --- John Fox <jfox at mcmaster.ca> wrote:
> > > > Dear Spencer and Alexander,
> > > >
> > > > In this case, physician is apparently a factor with three levels, so
> > > > summary.aov() gives you a sequential ANOVA, equivalent to what you'd get
> > > > from anova(). There is no simple relationship between the F-statistic for
> > > > physician, which has 2 df in the numerator, and the two t's. (By the way, I
> > > > doubt whether a sequential ANOVA is what's wanted here.)
> > > >
> > > > Regards,
> > > >   John
> > > >
> > > > At 09:17 AM 12/6/2003 -0800, Spencer Graves wrote:
> > > > >      The square of a Student's t with "df" degrees of freedom is an F
> > > > > distribution with 1 and "df" degrees of freedom.
> > > > >      hope this helps.  spencer graves
> > > > >
> > > > >Alexander Sirotkin [at Yahoo] wrote:
> > > > >
> > > > >>I have a simple linear model (fitted with lm()) with 2 independent
> > > > >>variables: one categorical and one integer.
> > > > >>
> > > > >>When I run summary.lm() on this model, I get a standard linear
> > > > >>regression summary (in which one categorical variable has to be
> > > > >>converted into many indicator variables) which looks like:
> > > > >>
> > > > >>            Estimate Std. Error t value Pr(>|t|)
> > > > >>(Intercept)  -3595.3     2767.1  -1.299   0.2005
> > > > >>physicianB     802.0     2289.5   0.350   0.7277
> > > > >>physicianC    4906.8     2419.8   2.028   0.0485 *
> > > > >>severity      7554.4      906.3   8.336 1.12e-10 ***
> > > > >>
> > > > >>and when I run summary.aov() I get similar ANOVA table:
> > > > >>
> > > > >>           Df     Sum Sq    Mean Sq F value    Pr(>F)
> > > > >>physician    2  294559803  147279901  3.3557   0.04381 *
> > > > >>severity     1 3049694210 3049694210 69.4864 1.124e-10 ***
> > > > >>Residuals   45 1975007569   43889057
> > > > >>
> > > > >>What is absolutely unclear to me is how F-value and Pr(>F) for the
> > > > >>categorical "physician" variable of the summary.aov() is calculated
> > > > >>from the t-value of the summary.lm() table.
> > > > >>
> > > > >>I looked at the summary.aov() source code but still could not figure
> > > > >>it.
> > > > >>
> > > > >>Thanks a lot.

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox