[R-sig-eco] Differences between (A)anova 's

Kingsford Jones kingsfordjones at gmail.com
Sun Sep 21 19:18:40 CEST 2008


Hi Rafael,

some comments below...

On Sat, Sep 20, 2008 at 5:35 PM, Rafael Maia
<queirozrafaelmv at yahoo.com.br> wrote:
> Hi all,
>
> I am trying to fit a linear model (using lm) with 3 predictors (2 continuous
> and 1 factor) for a response variable. I have no interaction terms nor
> missing values (though I do not have the same sample size in all factor
> cells), so I thought the results would be pretty straightforward, with no
> differences due to the "type" of Sums of Squares employed. However, I am

The issue is whether your predictors are orthogonal.  If they are,
they are not "competing" for the same information, and the order in
which they enter the model will not affect the testing.
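As a minimal sketch (the data and variable names here are invented), you can
see the order dependence with an unbalanced design:

```r
## Invented data: an unbalanced factor plus a continuous predictor,
## so the two terms are not orthogonal.
set.seed(1)
f <- factor(rep(c("a", "b"), times = c(4, 8)))  # unequal cell sizes
x <- rnorm(12)
y <- 1 + as.numeric(f) + x + rnorm(12)

m_fx <- lm(y ~ f + x)  # f entered first
m_xf <- lm(y ~ x + f)  # x entered first

## Sequential (Type I) sums of squares change with term order:
anova(m_fx)
anova(m_xf)
```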

> getting somewhat conflicting results (especially for the variable "IMC", see
> below) when I use anova, Anova (from car package), and drop1. Actually,
> drop1(), Anova(type="II") and Anova(type="III") all give the same results,
> only anova() yields a different one.

See ?anova.lm, help(Anova, package=car), and ?drop1.lm.

anova.lm is testing the terms sequentially.

Anova tests each term given that all other terms are in the model
(type = 'III'), or given that all terms that are not at a higher level
in the hierarchy of terms (e.g., x1^2 and x1:x2 are at a higher level
than x1) are in the model (type = 'II').

drop1.lm tests dropping terms one at a time, while respecting the
'hierarchy' (this is often referred to as 'marginality constraints').
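A small sketch (with made-up data) of the difference: for the last term
entered, the sequential test and the drop-one test agree, but earlier terms
differ because anova.lm conditions each term only on the terms listed before
it:

```r
## Made-up data: two continuous predictors and an unbalanced factor,
## with no interactions -- so drop1() matches Type II/III tests here.
set.seed(2)
g  <- factor(rep(c("a", "b", "c"), times = c(3, 5, 7)))
x1 <- rnorm(15)
x2 <- rnorm(15)
y  <- 1 + x1 + rnorm(15)

m <- lm(y ~ x1 + x2 + g)

anova(m)              # sequential (Type I): each term given those before it
drop1(m, test = "F")  # each term given all the others
```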

>
> Due to "consistency" of results I am imagining I should ignore R's standard
> anova's results, but I'd like to understand why and the implications this
> would have for any other test I conduct.
>

I think it is generally better not to automate the process of choosing
hypotheses of interest, but rather to form tests/comparisons
explicitly.  One good method to do this is to compare nested models,
where the difference in the two models reflects the hypothesis of
interest.

e.g., to test the ResPlum term given that all other terms are in the model:

m2 <- update(m1, . ~ . - ResPlum)
anova(m1, m2)

Note there are many methods available for model testing/comparison
(e.g., F or LR tests, or Cp, AIC, BIC, ...)

Similarly, you can get away from the default method of testing factor
levels in summary.lm by explicitly choosing your contrasts to test
hypotheses of interest.
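For example (the level names here are invented), you could replace the
default treatment contrasts with contrasts that encode the comparisons you
actually care about:

```r
## Invented example: compare 'high' against the mean of 'low' and 'mid',
## plus an orthogonal 'low' vs 'mid' comparison.
set.seed(3)
f <- factor(rep(c("low", "mid", "high"), each = 5),
            levels = c("low", "mid", "high"))
y <- as.numeric(f) + rnorm(15)

contrasts(f) <- cbind(highVsRest = c(-1, -1, 2),
                      lowVsMid   = c(-1,  1, 0))
m <- lm(y ~ f)
summary(m)  # the t-tests now correspond to the chosen contrasts
```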

The entire process can be generalized via the General Linear
Hypothesis Test -- see e.g., help(glh.test, package = gmodels)

HTH,

Kingsford Jones



More information about the R-sig-ecology mailing list