[R] Result depends on order of factors in unbalanced designs (lme, anova)?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Jun 22 17:26:57 CEST 2007


'anova' is rather a misnomer here.  In terms of the description in 
?anova.lme, you have

      When only one fitted model object is present, a data frame with
      the sums of squares, numerator degrees of freedom, denominator
      degrees of freedom, F-values, and P-values for Wald tests for the
      terms in the model ...

but there are no 'sums of squares' shown.  However, the crucial part of 
that help page is

     type: an optional character string specifying the type of sum of
           squares to be used in F-tests for the terms in the model. If
           '"sequential"', the sequential sum of squares obtained by
           including the terms in the order they appear in the model is
           used; else, if '"marginal"', the marginal sum of squares
           obtained by deleting a term from the model at a time is used.
           This argument is only used when a single fitted object is
           passed to the function. Partial matching of arguments is
           used, so only the first character needs to be provided.
           Defaults to '"sequential"'.

so these are sequential fits (just like anova for an lm fit), and yes, 
sequential fits do in general depend on the sequence of terms.

The issues of interpretation are exactly those of unbalanced linear 
models, and you will find advice on that in many places, e.g. in MASS.


On Thu, 21 Jun 2007, Karl Knoblick wrote:

> Dear R-Community!
>
> For example I have a study with 4 treatment groups (10 subjects per group) and 4 visits. Additionally, the gender is taken into account. I think - and hope this is a goog idea (!) - this data can be analysed using lme as below.
>
> In a balanced design everything is fine, but in an unbalanced design there are differences depending on fitting y~visit*treat*gender or y~gender*visit*treat - at least with anova (see example). Does this make sense? Which ordering might be the correct one?
>
> Here the example script:
> library(nlme)
> set.seed(123)
> # Random generation of data:
> NSubj<-40 # No. of subjects
> set.seed(1234)
> id<-factor(rep(c(1:NSubj),4)) # ID of subjects
> treat<-factor(rep(rep(1:4,each=5),4)) # Treatment 4 Levels
> gender<-factor(rep(rep(1:2, each=20),4))
> visit<-factor(rep(1:4, each=NSubj))
> y<-runif(4*NSubj) # Results
> # Add effects
> y<-y+0.01*as.integer(visit)
> y<-y+0.02*as.integer(gender)
> y<-y+0.024*as.integer(treat)
> df<-data.frame(id, treat, gender, visit, y)
> # groupedData object for lme
> gdat<-groupedData(y ~ visit|id, data=df)
> # fits - different ordering of factors
> fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> anova(fit1)
> fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> anova(fit2)
> # Result: identical (balanced design so far), ok
> # Now change gender of subject 1
> gdat$gender[c(1,41,81,121)]<-2
> # onece more fits with different ordering of factors
> fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> anova(fit1)
> fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> anova(fit2)
> # Result: There are differences!!
>
> Hope anybody can help or give me advice how to interpret these results correctly or how to avoid this problem! Is there a better possibility to analyse these data than lme?
>
> Thanks!
> Karl

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list