[R] lme() vs aov(y ~ A*B + Error(aa %in% A + bb %in% B)) [repost]

Tue Jun 17 17:32:32 CEST 2003

>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 17 Jun 2003 10:13:44 +0200 writes:

    MM> I've posted the following to R-help on May 15.
    MM> It has reproducible R code for real data -- and a real
    MM> (academic, i.e unpaid) consultion background.

    MM> I'd be glad for some insight here, mainly not for myself.
    MM> In the mean time, we've learned that it is to be expected for
    MM> anova(*, "marginal") to be contrast dependent, but still are
    MM> glad for advice if you have experience.

    MM> Thank you in advance,
    MM> Martin

and indeed, the forwarded message, also a kind of attachment (message/rfc822)
was dropped by mailman's content filtering too...
Here, it's "as regular" text:

Here is a reproducible (when you're on-net) script for a
relatively simple real data (consulting) situation here.

The data analyst tried to use lme() rather than  aov( + Error()) 
and asked several questions that I couldn't easily answer and
thought to be interesting for a more general audience in any
case.

I attach the script as text/plain file that you should run
"everywhere".  The three questions are summarized at the
beginning of the file.

----------------------------snip-snip------------------------------------

### LME/aov example {real data!}
### Two crossed fixed effects each with a random factor inside

### Problems :
### 1) need(?) ugly pseudo grouping
### 2) anova() depends on setting of contrasts
### 3) numerical problem  "Singular precision matrix" .. (should not happen?)

download.file("ftp://stat.ethz.ch/U/maechler/R/ex4.rda", "lme-ex4.rda")
load("lme-ex4.rda")##-> object `dw'
summary(dw)
str(dw)

## For some EDA {and S+ compatibility; for R, I'd use  with(dw, ...)} :
attach(dw)
## TREE: 1:3 at "1", 4:6 at "2" : TREE completely nested in LOCATION
table(TREE, LOCATION)
## and the same with THALLUS in PROV: th 1:5 > pr1; 6:7 > 2; 8:12 > 3;...
table(THALLUS, PROV)
colSums(table(THALLUS, PROV) > 0)
## pr1 pr2 pr3 pr4 pr5 pr6
##   5   2   5   5   1   5

interaction.plot(PROV, LOCATION, y)
## Hmm.. PROV=5 is a bit peculiar
## well, for one because it has only 3+3 observations :
table(PROV, LOCATION)

plot(y ~ PROV, data = dw)# not so useful
## or (better?):
stripchart(y ~ PROV, vertical=TRUE, method="stack")

if(FALSE)# quite a bit to see
table(PROV, THALLUS, TREE)
all(table(PROV, THALLUS, TREE) %in% c(0,1)) ## TRUE

detach("dw")##--------------------- end of "EDA" -------------

## Setup:
## - 2 locations (LOCATION) with 3 trees each
## - 6 kinds proveniences (PROV) with 2-5 kinds "thallus" each
## - L * P are fixed (crossed), each has a random factor nested inside

## ==>  Simple Approach is possible
##
## aov Models with Error()
## -----------------------

##--------- 1 ------------------------ default contrasts ---------
options(contrasts=c("contr.treatment", "contr.poly"))
aov1 <- aov(y ~ PROV*LOCATION + Error(TREE%in%LOCATION + THALLUS%in%PROV),
            data = dw)
summary(aov1)
##--------- 2 ------------------------ sum contrasts ---------
options(contrasts=c("contr.sum", "contr.poly"))
aov2 <- aov(y ~ PROV*LOCATION + Error(TREE%in%LOCATION + THALLUS%in%PROV),
            data = dw)
## really the imporant things do NOT depend on the contrasts:
stopifnot(all.equal(summary(aov1),
                    summary(aov2)))

## For residual analysis :
aov1. <- aov(y ~ PROV*LOCATION + TREE%in%LOCATION + THALLUS%in%PROV, data = dw)
## non-sense model
## -- used only for residuals() , fitted()  which "fail" for aov1 << !!
##  ((this is not nice behavior !))
par(mfrow = c(1,2))
plot(fitted(aov1.), residuals(aov1.)); abline(h = 0, lty = 2)
qqnorm(residuals(aov1.)); qqline(residuals(aov1.))

## model.tables(aov1.,type = "means")

### Now try to be modern, use REML and hence
###
### lme models:
### === -------

library(nlme)

## Construct a pseudo grouping factor with just one level
dw$pseudogrp <- factor(c("g1"))
dwgrp <- groupedData(y ~ 1 | pseudogrp, data=dw)

###--------- 1 ------------------------ default contrasts ---------

options(contrasts=c("contr.treatment", "contr.poly"))

lme1  <- lme(y ~ LOCATION * PROV , data = dwgrp,
             random = pdBlocked(list(pdIdent(~TREE -1),
                                     pdIdent(~THALLUS -1))))
anova(lme1, type = "marginal")
## quite close to the aov() results above;
##  F value for PROV:LOCATION is the same, but has wrong (?) deg.freedom
##  ((because of the "wrong" pseudogrp grouping))

if(FALSE)
    summary(lme1)
intervals(lme1,which = "var-cov")
ranef(lme1)

trellis.device()
lset(col.whitebg())

## Residual analysis:
par(mfrow = c(1,2))
plot(fitted(lme1), residuals(lme1)); abline(h = 0, lty = 2)
qqnorm(residuals(lme1)); qqline(residuals(lme1))

###--------- 2 ------------------------ sum contrasts ---------

options(contrasts=c("contr.sum", "contr.poly"))

lme2  <- lme(y ~ LOCATION * PROV , data = dwgrp,
             random = pdBlocked(list(pdIdent(~TREE -1),
                                     pdIdent(~THALLUS -1))))

## Same model as lme1 ?

## In some sense, yes:
stopifnot(all.equal(residuals(lme1),
                    residuals(lme2)),
          all.equal(fitted(lme1),
                    fitted(lme2)))

## But this is quite different --  <<< why (REML) ? >>>>

## -- the SS(residuals) are the same,
## and the factor's SS should not depend on the contrasts?
anova(lme2, type = "marginal")

intervals(lme2,which = "var-cov")
ranef(lme2)

##

ty <- asin(sqrt(y))
lme4 <- lme(ty ~ LOCATION * PROV , data = dwgrp,
            random = pdBlocked(list(pdIdent(~TREE -1),
             pdIdent(~THALLUS -1))))
## Warning messages:
## 1: Singular precision matrix in level -1, block 1
## 2: Singular precision matrix in level -1, block 1

## ?????  ^^^^ numerical problem ?? --- investigating :
if(FALSE){
    options(warn = 2)## make warnings into errors !
    lme4 <- lme(ty ~ LOCATION * PROV , data = dwgrp,
                random = pdBlocked(list(pdIdent(~TREE -1),
                pdIdent(~THALLUS -1))))
    traceback()
    ## gives a long stack trace, with the lowest level function
    ##  logLik.reStruct(.)

    options(warn = 0) # back to default
}

anova(lme4) # is quite a bit different from the anova()s above!

ry <- round(ty, 2)
lme5 <- lme(ry ~ LOCATION * PROV , data = dwgrp,
            random = pdBlocked(list(pdIdent(~TREE -1),
             pdIdent(~THALLUS -1))))