[R] slow computation of mixed ANOVA using aov

Sat Mar 19 19:20:49 CET 2005

Peter Dalgaard <p.dalgaard at biostat.ku.dk> writes:

> - (not too sure of this, but R 2.1.0 will have some new code for
>   multivariate lm, with intra-individual designs, and
>   tests under the sphericity assumptions; it is possible that
>   your models can be reformulated in that framework. You'd have to
>   set it up as a model with 160 responses on 56 subjects, though, and
>   the code wasn't really designed with that sort of intrasubject
>   dimensionality in mind.)

OK, I've tried this now and it does seem to work, and much faster than
the aov() approach. I had to fix a buglet in the code (eigenvalues
coming up with small imaginary parts due to numerics), so currently
available versions won't quite work, but it should be in the alpha
releases that are supposed to start Monday.

Here's how it works:

### orig setup, edited so as to actually work...
 f1<-gl(40,1,8960,ordered=T)
 f2<-gl(7,1280,8960,ordered=T)
 f3<-gl(4,40,8960,ordered=T)
 subject<-gl(56,160,8960,ordered=T)
 dv<-rnorm(8960,mean=500,sd=50)
 d <- data.frame(f1,f2,f3,subject,dv)

### Regroup to 56x160 matrix response 
 f2a <- f2[seq(1,8801,160)]
 idata <- d[1:160,] # intrasubj. design, actually need only f1,f3 from this
 Y <- matrix(dv,56,byrow=T)

### now fit models with effect of f2a, constant, and empty
 fit <- lm(Y~f2a)
 fit2 <- lm(Y~1)
 fit3 <- lm(Y~0)

## The main trick is to do anova on within-subject effects. First look
## at the interactions, alias residuals from an additive model ~f1+f3.
## The reduction Model 1-> Model 2 corresponds to testing the f1:f2:f3
## interaction in aov (tests whether the f1:f3 interaction depends on
## f2a) and Model 2 -> Model 3 is the test for the f1:f3 interaction
## being zero.

## I'm not quite sure what to make of the correction terms when the
## estimated SSD matrix is singular (it has to be, the dimension is
## 117, but the df is only 49). Probably the G-G epsilon is bogus, but
## the H-F one seems to fare rather well.

> anova(fit,fit2,fit3,test="Spherical",X=~f1+f3,idata=idata)
Analysis of Variance Table

Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0

Contrasts orthogonal to
~f1 + f3

Greenhouse-Geisser epsilon: 0.2903
Huynh-Feldt epsilon:        0.9639

  Res.Df   Df Gen.var.      F num Df den Df  Pr(>F)  G-G Pr  H-F Pr
1     49       0.00000
2     55    6  0.00000 0.9036    702   5733 0.96034 0.82268 0.95748
3     56    1  0.00000 1.0460    117   5733 0.35003 0.39624 0.35206

## Next, we can look at the f1 effects. This is basically the
## difference between two projections, on ~f1+f3 and ~f3
## This gives us the tests for f2:f1 and f1

> anova(fit,fit2,fit3,test="Spherical",M=~f1+f3,X=~f3,idata=idata)
Analysis of Variance Table

Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0

Contrasts orthogonal to
~f3

Contrasts spanned by
~f1 + f3

Greenhouse-Geisser epsilon: 0.5598
Huynh-Feldt epsilon:        1.0284

  Res.Df   Df Gen.var.      F num Df den Df Pr(>F) G-G Pr H-F Pr
1     49        315.74
2     55    6   344.98 0.9883    234   1911   0.54   0.52   0.54
3     56    1   347.15 0.8393     39   1911   0.75   0.68   0.75

## Same thing with f3

> anova(fit,fit2,fit3,test="Spherical",M=~f1+f3,X=~f1,idata=idata)
Analysis of Variance Table

Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0

Contrasts orthogonal to
~f1

Contrasts spanned by
~f1 + f3

Greenhouse-Geisser epsilon: 0.9482
Huynh-Feldt epsilon:        1.0128

  Res.Df  Df Gen.var.      F num Df den Df Pr(>F) G-G Pr H-F Pr
1     49       35.171
2     55   6   34.679 0.9010     18    147  0.578  0.574  0.578
3     56   1   34.658 1.0039      3    147  0.393  0.390  0.393

## Actually, because of the complete design, we can ignore f1 and get
## the same analysis:

> anova(fit,fit2,fit3,test="Spherical",M=~f3,X=~1,idata=idata)
Analysis of Variance Table

Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0

Contrasts orthogonal to
~1

Contrasts spanned by
~f3

Greenhouse-Geisser epsilon: 0.9482
Huynh-Feldt epsilon:        1.0128

  Res.Df  Df Gen.var.      F num Df den Df Pr(>F) G-G Pr H-F Pr
1     49       35.171
2     55   6   34.679 0.9010     18    147  0.578  0.574  0.578
3     56   1   34.658 1.0039      3    147  0.393  0.390  0.393

## Finally we get the main effect of f2a as follows. Notice that the
## Model 2 -> Model 3 reduction is the test for zero overall mean,
## which you might (and aov does) want to omit.  

> anova(fit,fit2,fit3,test="Spherical",M=~1,X=~0,idata=idata)
Analysis of Variance Table

Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0

Contrasts orthogonal to
~0

Contrasts spanned by
~1

Greenhouse-Geisser epsilon: 1
Huynh-Feldt epsilon:        1

  Res.Df Df Gen.var.          F num Df den Df     Pr(>F)     G-G Pr
  H-F Pr
1     49          11
2     55  6       12 1.5010e+00      6     49  1.976e-01  1.976e-01
  1.976e-01
3     56  1   249956 1.2699e+06      1     49 8.346e-110 8.346e-110
  8.346e-110

## Finally aov() results for comparison:

> system.time(aa <- aov(dv~f1*f2*f3+Error(subject/(f1+f3)),data=d))
[1] 526.50   9.27 561.29   0.00   0.00
> summary(aa)

Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
f2         6  15883    2647   1.501 0.1976
Residuals 49  86411    1763

Error: subject:f1
            Df  Sum Sq Mean Sq F value Pr(>F)
f1          39   81906    2100  0.8393 0.7487
f1:f2      234  578666    2473  0.9883 0.5376
Residuals 1911 4781665    2502

Error: subject:f3
           Df Sum Sq Mean Sq F value Pr(>F)
f3          3   6899    2300  1.0039 0.3930
f2:f3      18  37153    2064  0.9010 0.5782
Residuals 147 336732    2291

Error: Within
            Df   Sum Sq  Mean Sq F value Pr(>F)
f1:f3      117   308658     2638  1.0460 0.3500
f1:f2:f3   702  1599743     2279  0.9036 0.9603
Residuals 5733 14458856     2522
>

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907