[R] [package-car:Anova] extracting residuals from Anova for Type II/III Repeated Measures ?

Fri Feb 20 23:41:16 CET 2009

Dear Tal,

I didn't have time to look at all this yesterday.

Since aov() doesn't do what I typically want to do, I guess I've not paid
much attention to it recently. I can see, however, that you appear to have
specified the error strata incorrectly, since (given your desire to compare
to Anova) the within-block factors are nested within blocks. Something like

> npk.aovE <- aov(value ~  N*P*K + Error(block/N*P*K), npk.long)

should be closer to what you want, and in fact produces all of the sums of
squares, but doesn't put all of the error terms together with the
corresponding terms; thus, you get, e.g., the test for N but not for P and
K, even though the SSs and error SSs for the latter are in the table. By
permuting N, P, and K, you can get the other F tests. I suspect that this
has to do with the sequential approach taken by aov() but someone else more
familiar with how it works will have to fill in the details. I wonder,
though, whether you've read the sections in Statistical Models in S and MASS
referenced in the help file for aov.

> summary(npk.aovE) 

Error: block
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  5 153.147  30.629               

Error: P
  Df Sum Sq Mean Sq
P  1 16.803  16.803

Error: K
  Df Sum Sq Mean Sq
K  1 190.40  190.40

Error: block:N
          Df Sum Sq Mean Sq F value   Pr(>F)   
N          1 378.56  378.56  38.614 0.001577 **
Residuals  5  49.02    9.80                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Error: block:P
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  5 19.1317  3.8263               

Error: block:K
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  5 24.4933  4.8987               

Error: P:K
    Df  Sum Sq Mean Sq
P:K  1 0.96333 0.96333

Error: block:N:P
          Df Sum Sq Mean Sq F value  Pr(>F)  
N:P        1 42.563  42.563  8.6888 0.03197 *
Residuals  5 24.493   4.899                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Error: block:N:K
          Df Sum Sq Mean Sq F value  Pr(>F)   
N:K        1 66.270  66.270  17.320 0.00881 **
Residuals  5 19.132   3.826                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Error: block:P:K
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5 49.018   9.804               

Error: block:N:P:K
          Df  Sum Sq Mean Sq F value Pr(>F)
N:P:K      1  74.003  74.003  2.4161 0.1808
Residuals  5 153.147  30.629               

John

> -----Original Message-----
> From: Tal Galili [mailto:tal.galili at gmail.com]
> Sent: February-19-09 2:23 AM
> To: John Fox
> Cc: r-help at r-project.org
> Subject: Re: [R] [package-car:Anova] extracting residuals from Anova for
Type
> II/III Repeated Measures ?
> 
> Hello John, thank you for the fast reply.
> 
> Thanks to your answer I was able to reproduce the "between" residuals by
> simply writing:
> sum(residuals(mod.ok)^2)
> 
> But I must admit that how to obtain the "within" was beyond me, so some
> advice here would be of great help.
> (Also, it might be worth adding these residuals to the standard Anova
output,
> since I believe some researchers using the package could find it useful
when
> reporting there results)
> 
> 
> By working with the Anova package, I stumbled into a conflict between the
> Anova and the aov outputs.
> The toy data used was reassembled from the (?aov) example (and filled in
for
> balance, and contr.sum contrasts where assigned).
> When I compared the Anova (on SS type III) and the aov F (And p.value)
> results - I found significant differences between them (though the SS
where
> identical).
> If I understood correctly, for balanced designs both test should have
yielded
> the same results. any ideas on what the source of the difference might be?
> 
> Here is the code:
> 
> 
> 
> 
> ## get the data:
> utils::data(npk, package="MASS")
> library(reshape) # for data manipulation
> colnames(npk)[5] <- "value" # so that the data set will read proparly in
> "reshape"
> 
> npk.wide = cast(block ~  N +P+ K, data = npk)
> 
> # now we fill in the NA's, to get a balanced design
> is.na(npk.wide) #it has na's. we'll fill them with that factor's
combination
> mean, so to keep some significance around
> combination.mean <- apply(npk.wide, 2, function(x){mean(x,na.rm = T)})
> for(i in c(2:9))
> { npk.wide[is.na(npk.wide)[,i],i] = combination.mean[i-1] }
> is.na(npk.wide) #no na's anymore
> 
> # recreating a balanced npk for the aov
> npk.long <- melt(npk.wide)
> 
> head(npk.long,4)
> #         block    value N P K
> #X0_0_0       1 46.80000 0 0 0
> #X0_0_0.1     2 51.43333 0 0 0
> #X0_0_0.2     3 51.43333 0 0 0
> #X0_0_0.3     4 51.43333 0 0 0
> head(npk.wide,4)
> #  block    0_0_0 0_0_1    0_1_0 0_1_1    1_0_0    1_0_1    1_1_0    1_1_1
> #1     1 46.80000  52.0 54.33333  49.5 63.76667 57.00000 62.80000 54.36667
> #2     2 51.43333  55.5 56.00000  50.5 59.80000 54.66667 57.93333 58.50000
> #3     3 51.43333  55.0 62.80000  50.5 69.50000 54.66667 57.93333 55.80000
> #4     4 51.43333  45.5 44.20000  50.5 62.00000 54.66667 57.93333 48.80000
> 
> 
> # npk.wide$block # it is a factor - good.
> 
> 
> # change into contr.sum
> for(i in c(1,3:5)) npk.long[,i] <- C(npk.long[,i] , "contr.sum")
> npk.wide[,1] <- C(npk.wide[,1] , "contr.sum")
> 
> 
> 
> ## as a test, not particularly sensible statistically
> npk.aovE <- aov(value~  N*P*K + Error(block), npk.long)
> npk.aovE
> summary(npk.aovE)
> 
> ## output
> Error: block
>           Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals  5 153.147  30.629
> 
> Error: Within
>           Df Sum Sq Mean Sq F value    Pr(>F)
> N          1 378.56  378.56 39.1502 3.546e-07 ***
> P          1  16.80   16.80  1.7378   0.19599
> K          1 190.40  190.40 19.6911 8.657e-05 ***
> N:P        1  42.56   42.56  4.4018   0.04319 *
> N:K        1  66.27   66.27  6.8535   0.01298 *
> P:K        1   0.96    0.96  0.0996   0.75415
> N:P:K      1  74.00   74.00  7.6533   0.00899 **
> Residuals 35 338.43    9.67
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> ## END output
> 
> # the residuals SS
> # 338.43 + 153.147
> # ==
> sum(residuals(lm(value~  N*P*K , npk.long))^2)
> #  491.58
> 
> 
> 
> idata  <- cast(N+P+K  ~ . , data = npk.long)[,c(1:3)]
> idata
> for(i in 1:3) idata[,i] <- C(idata[,i] , "contr.sum")
> 
> mod.ok <- lm(as.matrix(npk.wide[,-1]) ~  1, data = npk.wide)
> av.ok <- Anova(mod.ok, idata=idata, idesign=~ N*P*K, type = "III")
> summary(av.ok, multivariate=FALSE)
> 
> 
> ## output - with different F and p.value results
> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> 
>                 SS num Df Error SS den Df         F    Pr(>F)
> (Intercept) 144541      1      153      5 4719.0302 1.238e-08 ***
> N              379      1       49      5   38.6145  0.001577 **
> P               17      1       19      5    4.3915  0.090257 .
> K              190      1       24      5   38.8684  0.001554 **
> N:P             43      1       24      5    8.6888  0.031971 *
> N:K             66      1       19      5   17.3195  0.008810 **
> P:K              1      1       49      5    0.0983  0.766580
> N:P:K           74      1      153      5    2.4161  0.180810
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> ## End output
> 
> 
> # the SS residuals are the same thou
> sum(residuals(mod.ok)^2)
> #  491.58
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Thu, Feb 19, 2009 at 12:24 AM, John Fox <jfox at mcmaster.ca> wrote:
> 
> 
> 	Dear Tal,
> 
> 	I suppose that the "between" residuals would be obtained, for your
> example,
> 	by residuals(mod.ok). I'm not sure what the "within" residuals are.
You
> 	could apply the transformation for each within-subject effect to the
> matrix
> 	of residuals to get residuals for that effect -- is that what you
had
> in
> 	mind? A list of transformations is in the element $P of the
Anova.mlm
> 	object.
> 
> 	Regards,
> 	 John
> 
> 	------------------------------
> 	John Fox, Professor
> 	Department of Sociology
> 	McMaster University
> 	Hamilton, Ontario, Canada
> 	web: socserv.mcmaster.ca/jfox
> 
> 
> 
> 	> -----Original Message-----
> 	> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org]
> 	On
> 	> Behalf Of Tal Galili
> 	> Sent: February-18-09 4:04 PM
> 	> To: r-help at r-project.org
> 	> Subject: [R] [package-car:Anova] extracting residuals from Anova
for
> Type
> 	> II/III Repeated Measures ?
> 	>
> 	> Hello dear R members.
> 	> I have been learning the Anova syntax in order to perform an SS
type
> III
> 	> Anova with repeated measures designs (thank you Prof. John Fox!)
> 	> And another question came up: where/what are the (between/within)
> 	residuals
> 	> for my model?
> 	>
> 	>
> 	>
> 	> ############  Play code:
> 	>
> 	>
> 	> phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5,
> 5)),
> 	>     levels=c("pretest", "posttest", "followup"))
> 	> hour <- ordered(rep(1:5, 3))
> 	> idata <- data.frame(phase, hour)
> 	> idata
> 	>
> 	> mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> 	>                      post.1, post.2, post.3, post.4, post.5,
> 	>                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
> 	>  treatment*gender,
> 	>                 data=OBrienKaiser)
> 	> av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour)
> 	>
> 	>
> 	> summary(av.ok, multivariate=FALSE)
> 	>
> 	> ## Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> 	> ##
> 	> ##                                  SS num Df Error SS den Df
F
> 	>  Pr(>F)
> 	> ## treatment                   211.286      2  228.056     10
4.6323
> 	>  0.037687
> 	> ## gender                       58.286      1  228.056     10
2.5558
> 	>  0.140974
> 	> ## treatment:gender            130.241      2  228.056     10
2.8555
> 	>  0.104469
> 	> ## phase                       167.500      2   80.278     20
20.8651
> 	> 1.274e-05
> 	> ## treatment:phase              78.668      4   80.278     20
4.8997
> 	>  0.006426
> 	> ## gender:phase                  1.668      2   80.278     20
0.2078
> 	>  0.814130
> 	> ## treatment:gender:phase       10.221      4   80.278     20
0.6366
> 	>  0.642369
> 	> ## hour                        106.292      4   62.500     40
17.0067
> 	> 3.191e-08
> 	> ## treatment:hour                1.161      8   62.500     40
0.0929
> 	>  0.999257
> 	> ## gender:hour                   2.559      4   62.500     40
0.4094
> 	>  0.800772
> 	> ## treatment:gender:hour         7.755      8   62.500     40
0.6204
> 	>  0.755484
> 	> ## phase:hour                   11.083      8   96.167     80
1.1525
> 	>  0.338317
> 	> ## treatment:phase:hour          6.262     16   96.167     80
0.3256
> 	>  0.992814
> 	> ## gender:phase:hour             6.636      8   96.167     80
0.6900
> 	>  0.699124
> 	> ## treatment:gender:phase:hour  14.155     16   96.167     80
0.7359
> 	>  0.749562
> 	>
> 	>
> 	>
> 	>
> 	>
> 	>
> 	>
> 	>
> 	>
> 	> --
> 	> ----------------------------------------------
> 	>
> 	>
> 	> My contact information:
> 	> Tal Galili
> 	> Phone number: 972-50-3373767
> 	> FaceBook: Tal Galili
> 	> My Blogs:
> 	> www.talgalili.com
> 	> www.biostatistics.co.il
> 	>
> 
> 	>       [[alternative HTML version deleted]]
> 	>
> 	> ______________________________________________
> 	> R-help at r-project.org mailing list
> 	> https://stat.ethz.ch/mailman/listinfo/r-help
> 	> PLEASE do read the posting guide
> 	http://www.R-project.org/posting-guide.html
> 	> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> 
> 
> 
> 
> --
> ----------------------------------------------
> 
> 
> My contact information:
> Tal Galili
> Phone number: 972-50-3373767
> FaceBook: Tal Galili
> My Blogs:
> www.talgalili.com
> www.biostatistics.co.il
> 
>