[R] [package-car:Anova] extracting residuals from Anova for Type II/III Repeated Measures ?
John Fox
jfox at mcmaster.ca
Fri Feb 20 23:41:16 CET 2009
Dear Tal,
I didn't have time to look at all this yesterday.
Since aov() doesn't do what I typically want to do, I guess I've not paid
much attention to it recently. I can see, however, that you appear to have
specified the error strata incorrectly, since (given your desire to compare
to Anova) the within-block factors are nested within blocks. Something like
> npk.aovE <- aov(value ~ N*P*K + Error(block/N*P*K), npk.long)
should be closer to what you want, and in fact produces all of the sums of
squares, but doesn't put all of the error terms together with the
corresponding terms; thus, you get, e.g., the test for N but not for P and
K, even though the SSs and error SSs for the latter are in the table. By
permuting N, P, and K, you can get the other F tests. I suspect that this
has to do with the sequential approach taken by aov() but someone else more
familiar with how it works will have to fill in the details. I wonder,
though, whether you've read the sections in Statistical Models in S and MASS
referenced in the help file for aov.
> summary(npk.aovE)
Error: block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 153.147 30.629
Error: P
Df Sum Sq Mean Sq
P 1 16.803 16.803
Error: K
Df Sum Sq Mean Sq
K 1 190.40 190.40
Error: block:N
Df Sum Sq Mean Sq F value Pr(>F)
N 1 378.56 378.56 38.614 0.001577 **
Residuals 5 49.02 9.80
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: block:P
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 19.1317 3.8263
Error: block:K
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 24.4933 4.8987
Error: P:K
Df Sum Sq Mean Sq
P:K 1 0.96333 0.96333
Error: block:N:P
Df Sum Sq Mean Sq F value Pr(>F)
N:P 1 42.563 42.563 8.6888 0.03197 *
Residuals 5 24.493 4.899
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: block:N:K
Df Sum Sq Mean Sq F value Pr(>F)
N:K 1 66.270 66.270 17.320 0.00881 **
Residuals 5 19.132 3.826
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: block:P:K
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 49.018 9.804
Error: block:N:P:K
Df Sum Sq Mean Sq F value Pr(>F)
N:P:K 1 74.003 74.003 2.4161 0.1808
Residuals 5 153.147 30.629
John
> -----Original Message-----
> From: Tal Galili [mailto:tal.galili at gmail.com]
> Sent: February-19-09 2:23 AM
> To: John Fox
> Cc: r-help at r-project.org
> Subject: Re: [R] [package-car:Anova] extracting residuals from Anova for
Type
> II/III Repeated Measures ?
>
> Hello John, thank you for the fast reply.
>
> Thanks to your answer I was able to reproduce the "between" residuals by
> simply writing:
> sum(residuals(mod.ok)^2)
>
> But I must admit that how to obtain the "within" was beyond me, so some
> advice here would be of great help.
> (Also, it might be worth adding these residuals to the standard Anova
output,
> since I believe some researchers using the package could find it useful
when
> reporting there results)
>
>
> By working with the Anova package, I stumbled into a conflict between the
> Anova and the aov outputs.
> The toy data used was reassembled from the (?aov) example (and filled in
for
> balance, and contr.sum contrasts where assigned).
> When I compared the Anova (on SS type III) and the aov F (And p.value)
> results - I found significant differences between them (though the SS
where
> identical).
> If I understood correctly, for balanced designs both test should have
yielded
> the same results. any ideas on what the source of the difference might be?
>
> Here is the code:
>
>
>
>
> ## get the data:
> utils::data(npk, package="MASS")
> library(reshape) # for data manipulation
> colnames(npk)[5] <- "value" # so that the data set will read proparly in
> "reshape"
>
> npk.wide = cast(block ~ N +P+ K, data = npk)
>
> # now we fill in the NA's, to get a balanced design
> is.na(npk.wide) #it has na's. we'll fill them with that factor's
combination
> mean, so to keep some significance around
> combination.mean <- apply(npk.wide, 2, function(x){mean(x,na.rm = T)})
> for(i in c(2:9))
> { npk.wide[is.na(npk.wide)[,i],i] = combination.mean[i-1] }
> is.na(npk.wide) #no na's anymore
>
> # recreating a balanced npk for the aov
> npk.long <- melt(npk.wide)
>
> head(npk.long,4)
> # block value N P K
> #X0_0_0 1 46.80000 0 0 0
> #X0_0_0.1 2 51.43333 0 0 0
> #X0_0_0.2 3 51.43333 0 0 0
> #X0_0_0.3 4 51.43333 0 0 0
> head(npk.wide,4)
> # block 0_0_0 0_0_1 0_1_0 0_1_1 1_0_0 1_0_1 1_1_0 1_1_1
> #1 1 46.80000 52.0 54.33333 49.5 63.76667 57.00000 62.80000 54.36667
> #2 2 51.43333 55.5 56.00000 50.5 59.80000 54.66667 57.93333 58.50000
> #3 3 51.43333 55.0 62.80000 50.5 69.50000 54.66667 57.93333 55.80000
> #4 4 51.43333 45.5 44.20000 50.5 62.00000 54.66667 57.93333 48.80000
>
>
> # npk.wide$block # it is a factor - good.
>
>
> # change into contr.sum
> for(i in c(1,3:5)) npk.long[,i] <- C(npk.long[,i] , "contr.sum")
> npk.wide[,1] <- C(npk.wide[,1] , "contr.sum")
>
>
>
> ## as a test, not particularly sensible statistically
> npk.aovE <- aov(value~ N*P*K + Error(block), npk.long)
> npk.aovE
> summary(npk.aovE)
>
> ## output
> Error: block
> Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 5 153.147 30.629
>
> Error: Within
> Df Sum Sq Mean Sq F value Pr(>F)
> N 1 378.56 378.56 39.1502 3.546e-07 ***
> P 1 16.80 16.80 1.7378 0.19599
> K 1 190.40 190.40 19.6911 8.657e-05 ***
> N:P 1 42.56 42.56 4.4018 0.04319 *
> N:K 1 66.27 66.27 6.8535 0.01298 *
> P:K 1 0.96 0.96 0.0996 0.75415
> N:P:K 1 74.00 74.00 7.6533 0.00899 **
> Residuals 35 338.43 9.67
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> ## END output
>
> # the residuals SS
> # 338.43 + 153.147
> # ==
> sum(residuals(lm(value~ N*P*K , npk.long))^2)
> # 491.58
>
>
>
> idata <- cast(N+P+K ~ . , data = npk.long)[,c(1:3)]
> idata
> for(i in 1:3) idata[,i] <- C(idata[,i] , "contr.sum")
>
> mod.ok <- lm(as.matrix(npk.wide[,-1]) ~ 1, data = npk.wide)
> av.ok <- Anova(mod.ok, idata=idata, idesign=~ N*P*K, type = "III")
> summary(av.ok, multivariate=FALSE)
>
>
> ## output - with different F and p.value results
> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
>
> SS num Df Error SS den Df F Pr(>F)
> (Intercept) 144541 1 153 5 4719.0302 1.238e-08 ***
> N 379 1 49 5 38.6145 0.001577 **
> P 17 1 19 5 4.3915 0.090257 .
> K 190 1 24 5 38.8684 0.001554 **
> N:P 43 1 24 5 8.6888 0.031971 *
> N:K 66 1 19 5 17.3195 0.008810 **
> P:K 1 1 49 5 0.0983 0.766580
> N:P:K 74 1 153 5 2.4161 0.180810
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> ## End output
>
>
> # the SS residuals are the same thou
> sum(residuals(mod.ok)^2)
> # 491.58
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Thu, Feb 19, 2009 at 12:24 AM, John Fox <jfox at mcmaster.ca> wrote:
>
>
> Dear Tal,
>
> I suppose that the "between" residuals would be obtained, for your
> example,
> by residuals(mod.ok). I'm not sure what the "within" residuals are.
You
> could apply the transformation for each within-subject effect to the
> matrix
> of residuals to get residuals for that effect -- is that what you
had
> in
> mind? A list of transformations is in the element $P of the
Anova.mlm
> object.
>
> Regards,
> John
>
> ------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> web: socserv.mcmaster.ca/jfox
>
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org]
> On
> > Behalf Of Tal Galili
> > Sent: February-18-09 4:04 PM
> > To: r-help at r-project.org
> > Subject: [R] [package-car:Anova] extracting residuals from Anova
for
> Type
> > II/III Repeated Measures ?
> >
> > Hello dear R members.
> > I have been learning the Anova syntax in order to perform an SS
type
> III
> > Anova with repeated measures designs (thank you Prof. John Fox!)
> > And another question came up: where/what are the (between/within)
> residuals
> > for my model?
> >
> >
> >
> > ############ Play code:
> >
> >
> > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5,
> 5)),
> > levels=c("pretest", "posttest", "followup"))
> > hour <- ordered(rep(1:5, 3))
> > idata <- data.frame(phase, hour)
> > idata
> >
> > mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> > post.1, post.2, post.3, post.4, post.5,
> > fup.1, fup.2, fup.3, fup.4, fup.5) ~
> > treatment*gender,
> > data=OBrienKaiser)
> > av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour)
> >
> >
> > summary(av.ok, multivariate=FALSE)
> >
> > ## Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> > ##
> > ## SS num Df Error SS den Df
F
> > Pr(>F)
> > ## treatment 211.286 2 228.056 10
4.6323
> > 0.037687
> > ## gender 58.286 1 228.056 10
2.5558
> > 0.140974
> > ## treatment:gender 130.241 2 228.056 10
2.8555
> > 0.104469
> > ## phase 167.500 2 80.278 20
20.8651
> > 1.274e-05
> > ## treatment:phase 78.668 4 80.278 20
4.8997
> > 0.006426
> > ## gender:phase 1.668 2 80.278 20
0.2078
> > 0.814130
> > ## treatment:gender:phase 10.221 4 80.278 20
0.6366
> > 0.642369
> > ## hour 106.292 4 62.500 40
17.0067
> > 3.191e-08
> > ## treatment:hour 1.161 8 62.500 40
0.0929
> > 0.999257
> > ## gender:hour 2.559 4 62.500 40
0.4094
> > 0.800772
> > ## treatment:gender:hour 7.755 8 62.500 40
0.6204
> > 0.755484
> > ## phase:hour 11.083 8 96.167 80
1.1525
> > 0.338317
> > ## treatment:phase:hour 6.262 16 96.167 80
0.3256
> > 0.992814
> > ## gender:phase:hour 6.636 8 96.167 80
0.6900
> > 0.699124
> > ## treatment:gender:phase:hour 14.155 16 96.167 80
0.7359
> > 0.749562
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > ----------------------------------------------
> >
> >
> > My contact information:
> > Tal Galili
> > Phone number: 972-50-3373767
> > FaceBook: Tal Galili
> > My Blogs:
> > www.talgalili.com
> > www.biostatistics.co.il
> >
>
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>
>
>
> --
> ----------------------------------------------
>
>
> My contact information:
> Tal Galili
> Phone number: 972-50-3373767
> FaceBook: Tal Galili
> My Blogs:
> www.talgalili.com
> www.biostatistics.co.il
>
>
More information about the R-help
mailing list