[R] [package-car:Anova] extracting residuals from Anova for Type II/III Repeated Measures ?
Tal Galili
tal.galili at gmail.com
Sat Feb 21 00:16:07 CET 2009
Hello John, thanks for your reply and correction.
I apologies for my crude mistake in applying the aov (now I have
learned better). I hope to get a hold of "Statistical Models in S",
but I don't predict it could easily happen in the near future.
Also, I would be very happy if you could supply me with some more
directions as to how to obtain the "within" residuals (such as
reported from the aov summary), since I am not sure how to proceed
with that.
With regards,
Tal
On Sat, Feb 21, 2009 at 12:41 AM, John Fox <jfox at mcmaster.ca> wrote:
>
> Dear Tal,
>
> I didn't have time to look at all this yesterday.
>
> Since aov() doesn't do what I typically want to do, I guess I've not paid
> much attention to it recently. I can see, however, that you appear to have
> specified the error strata incorrectly, since (given your desire to compare
> to Anova) the within-block factors are nested within blocks. Something like
>
> > npk.aovE <- aov(value ~ N*P*K + Error(block/N*P*K), npk.long)
>
> should be closer to what you want, and in fact produces all of the sums of
> squares, but doesn't put all of the error terms together with the
> corresponding terms; thus, you get, e.g., the test for N but not for P and
> K, even though the SSs and error SSs for the latter are in the table. By
> permuting N, P, and K, you can get the other F tests. I suspect that this
> has to do with the sequential approach taken by aov() but someone else more
> familiar with how it works will have to fill in the details. I wonder,
> though, whether you've read the sections in Statistical Models in S and MASS
> referenced in the help file for aov.
>
> > summary(npk.aovE)
>
> Error: block
> Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 5 153.147 30.629
>
> Error: P
> Df Sum Sq Mean Sq
> P 1 16.803 16.803
>
> Error: K
> Df Sum Sq Mean Sq
> K 1 190.40 190.40
>
> Error: block:N
> Df Sum Sq Mean Sq F value Pr(>F)
> N 1 378.56 378.56 38.614 0.001577 **
> Residuals 5 49.02 9.80
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Error: block:P
> Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 5 19.1317 3.8263
>
> Error: block:K
> Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 5 24.4933 4.8987
>
> Error: P:K
> Df Sum Sq Mean Sq
> P:K 1 0.96333 0.96333
>
> Error: block:N:P
> Df Sum Sq Mean Sq F value Pr(>F)
> N:P 1 42.563 42.563 8.6888 0.03197 *
> Residuals 5 24.493 4.899
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Error: block:N:K
> Df Sum Sq Mean Sq F value Pr(>F)
> N:K 1 66.270 66.270 17.320 0.00881 **
> Residuals 5 19.132 3.826
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Error: block:P:K
> Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 5 49.018 9.804
>
> Error: block:N:P:K
> Df Sum Sq Mean Sq F value Pr(>F)
> N:P:K 1 74.003 74.003 2.4161 0.1808
> Residuals 5 153.147 30.629
>
>
> John
>
> > -----Original Message-----
> > From: Tal Galili [mailto:tal.galili at gmail.com]
> > Sent: February-19-09 2:23 AM
> > To: John Fox
> > Cc: r-help at r-project.org
> > Subject: Re: [R] [package-car:Anova] extracting residuals from Anova for
> Type
> > II/III Repeated Measures ?
> >
> > Hello John, thank you for the fast reply.
> >
> > Thanks to your answer I was able to reproduce the "between" residuals by
> > simply writing:
> > sum(residuals(mod.ok)^2)
> >
> > But I must admit that how to obtain the "within" was beyond me, so some
> > advice here would be of great help.
> > (Also, it might be worth adding these residuals to the standard Anova
> output,
> > since I believe some researchers using the package could find it useful
> when
> > reporting there results)
> >
> >
> > By working with the Anova package, I stumbled into a conflict between the
> > Anova and the aov outputs.
> > The toy data used was reassembled from the (?aov) example (and filled in
> for
> > balance, and contr.sum contrasts where assigned).
> > When I compared the Anova (on SS type III) and the aov F (And p.value)
> > results - I found significant differences between them (though the SS
> where
> > identical).
> > If I understood correctly, for balanced designs both test should have
> yielded
> > the same results. any ideas on what the source of the difference might be?
> >
> > Here is the code:
> >
> >
> >
> >
> > ## get the data:
> > utils::data(npk, package="MASS")
> > library(reshape) # for data manipulation
> > colnames(npk)[5] <- "value" # so that the data set will read proparly in
> > "reshape"
> >
> > npk.wide = cast(block ~ N +P+ K, data = npk)
> >
> > # now we fill in the NA's, to get a balanced design
> > is.na(npk.wide) #it has na's. we'll fill them with that factor's
> combination
> > mean, so to keep some significance around
> > combination.mean <- apply(npk.wide, 2, function(x){mean(x,na.rm = T)})
> > for(i in c(2:9))
> > { npk.wide[is.na(npk.wide)[,i],i] = combination.mean[i-1] }
> > is.na(npk.wide) #no na's anymore
> >
> > # recreating a balanced npk for the aov
> > npk.long <- melt(npk.wide)
> >
> > head(npk.long,4)
> > # block value N P K
> > #X0_0_0 1 46.80000 0 0 0
> > #X0_0_0.1 2 51.43333 0 0 0
> > #X0_0_0.2 3 51.43333 0 0 0
> > #X0_0_0.3 4 51.43333 0 0 0
> > head(npk.wide,4)
> > # block 0_0_0 0_0_1 0_1_0 0_1_1 1_0_0 1_0_1 1_1_0 1_1_1
> > #1 1 46.80000 52.0 54.33333 49.5 63.76667 57.00000 62.80000 54.36667
> > #2 2 51.43333 55.5 56.00000 50.5 59.80000 54.66667 57.93333 58.50000
> > #3 3 51.43333 55.0 62.80000 50.5 69.50000 54.66667 57.93333 55.80000
> > #4 4 51.43333 45.5 44.20000 50.5 62.00000 54.66667 57.93333 48.80000
> >
> >
> > # npk.wide$block # it is a factor - good.
> >
> >
> > # change into contr.sum
> > for(i in c(1,3:5)) npk.long[,i] <- C(npk.long[,i] , "contr.sum")
> > npk.wide[,1] <- C(npk.wide[,1] , "contr.sum")
> >
> >
> >
> > ## as a test, not particularly sensible statistically
> > npk.aovE <- aov(value~ N*P*K + Error(block), npk.long)
> > npk.aovE
> > summary(npk.aovE)
> >
> > ## output
> > Error: block
> > Df Sum Sq Mean Sq F value Pr(>F)
> > Residuals 5 153.147 30.629
> >
> > Error: Within
> > Df Sum Sq Mean Sq F value Pr(>F)
> > N 1 378.56 378.56 39.1502 3.546e-07 ***
> > P 1 16.80 16.80 1.7378 0.19599
> > K 1 190.40 190.40 19.6911 8.657e-05 ***
> > N:P 1 42.56 42.56 4.4018 0.04319 *
> > N:K 1 66.27 66.27 6.8535 0.01298 *
> > P:K 1 0.96 0.96 0.0996 0.75415
> > N:P:K 1 74.00 74.00 7.6533 0.00899 **
> > Residuals 35 338.43 9.67
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > ## END output
> >
> > # the residuals SS
> > # 338.43 + 153.147
> > # ==
> > sum(residuals(lm(value~ N*P*K , npk.long))^2)
> > # 491.58
> >
> >
> >
> > idata <- cast(N+P+K ~ . , data = npk.long)[,c(1:3)]
> > idata
> > for(i in 1:3) idata[,i] <- C(idata[,i] , "contr.sum")
> >
> > mod.ok <- lm(as.matrix(npk.wide[,-1]) ~ 1, data = npk.wide)
> > av.ok <- Anova(mod.ok, idata=idata, idesign=~ N*P*K, type = "III")
> > summary(av.ok, multivariate=FALSE)
> >
> >
> > ## output - with different F and p.value results
> > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> >
> > SS num Df Error SS den Df F Pr(>F)
> > (Intercept) 144541 1 153 5 4719.0302 1.238e-08 ***
> > N 379 1 49 5 38.6145 0.001577 **
> > P 17 1 19 5 4.3915 0.090257 .
> > K 190 1 24 5 38.8684 0.001554 **
> > N:P 43 1 24 5 8.6888 0.031971 *
> > N:K 66 1 19 5 17.3195 0.008810 **
> > P:K 1 1 49 5 0.0983 0.766580
> > N:P:K 74 1 153 5 2.4161 0.180810
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > ## End output
> >
> >
> > # the SS residuals are the same thou
> > sum(residuals(mod.ok)^2)
> > # 491.58
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Feb 19, 2009 at 12:24 AM, John Fox <jfox at mcmaster.ca> wrote:
> >
> >
> > Dear Tal,
> >
> > I suppose that the "between" residuals would be obtained, for your
> > example,
> > by residuals(mod.ok). I'm not sure what the "within" residuals are.
> You
> > could apply the transformation for each within-subject effect to the
> > matrix
> > of residuals to get residuals for that effect -- is that what you
> had
> > in
> > mind? A list of transformations is in the element $P of the
> Anova.mlm
> > object.
> >
> > Regards,
> > John
> >
> > ------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org]
> > On
> > > Behalf Of Tal Galili
> > > Sent: February-18-09 4:04 PM
> > > To: r-help at r-project.org
> > > Subject: [R] [package-car:Anova] extracting residuals from Anova
> for
> > Type
> > > II/III Repeated Measures ?
> > >
> > > Hello dear R members.
> > > I have been learning the Anova syntax in order to perform an SS
> type
> > III
> > > Anova with repeated measures designs (thank you Prof. John Fox!)
> > > And another question came up: where/what are the (between/within)
> > residuals
> > > for my model?
> > >
> > >
> > >
> > > ############ Play code:
> > >
> > >
> > > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5,
> > 5)),
> > > levels=c("pretest", "posttest", "followup"))
> > > hour <- ordered(rep(1:5, 3))
> > > idata <- data.frame(phase, hour)
> > > idata
> > >
> > > mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> > > post.1, post.2, post.3, post.4, post.5,
> > > fup.1, fup.2, fup.3, fup.4, fup.5) ~
> > > treatment*gender,
> > > data=OBrienKaiser)
> > > av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour)
> > >
> > >
> > > summary(av.ok, multivariate=FALSE)
> > >
> > > ## Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> > > ##
> > > ## SS num Df Error SS den Df
> F
> > > Pr(>F)
> > > ## treatment 211.286 2 228.056 10
> 4.6323
> > > 0.037687
> > > ## gender 58.286 1 228.056 10
> 2.5558
> > > 0.140974
> > > ## treatment:gender 130.241 2 228.056 10
> 2.8555
> > > 0.104469
> > > ## phase 167.500 2 80.278 20
> 20.8651
> > > 1.274e-05
> > > ## treatment:phase 78.668 4 80.278 20
> 4.8997
> > > 0.006426
> > > ## gender:phase 1.668 2 80.278 20
> 0.2078
> > > 0.814130
> > > ## treatment:gender:phase 10.221 4 80.278 20
> 0.6366
> > > 0.642369
> > > ## hour 106.292 4 62.500 40
> 17.0067
> > > 3.191e-08
> > > ## treatment:hour 1.161 8 62.500 40
> 0.0929
> > > 0.999257
> > > ## gender:hour 2.559 4 62.500 40
> 0.4094
> > > 0.800772
> > > ## treatment:gender:hour 7.755 8 62.500 40
> 0.6204
> > > 0.755484
> > > ## phase:hour 11.083 8 96.167 80
> 1.1525
> > > 0.338317
> > > ## treatment:phase:hour 6.262 16 96.167 80
> 0.3256
> > > 0.992814
> > > ## gender:phase:hour 6.636 8 96.167 80
> 0.6900
> > > 0.699124
> > > ## treatment:gender:phase:hour 14.155 16 96.167 80
> 0.7359
> > > 0.749562
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > ----------------------------------------------
> > >
> > >
> > > My contact information:
> > > Tal Galili
> > > Phone number: 972-50-3373767
> > > FaceBook: Tal Galili
> > > My Blogs:
> > > www.talgalili.com
> > > www.biostatistics.co.il
> > >
> >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> >
> >
> >
> > --
> > ----------------------------------------------
> >
> >
> > My contact information:
> > Tal Galili
> > Phone number: 972-50-3373767
> > FaceBook: Tal Galili
> > My Blogs:
> > www.talgalili.com
> > www.biostatistics.co.il
> >
> >
>
>
>
--
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il
More information about the R-help
mailing list