[R] [package-car:Anova] extracting residuals from Anova for Type II/III Repeated Measures ?

Sat Feb 21 00:16:07 CET 2009

Hello John, thanks for your reply and correction.
I apologies for my crude mistake in applying the aov (now I have
learned better). I hope to get a hold of "Statistical Models in S",
but I don't predict it could easily happen in the near future.

Also, I would be very happy if you could supply me with some more
directions as to how to obtain the "within" residuals (such as
reported from the aov summary), since I am not sure how to proceed
with that.

With regards,
Tal

On Sat, Feb 21, 2009 at 12:41 AM, John Fox <jfox at mcmaster.ca> wrote:
>
> Dear Tal,
>
> I didn't have time to look at all this yesterday.
>
> Since aov() doesn't do what I typically want to do, I guess I've not paid
> much attention to it recently. I can see, however, that you appear to have
> specified the error strata incorrectly, since (given your desire to compare
> to Anova) the within-block factors are nested within blocks. Something like
>
> > npk.aovE <- aov(value ~  N*P*K + Error(block/N*P*K), npk.long)
>
> should be closer to what you want, and in fact produces all of the sums of
> squares, but doesn't put all of the error terms together with the
> corresponding terms; thus, you get, e.g., the test for N but not for P and
> K, even though the SSs and error SSs for the latter are in the table. By
> permuting N, P, and K, you can get the other F tests. I suspect that this
> has to do with the sequential approach taken by aov() but someone else more
> familiar with how it works will have to fill in the details. I wonder,
> though, whether you've read the sections in Statistical Models in S and MASS
> referenced in the help file for aov.
>
> > summary(npk.aovE)
>
> Error: block
>          Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals  5 153.147  30.629
>
> Error: P
>  Df Sum Sq Mean Sq
> P  1 16.803  16.803
>
> Error: K
>  Df Sum Sq Mean Sq
> K  1 190.40  190.40
>
> Error: block:N
>          Df Sum Sq Mean Sq F value   Pr(>F)
> N          1 378.56  378.56  38.614 0.001577 **
> Residuals  5  49.02    9.80
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Error: block:P
>          Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals  5 19.1317  3.8263
>
> Error: block:K
>          Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals  5 24.4933  4.8987
>
> Error: P:K
>    Df  Sum Sq Mean Sq
> P:K  1 0.96333 0.96333
>
> Error: block:N:P
>          Df Sum Sq Mean Sq F value  Pr(>F)
> N:P        1 42.563  42.563  8.6888 0.03197 *
> Residuals  5 24.493   4.899
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Error: block:N:K
>          Df Sum Sq Mean Sq F value  Pr(>F)
> N:K        1 66.270  66.270  17.320 0.00881 **
> Residuals  5 19.132   3.826
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Error: block:P:K
>          Df Sum Sq Mean Sq F value Pr(>F)
> Residuals  5 49.018   9.804
>
> Error: block:N:P:K
>          Df  Sum Sq Mean Sq F value Pr(>F)
> N:P:K      1  74.003  74.003  2.4161 0.1808
> Residuals  5 153.147  30.629
>
>
> John
>
> > -----Original Message-----
> > From: Tal Galili [mailto:tal.galili at gmail.com]
> > Sent: February-19-09 2:23 AM
> > To: John Fox
> > Cc: r-help at r-project.org
> > Subject: Re: [R] [package-car:Anova] extracting residuals from Anova for
> Type
> > II/III Repeated Measures ?
> >
> > Hello John, thank you for the fast reply.
> >
> > Thanks to your answer I was able to reproduce the "between" residuals by
> > simply writing:
> > sum(residuals(mod.ok)^2)
> >
> > But I must admit that how to obtain the "within" was beyond me, so some
> > advice here would be of great help.
> > (Also, it might be worth adding these residuals to the standard Anova
> output,
> > since I believe some researchers using the package could find it useful
> when
> > reporting there results)
> >
> >
> > By working with the Anova package, I stumbled into a conflict between the
> > Anova and the aov outputs.
> > The toy data used was reassembled from the (?aov) example (and filled in
> for
> > balance, and contr.sum contrasts where assigned).
> > When I compared the Anova (on SS type III) and the aov F (And p.value)
> > results - I found significant differences between them (though the SS
> where
> > identical).
> > If I understood correctly, for balanced designs both test should have
> yielded
> > the same results. any ideas on what the source of the difference might be?
> >
> > Here is the code:
> >
> >
> >
> >
> > ## get the data:
> > utils::data(npk, package="MASS")
> > library(reshape) # for data manipulation
> > colnames(npk)[5] <- "value" # so that the data set will read proparly in
> > "reshape"
> >
> > npk.wide = cast(block ~  N +P+ K, data = npk)
> >
> > # now we fill in the NA's, to get a balanced design
> > is.na(npk.wide) #it has na's. we'll fill them with that factor's
> combination
> > mean, so to keep some significance around
> > combination.mean <- apply(npk.wide, 2, function(x){mean(x,na.rm = T)})
> > for(i in c(2:9))
> > { npk.wide[is.na(npk.wide)[,i],i] = combination.mean[i-1] }
> > is.na(npk.wide) #no na's anymore
> >
> > # recreating a balanced npk for the aov
> > npk.long <- melt(npk.wide)
> >
> > head(npk.long,4)
> > #         block    value N P K
> > #X0_0_0       1 46.80000 0 0 0
> > #X0_0_0.1     2 51.43333 0 0 0
> > #X0_0_0.2     3 51.43333 0 0 0
> > #X0_0_0.3     4 51.43333 0 0 0
> > head(npk.wide,4)
> > #  block    0_0_0 0_0_1    0_1_0 0_1_1    1_0_0    1_0_1    1_1_0    1_1_1
> > #1     1 46.80000  52.0 54.33333  49.5 63.76667 57.00000 62.80000 54.36667
> > #2     2 51.43333  55.5 56.00000  50.5 59.80000 54.66667 57.93333 58.50000
> > #3     3 51.43333  55.0 62.80000  50.5 69.50000 54.66667 57.93333 55.80000
> > #4     4 51.43333  45.5 44.20000  50.5 62.00000 54.66667 57.93333 48.80000
> >
> >
> > # npk.wide$block # it is a factor - good.
> >
> >
> > # change into contr.sum
> > for(i in c(1,3:5)) npk.long[,i] <- C(npk.long[,i] , "contr.sum")
> > npk.wide[,1] <- C(npk.wide[,1] , "contr.sum")
> >
> >
> >
> > ## as a test, not particularly sensible statistically
> > npk.aovE <- aov(value~  N*P*K + Error(block), npk.long)
> > npk.aovE
> > summary(npk.aovE)
> >
> > ## output
> > Error: block
> >           Df  Sum Sq Mean Sq F value Pr(>F)
> > Residuals  5 153.147  30.629
> >
> > Error: Within
> >           Df Sum Sq Mean Sq F value    Pr(>F)
> > N          1 378.56  378.56 39.1502 3.546e-07 ***
> > P          1  16.80   16.80  1.7378   0.19599
> > K          1 190.40  190.40 19.6911 8.657e-05 ***
> > N:P        1  42.56   42.56  4.4018   0.04319 *
> > N:K        1  66.27   66.27  6.8535   0.01298 *
> > P:K        1   0.96    0.96  0.0996   0.75415
> > N:P:K      1  74.00   74.00  7.6533   0.00899 **
> > Residuals 35 338.43    9.67
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > ## END output
> >
> > # the residuals SS
> > # 338.43 + 153.147
> > # ==
> > sum(residuals(lm(value~  N*P*K , npk.long))^2)
> > #  491.58
> >
> >
> >
> > idata  <- cast(N+P+K  ~ . , data = npk.long)[,c(1:3)]
> > idata
> > for(i in 1:3) idata[,i] <- C(idata[,i] , "contr.sum")
> >
> > mod.ok <- lm(as.matrix(npk.wide[,-1]) ~  1, data = npk.wide)
> > av.ok <- Anova(mod.ok, idata=idata, idesign=~ N*P*K, type = "III")
> > summary(av.ok, multivariate=FALSE)
> >
> >
> > ## output - with different F and p.value results
> > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> >
> >                 SS num Df Error SS den Df         F    Pr(>F)
> > (Intercept) 144541      1      153      5 4719.0302 1.238e-08 ***
> > N              379      1       49      5   38.6145  0.001577 **
> > P               17      1       19      5    4.3915  0.090257 .
> > K              190      1       24      5   38.8684  0.001554 **
> > N:P             43      1       24      5    8.6888  0.031971 *
> > N:K             66      1       19      5   17.3195  0.008810 **
> > P:K              1      1       49      5    0.0983  0.766580
> > N:P:K           74      1      153      5    2.4161  0.180810
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > ## End output
> >
> >
> > # the SS residuals are the same thou
> > sum(residuals(mod.ok)^2)
> > #  491.58
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Feb 19, 2009 at 12:24 AM, John Fox <jfox at mcmaster.ca> wrote:
> >
> >
> >       Dear Tal,
> >
> >       I suppose that the "between" residuals would be obtained, for your
> > example,
> >       by residuals(mod.ok). I'm not sure what the "within" residuals are.
> You
> >       could apply the transformation for each within-subject effect to the
> > matrix
> >       of residuals to get residuals for that effect -- is that what you
> had
> > in
> >       mind? A list of transformations is in the element $P of the
> Anova.mlm
> >       object.
> >
> >       Regards,
> >        John
> >
> >       ------------------------------
> >       John Fox, Professor
> >       Department of Sociology
> >       McMaster University
> >       Hamilton, Ontario, Canada
> >       web: socserv.mcmaster.ca/jfox
> >
> >
> >
> >       > -----Original Message-----
> >       > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org]
> >       On
> >       > Behalf Of Tal Galili
> >       > Sent: February-18-09 4:04 PM
> >       > To: r-help at r-project.org
> >       > Subject: [R] [package-car:Anova] extracting residuals from Anova
> for
> > Type
> >       > II/III Repeated Measures ?
> >       >
> >       > Hello dear R members.
> >       > I have been learning the Anova syntax in order to perform an SS
> type
> > III
> >       > Anova with repeated measures designs (thank you Prof. John Fox!)
> >       > And another question came up: where/what are the (between/within)
> >       residuals
> >       > for my model?
> >       >
> >       >
> >       >
> >       > ############  Play code:
> >       >
> >       >
> >       > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5,
> > 5)),
> >       >     levels=c("pretest", "posttest", "followup"))
> >       > hour <- ordered(rep(1:5, 3))
> >       > idata <- data.frame(phase, hour)
> >       > idata
> >       >
> >       > mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> >       >                      post.1, post.2, post.3, post.4, post.5,
> >       >                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
> >       >  treatment*gender,
> >       >                 data=OBrienKaiser)
> >       > av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour)
> >       >
> >       >
> >       > summary(av.ok, multivariate=FALSE)
> >       >
> >       > ## Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> >       > ##
> >       > ##                                  SS num Df Error SS den Df
> F
> >       >  Pr(>F)
> >       > ## treatment                   211.286      2  228.056     10
> 4.6323
> >       >  0.037687
> >       > ## gender                       58.286      1  228.056     10
> 2.5558
> >       >  0.140974
> >       > ## treatment:gender            130.241      2  228.056     10
> 2.8555
> >       >  0.104469
> >       > ## phase                       167.500      2   80.278     20
> 20.8651
> >       > 1.274e-05
> >       > ## treatment:phase              78.668      4   80.278     20
> 4.8997
> >       >  0.006426
> >       > ## gender:phase                  1.668      2   80.278     20
> 0.2078
> >       >  0.814130
> >       > ## treatment:gender:phase       10.221      4   80.278     20
> 0.6366
> >       >  0.642369
> >       > ## hour                        106.292      4   62.500     40
> 17.0067
> >       > 3.191e-08
> >       > ## treatment:hour                1.161      8   62.500     40
> 0.0929
> >       >  0.999257
> >       > ## gender:hour                   2.559      4   62.500     40
> 0.4094
> >       >  0.800772
> >       > ## treatment:gender:hour         7.755      8   62.500     40
> 0.6204
> >       >  0.755484
> >       > ## phase:hour                   11.083      8   96.167     80
> 1.1525
> >       >  0.338317
> >       > ## treatment:phase:hour          6.262     16   96.167     80
> 0.3256
> >       >  0.992814
> >       > ## gender:phase:hour             6.636      8   96.167     80
> 0.6900
> >       >  0.699124
> >       > ## treatment:gender:phase:hour  14.155     16   96.167     80
> 0.7359
> >       >  0.749562
> >       >
> >       >
> >       >
> >       >
> >       >
> >       >
> >       >
> >       >
> >       >
> >       > --
> >       > ----------------------------------------------
> >       >
> >       >
> >       > My contact information:
> >       > Tal Galili
> >       > Phone number: 972-50-3373767
> >       > FaceBook: Tal Galili
> >       > My Blogs:
> >       > www.talgalili.com
> >       > www.biostatistics.co.il
> >       >
> >
> >       >       [[alternative HTML version deleted]]
> >       >
> >       > ______________________________________________
> >       > R-help at r-project.org mailing list
> >       > https://stat.ethz.ch/mailman/listinfo/r-help
> >       > PLEASE do read the posting guide
> >       http://www.R-project.org/posting-guide.html
> >       > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> >
> >
> >
> > --
> > ----------------------------------------------
> >
> >
> > My contact information:
> > Tal Galili
> > Phone number: 972-50-3373767
> > FaceBook: Tal Galili
> > My Blogs:
> > www.talgalili.com
> > www.biostatistics.co.il
> >
> >
>
>
>

--
----------------------------------------------

My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il