[R] Anova and unbalanced designs
Nils Skotara
Nils.Skotara at uni-hamburg.de
Sat Jan 24 23:19:01 CET 2009
Dear John,
Thank you again! You replicated the type III result I got in SPSS. When I
run Anova() with type II, I get:
Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                   SS num Df Error SS den Df      F  Pr(>F)
between        4.8000      1   9.0000      8 4.2667 0.07273 .
within         0.2000      1  10.6667      8 0.1500 0.70864
between:within 2.1333      1  10.6667      8 1.6000 0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These are exactly the values you reported.
However, and now I am really lost, type III (I changed nothing else)
gives the following:
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                              SS num Df Error SS den Df       F    Pr(>F)
(Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
between                    4.800      1    9.000      8  4.2667   0.07273 .
as.factor(within)          2.000      1   10.667      8  1.5000   0.25551
between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How is this possible?
Best regards!
Nils
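
One possible reading of the discrepancy, offered as an assumption that the
thread itself does not confirm: the model was fitted with R's default
contr.treatment coding for between, and under that coding a "type III" test
of the within effect refers to the reference group (group 1) only, whereas
sum-to-zero coding (contr.sum) tests the unweighted average over both groups.
The base-R sketch below works on the difference scores w1 - w2; the names d,
fit_treatment and fit_sum are made up for this illustration, and the squared
t statistic of the intercept stands in for the F of the within effect.

```r
# Data from the original post quoted further down.
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

# With two repeated measures, the within effect is a test on d = w1 - w2.
d <- w1 - w2

# Hypothetical helper fits: same model, two codings of 'between'.
fit_treatment <- lm(d ~ between, contrasts = list(between = "contr.treatment"))
fit_sum       <- lm(d ~ between, contrasts = list(between = "contr.sum"))

# Squared t of the intercept = F on 1 and 8 degrees of freedom.
f_treatment <- coef(summary(fit_treatment))["(Intercept)", "t value"]^2
f_sum       <- coef(summary(fit_sum))["(Intercept)", "t value"]^2

round(c(treatment = f_treatment, sum = f_sum), 4)
# treatment: 1.5 (the F for as.factor(within) in the type III table above)
# sum:       0.4 (the F for within reported by SAS and by John)
```

If this reading is right, setting options(contrasts = c("contr.sum",
"contr.poly")) before calling lm() should bring Anova(..., type = "III") in
line with the SAS output.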
Quoting John Fox <jfox at mcmaster.ca>:
> Dear Nils,
>
> I don't currently have a copy of SAS on my computer, so I asked Michael
> Friendly to run the problem in SAS and he kindly supplied the following
> results:
>
> ----------- snip ------------
>
> The SAS System                       12:32 Saturday, January 24, 2009
>
> The GLM Procedure
>
> Class Level Information
>
> Class Levels Values
>
> between 2 1 2
>
>
> Number of Observations Read 10
> Number of Observations Used 10
>
> The GLM Procedure
> Repeated Measures Analysis of Variance
>
> Repeated Measures Level Information
>
> Dependent Variable w1 w2
>
> Level of within 1 2
>
>
> MANOVA Test Criteria and Exact F Statistics
> for the Hypothesis of no within Effect
> H = Type III SSCP Matrix for within
> E = Error SSCP Matrix
>
> S=1 M=-0.5 N=3
>
> Statistic                      Value  F Value  Num DF  Den DF  Pr > F
>
> Wilks' Lambda             0.95238095     0.40       1       8  0.5447
> Pillai's Trace            0.04761905     0.40       1       8  0.5447
> Hotelling-Lawley Trace    0.05000000     0.40       1       8  0.5447
> Roy's Greatest Root       0.05000000     0.40       1       8  0.5447
>
>
> MANOVA Test Criteria and Exact F Statistics for
> the Hypothesis of no within*between Effect
> H = Type III SSCP Matrix for within*between
> E = Error SSCP Matrix
>
> S=1 M=-0.5 N=3
>
> Statistic                      Value  F Value  Num DF  Den DF  Pr > F
>
> Wilks' Lambda             0.83333333     1.60       1       8  0.2415
> Pillai's Trace            0.16666667     1.60       1       8  0.2415
> Hotelling-Lawley Trace    0.20000000     1.60       1       8  0.2415
> Roy's Greatest Root       0.20000000     1.60       1       8  0.2415
>
> The GLM Procedure
> Repeated Measures Analysis of Variance
> Tests of Hypotheses for Between Subjects Effects
>
> Source             DF   Type III SS   Mean Square  F Value  Pr > F
>
> between             1    4.80000000    4.80000000     4.27  0.0727
> Error               8    9.00000000    1.12500000
>
> The GLM Procedure
> Repeated Measures Analysis of Variance
> Univariate Tests of Hypotheses for Within Subject Effects
>
> Source             DF   Type III SS   Mean Square  F Value  Pr > F
>
> within              1    0.53333333    0.53333333     0.40  0.5447
> within*between      1    2.13333333    2.13333333     1.60  0.2415
> Error(within)       8   10.66666667    1.33333333
>
> ----------- snip ------------
>
> As you can see, these agree with Anova():
>
> ----------- snip ------------
>
> Type III Repeated Measures MANOVA Tests: Pillai test statistic
>                Df test stat approx F num Df den Df    Pr(>F)
> (Intercept)     1     0.963  209.067      1      8 5.121e-07 ***
> between         1     0.348    4.267      1      8   0.07273 .
> within          1     0.048    0.400      1      8   0.54474
> between:within  1     0.167    1.600      1      8   0.24150
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
>
>                     SS num Df Error SS den Df        F    Pr(>F)
> (Intercept)    235.200      1    9.000      8 209.0667 5.121e-07 ***
> between          4.800      1    9.000      8   4.2667   0.07273 .
> within           0.533      1   10.667      8   0.4000   0.54474
> between:within   2.133      1   10.667      8   1.6000   0.24150
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> ----------- snip ------------
>
> So, unless Anova() and SAS are making the same error, I guess SPSS is doing
> something strange (or perhaps you didn't do what you intended in SPSS). As I
> said before, this problem is so simple that I find it hard to understand
> where there's room for error, but I wanted to check against SAS to test my
> sanity (a procedure that will likely get a rise out of some list members).
>
> Maybe you should send a message to the SPSS help list.
>
> Regards,
> John
>
> ------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> web: socserv.mcmaster.ca/jfox
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Skotara
> > Sent: January-24-09 6:30 AM
> > To: John Fox
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Anova and unbalanced designs
> >
> > Dear John,
> >
> > Thank you for your answer. You are right, I would not have expected
> > a divergent result either.
> > I have double-checked: no, I did get type-III tests.
> > When I use type II, I get the same results in SPSS as in 'Anova' (also
> > using type-II tests).
> > My guess was that the means SPSS shows, which seem to be weighted in some
> > way, could be responsible for this difference,
> > or that 'Anova' is not appropriate for unequal group sizes, although I do
> > not think that is the case.
> > Do you have any further ideas?
> >
> > Thank you!
> > Nils
> >
> > John Fox wrote:
> > > Dear Nils,
> > >
> > > This is a pretty simple design, and I wouldn't have thought that there was
> > > much room for getting different results. More generally, but not here (since
> > > there's only one between-subject factor), one shouldn't use
> > > contr.treatment() with "type-III" tests, as you did. Is it possible that you
> > > got "type-II" tests from SPSS:
> > >
> > > ------ snip ----------
> > >
> > >
> > >> summary(Anova(betweenanova, idata=with, idesign= ~within, type = "II"))
> > >
> > > Type II Repeated Measures MANOVA Tests:
> > >
> > > ------------------------------------------
> > >
> > > Term: between
> > >
> > > Response transformation matrix:
> > > (Intercept)
> > > w1 1
> > > w2 1
> > >
> > > Sum of squares and products for the hypothesis:
> > > (Intercept)
> > > (Intercept) 9.6
> > >
> > > Sum of squares and products for error:
> > > (Intercept)
> > > (Intercept) 18
> > >
> > > Multivariate Tests: between
> > > Df test stat approx F num Df den Df Pr(>F)
> > > Pillai 1 0.347826 4.266667 1 8 0.072726 .
> > > Wilks 1 0.652174 4.266667 1 8 0.072726 .
> > > Hotelling-Lawley 1 0.533333 4.266667 1 8 0.072726 .
> > > Roy 1 0.533333 4.266667 1 8 0.072726 .
> > > ---
> > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > ------------------------------------------
> > >
> > > Term: within
> > >
> > > Response transformation matrix:
> > > within1
> > > w1 1
> > > w2 -1
> > >
> > > Sum of squares and products for the hypothesis:
> > > within1
> > > within1 0.4
> > >
> > > Sum of squares and products for error:
> > > within1
> > > within1 21.33333
> > >
> > > Multivariate Tests: within
> > > Df test stat approx F num Df den Df Pr(>F)
> > > Pillai 1 0.0184049 0.1500000 1 8 0.70864
> > > Wilks 1 0.9815951 0.1500000 1 8 0.70864
> > > Hotelling-Lawley 1 0.0187500 0.1500000 1 8 0.70864
> > > Roy 1 0.0187500 0.1500000 1 8 0.70864
> > >
> > > ------------------------------------------
> > >
> > > Term: between:within
> > >
> > > Response transformation matrix:
> > > within1
> > > w1 1
> > > w2 -1
> > >
> > > Sum of squares and products for the hypothesis:
> > > within1
> > > within1 4.266667
> > >
> > > Sum of squares and products for error:
> > > within1
> > > within1 21.33333
> > >
> > > Multivariate Tests: between:within
> > > Df test stat approx F num Df den Df Pr(>F)
> > > Pillai 1 0.1666667 1.6000000 1 8 0.24150
> > > Wilks 1 0.8333333 1.6000000 1 8 0.24150
> > > Hotelling-Lawley 1 0.2000000 1.6000000 1 8 0.24150
> > > Roy 1 0.2000000 1.6000000 1 8 0.24150
> > >
> > > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> > >
> > > SS num Df Error SS den Df F Pr(>F)
> > > between 4.8000 1 9.0000 8 4.2667 0.07273 .
> > > within 0.2000 1 10.6667 8 0.1500 0.70864
> > > between:within 2.1333 1 10.6667 8 1.6000 0.24150
> > > ---
> > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > ------ snip ----------
> > >
> > > I hope this helps,
> > > John
> > >
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: r-help-bounces at r-project.org
> > >> [mailto:r-help-bounces at r-project.org] On Behalf Of Skotara
> > >> Sent: January-23-09 12:16 PM
> > >> To: r-help at r-project.org
> > >> Subject: [R] Anova and unbalanced designs
> > >>
> > >> Dear R-list!
> > >>
> > >> My question is related to an Anova including within and between subject
> > >> factors and unequal group sizes.
> > >> Here is a minimal example of what I did:
> > >>
> > >> library(car)
> > >> within1 <- c(1,2,3,4,5,6,4,5,3,2); within2 <- c(3,4,3,4,3,4,3,4,5,4)
> > >> values <- data.frame(w1 = within1, w2 = within2)
> > >> values <- as.matrix(values)
> > >> between <- factor(c(rep(1,4), rep(2,6)))
> > >> betweenanova <- lm(values ~ between)
> > >> with <- expand.grid(within = factor(1:2))
> > >> withinanova <- Anova(betweenanova, idata=with,
> > >>                      idesign= ~as.factor(within), type = "III")
> > >>
> > >> I do not know if this is the appropriate method to deal with unbalanced
> > >> designs.
> > >>
> > >> I observed that SPSS calculates everything identically except the main
> > >> effect of the within factor; there, the sum of squares and the F-value
> > >> are very different.
> > >> With the option "show means" selected, the means for the levels of the
> > >> within factor in SPSS are the same as:
> > >> mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"]))) and
> > >> mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"]))).
> > >> In other words, they are calculated as if both groups had the
> > >> same size.
> > >>
> > >> I wonder whether this is a good solution and, if so, how I could do the
> > >> same thing in R.
> > >> However, if SPSS treats the main effect as if the group sizes were
> > >> identical, why does it not do the same for the interaction, which gives
> > >> the same result as Anova()?
> > >>
> > >> Many thanks in advance for your time and help!
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> >
>
>
>
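
For reference, the three sums of squares for the within effect that appear in
this thread (0.2, 0.533 and 2.000) can be reproduced by hand from the
difference scores. The sketch below is not taken from any of the messages: the
helper contrast_ss() and the variable names are illustrative, and it assumes
the usual single-degree-of-freedom formula estimate^2 / sum(c_g^2 / n_g),
divided by 2 because the within contrast (1, -1) has squared length 2.

```r
# Data from the original post.
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

d <- w1 - w2                               # within contrast per subject
n <- as.vector(table(between))             # group sizes: 4 and 6
m <- as.vector(tapply(d, between, mean))   # group means of d

# Hypothetical helper: SS for a weighted combination of the group means.
contrast_ss <- function(weights) {
  estimate <- sum(weights * m)
  estimate^2 / sum(weights^2 / n) / 2
}

ss_weighted   <- contrast_ss(n / sum(n))   # groups weighted by their size
ss_unweighted <- contrast_ss(c(0.5, 0.5))  # groups weighted equally
ss_group1     <- contrast_ss(c(1, 0))      # group 1 alone

round(c(weighted = ss_weighted, unweighted = ss_unweighted,
        group1 = ss_group1), 3)
# weighted:   0.2   (type II)
# unweighted: 0.533 (type III in SAS and in John's Anova() run)
# group1:     2     (the type III value in the most recent message)
```

The equal-weights case corresponds to the "show means" output described in the
original post, where the level means are computed as if both groups had the
same size.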