[R] Anova and unbalanced designs
John Fox
jfox at mcmaster.ca
Sat Jan 24 19:17:44 CET 2009
Dear Nils,
I don't currently have a copy of SAS on my computer, so I asked Michael
Friendly to run the problem in SAS and he kindly supplied the following
results:
----------- snip ------------
The SAS System                              12:32 Saturday, January 24, 2009
The GLM Procedure
Class Level Information
Class      Levels    Values
between         2    1 2
Number of Observations Read 10
Number of Observations Used 10
The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information
Dependent Variable    w1    w2
Level of within        1     2
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of no within Effect
H = Type III SSCP Matrix for within
E = Error SSCP Matrix
S=1 M=-0.5 N=3
Statistic                     Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda            0.95238095       0.40         1         8    0.5447
Pillai's Trace           0.04761905       0.40         1         8    0.5447
Hotelling-Lawley Trace   0.05000000       0.40         1         8    0.5447
Roy's Greatest Root      0.05000000       0.40         1         8    0.5447
MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of no within*between Effect
H = Type III SSCP Matrix for within*between
E = Error SSCP Matrix
S=1 M=-0.5 N=3
Statistic                     Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda            0.83333333       1.60         1         8    0.2415
Pillai's Trace           0.16666667       1.60         1         8    0.2415
Hotelling-Lawley Trace   0.20000000       1.60         1         8    0.2415
Roy's Greatest Root      0.20000000       1.60         1         8    0.2415
The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects
Source     DF    Type III SS    Mean Square    F Value    Pr > F
between     1     4.80000000     4.80000000       4.27    0.0727
Error       8     9.00000000     1.12500000
The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects
Source            DF    Type III SS    Mean Square    F Value    Pr > F
within             1     0.53333333     0.53333333       0.40    0.5447
within*between     1     2.13333333     2.13333333       1.60    0.2415
Error(within)      8    10.66666667     1.33333333
----------- snip ------------
As you can see, these agree with Anova():
----------- snip ------------
Type III Repeated Measures MANOVA Tests: Pillai test statistic
               Df test stat approx F num Df den Df    Pr(>F)
(Intercept)     1     0.963  209.067      1      8 5.121e-07 ***
between         1     0.348    4.267      1      8   0.07273 .
within          1     0.048    0.400      1      8   0.54474
between:within  1     0.167    1.600      1      8   0.24150
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
                    SS num Df Error SS den Df        F    Pr(>F)
(Intercept)    235.200      1    9.000      8 209.0667 5.121e-07 ***
between          4.800      1    9.000      8   4.2667   0.07273 .
within           0.533      1   10.667      8   0.4000   0.54474
between:within   2.133      1   10.667      8   1.6000   0.24150
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
----------- snip ------------
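For anyone who wants to check these numbers without either package, here is a minimal base-R sketch that rebuilds the univariate type-III sums of squares from the data in the original post. It assumes the usual reduction for a two-level within factor: work with each subject's difference (w1 - w2) and sum (w1 + w2). The names d, s, n and ss_within_group are mine, not anything from car or SAS.

```r
# Data from the original post: 10 subjects, two within-subject measures,
# and a between-subject factor with unequal group sizes (4 and 6).
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

n <- tapply(w1, between, length)  # group sizes
d <- w1 - w2                      # within-subject differences
s <- w1 + w2                      # within-subject sums

# Pooled within-group sum of squares of a per-subject score.
ss_within_group <- function(x) sum((x - ave(x, between))^2)

# Dividing by 2 normalizes the contrasts (1, -1) and (1, 1).
ss_err_within  <- ss_within_group(d) / 2
ss_err_between <- ss_within_group(s) / 2

# Type III within effect: is the UNWEIGHTED mean of the group means of d zero?
d_means   <- tapply(d, between, mean)
ss_within <- mean(d_means)^2 / sum((1 / length(n))^2 / n) / 2

# within*between interaction: do the group means of d differ?
ss_inter <- sum(n * (d_means - mean(d))^2) / 2

# between effect: do the group means of s differ?
s_means    <- tapply(s, between, mean)
ss_between <- sum(n * (s_means - mean(s))^2) / 2

df_err    <- length(d) - nlevels(between)
f_within  <- ss_within  / (ss_err_within  / df_err)
f_inter   <- ss_inter   / (ss_err_within  / df_err)
f_between <- ss_between / (ss_err_between / df_err)

# Compare with the SS and F columns of the tables above.
print(round(c(within = ss_within, inter = ss_inter, between = ss_between), 4))
print(round(c(within = f_within, inter = f_inter, between = f_between), 4))
```

The only place the unequal group sizes enter the within test is the denominator sum((1/2)^2 / n), which is the variance factor for an equally weighted average of two group means.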
So, unless Anova() and SAS are making the same error, I guess SPSS is doing
something strange (or perhaps you didn't do what you intended in SPSS). As I
said before, this problem is so simple that I find it hard to understand
where there's room for error, but I wanted to check against SAS to test my
sanity (a procedure that will likely get a rise out of some list members).
Maybe you should send a message to the SPSS help list.
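If it helps to narrow down which number SPSS is reporting, here is a rough base-R sketch of the one term where the type-II and type-III results in this thread differ, the within main effect. It assumes the two tests differ only in how the group means of the difference scores are averaged: weighted by group size for type II, equally weighted for type III. The variable names are mine, and this says nothing about what SPSS computes internally.

```r
# Data from the original post.
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

d       <- w1 - w2                     # within-subject differences
n       <- tapply(d, between, length)  # group sizes
d_means <- tapply(d, between, mean)    # mean difference in each group

unweighted <- mean(d_means)              # each group counts equally
weighted   <- weighted.mean(d_means, n)  # identical to mean(d)

# Error mean square of d from the model with the between factor.
df_err <- length(d) - nlevels(between)
ms_err <- sum((d - ave(d, between))^2) / df_err

# F = squared estimate divided by its estimated sampling variance.
f_type3 <- unweighted^2 / (ms_err * sum((1 / length(n))^2 / n))
f_type2 <- weighted^2   / (ms_err / length(d))

print(c(type_III = f_type3, type_II = f_type2))
print(pf(c(f_type3, f_type2), 1, df_err, lower.tail = FALSE))
```

Comparing the F that SPSS prints for the within effect against these two values should show which hypothesis it tested.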
Regards,
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Skotara
> Sent: January-24-09 6:30 AM
> To: John Fox
> Cc: r-help at r-project.org
> Subject: Re: [R] Anova and unbalanced designs
>
> Dear John,
>
> Thank you for your answer. You are right, I also would not have expected
> a divergent result.
> I have double-checked it again. No, I got type-III tests.
> When I use type II, I get the same results in SPSS as in 'Anova' (also
> using type-II tests).
> My guess was that the way SPSS weights the means it displays could be
> responsible for this difference,
> or that using 'Anova' would not be correct for unequal group n's, which
> I think is not the case.
> Do you have any further ideas?
>
> Thank you!
> Nils
>
> John Fox wrote:
> > Dear Nils,
> >
> > This is a pretty simple design, and I wouldn't have thought that there was
> > much room for getting different results. More generally, but not here (since
> > there's only one between-subject factor), one shouldn't use
> > contr.treatment() with "type-III" tests, as you did. Is it possible that you
> > got "type-II" tests from SPSS:
> >
> > ------ snip ----------
> >
> >
> > > summary(Anova(betweenanova, idata=with, idesign= ~within, type = "II"))
> >
> > Type II Repeated Measures MANOVA Tests:
> >
> > ------------------------------------------
> >
> > Term: between
> >
> > Response transformation matrix:
> > (Intercept)
> > w1 1
> > w2 1
> >
> > Sum of squares and products for the hypothesis:
> > (Intercept)
> > (Intercept) 9.6
> >
> > Sum of squares and products for error:
> > (Intercept)
> > (Intercept) 18
> >
> > Multivariate Tests: between
> >                  Df test stat approx F num Df den Df   Pr(>F)
> > Pillai            1  0.347826 4.266667      1      8 0.072726 .
> > Wilks             1  0.652174 4.266667      1      8 0.072726 .
> > Hotelling-Lawley  1  0.533333 4.266667      1      8 0.072726 .
> > Roy               1  0.533333 4.266667      1      8 0.072726 .
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > ------------------------------------------
> >
> > Term: within
> >
> > Response transformation matrix:
> > within1
> > w1 1
> > w2 -1
> >
> > Sum of squares and products for the hypothesis:
> > within1
> > within1 0.4
> >
> > Sum of squares and products for error:
> > within1
> > within1 21.33333
> >
> > Multivariate Tests: within
> >                  Df test stat  approx F num Df den Df  Pr(>F)
> > Pillai            1 0.0184049 0.1500000      1      8 0.70864
> > Wilks             1 0.9815951 0.1500000      1      8 0.70864
> > Hotelling-Lawley  1 0.0187500 0.1500000      1      8 0.70864
> > Roy               1 0.0187500 0.1500000      1      8 0.70864
> >
> > ------------------------------------------
> >
> > Term: between:within
> >
> > Response transformation matrix:
> > within1
> > w1 1
> > w2 -1
> >
> > Sum of squares and products for the hypothesis:
> > within1
> > within1 4.266667
> >
> > Sum of squares and products for error:
> > within1
> > within1 21.33333
> >
> > Multivariate Tests: between:within
> >                  Df test stat  approx F num Df den Df  Pr(>F)
> > Pillai            1 0.1666667 1.6000000      1      8 0.24150
> > Wilks             1 0.8333333 1.6000000      1      8 0.24150
> > Hotelling-Lawley  1 0.2000000 1.6000000      1      8 0.24150
> > Roy               1 0.2000000 1.6000000      1      8 0.24150
> >
> > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> >
> >                    SS num Df Error SS den Df      F  Pr(>F)
> > between        4.8000      1   9.0000      8 4.2667 0.07273 .
> > within         0.2000      1  10.6667      8 0.1500 0.70864
> > between:within 2.1333      1  10.6667      8 1.6000 0.24150
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > ------ snip ----------
> >
> > I hope this helps,
> > John
> >
> > ------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Skotara
> >> Sent: January-23-09 12:16 PM
> >> To: r-help at r-project.org
> >> Subject: [R] Anova and unbalanced designs
> >>
> >> Dear R-list!
> >>
> >> My question is related to an Anova including within and between subject
> >> factors and unequal group sizes.
> >> Here is a minimal example of what I did:
> >>
> >> library(car)
> >> within1 <- c(1,2,3,4,5,6,4,5,3,2); within2 <- c(3,4,3,4,3,4,3,4,5,4)
> >> values <- data.frame(w1 = within1, w2 = within2)
> >> values <- as.matrix(values)
> >> between <- factor(c(rep(1,4), rep(2,6)))
> >> betweenanova <- lm(values ~ between)
> >> with <- expand.grid(within = factor(1:2))
> >> withinanova <- Anova(betweenanova, idata=with, idesign=
> >> ~as.factor(within), type = "III" )
> >>
> >> I do not know if this is the appropriate method to deal with unbalanced
> >> designs.
> >>
> >> I observed that SPSS calculates everything identically except the main
> >> effect of the within factor; here, the SSQ and F-value are very different.
> >> If I select the option "show means", the means for the levels of the
> >> within factor in SPSS are the same as:
> >> mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"]))) and
> >> mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"]))).
> >> In other words, they are calculated as if both groups had the
> >> same size.
> >>
> >> I wonder if this is a good solution and, if so, how I could do the same
> >> thing in R.
> >> However, if SPSS treats the data as if the group sizes were identical,
> >> why does it not do so for the interaction as well, which yields the same
> >> result as Anova()?
> >>
> >> Many thanks in advance for your time and help!
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>