[R] Anova and unbalanced designs

Sat Jan 24 19:17:44 CET 2009

Dear Nils,

I don't currently have a copy of SAS on my computer, so I asked Michael
Friendly to run the problem in SAS and he kindly supplied the following
results:

----------- snip ------------

                                 The SAS System
                                                 12:32 Saturday, January 24, 2009

                               The GLM Procedure

                            Class Level Information

                         Class         Levels    Values

                         between            2    1 2

                    Number of Observations Read          10
                    Number of Observations Used          10

                               The GLM Procedure
                     Repeated Measures Analysis of Variance

                      Repeated Measures Level Information

                    Dependent Variable          w1       w2

                       Level of within           1        2

                  MANOVA Test Criteria and Exact F Statistics
                     for the Hypothesis of no within Effect
                      H = Type III SSCP Matrix for within
                             E = Error SSCP Matrix

                              S=1    M=-0.5    N=3

Statistic                        Value    F Value    Num DF    Den DF    Pr > F

Wilks' Lambda               0.95238095       0.40         1         8    0.5447
Pillai's Trace              0.04761905       0.40         1         8    0.5447
Hotelling-Lawley Trace      0.05000000       0.40         1         8    0.5447
Roy's Greatest Root         0.05000000       0.40         1         8    0.5447

                MANOVA Test Criteria and Exact F Statistics for
                   the Hypothesis of no within*between Effect
                  H = Type III SSCP Matrix for within*between
                             E = Error SSCP Matrix

                              S=1    M=-0.5    N=3

Statistic                        Value    F Value    Num DF    Den DF    Pr > F

Wilks' Lambda               0.83333333       1.60         1         8    0.2415
Pillai's Trace              0.16666667       1.60         1         8    0.2415
Hotelling-Lawley Trace      0.20000000       1.60         1         8    0.2415
Roy's Greatest Root         0.20000000       1.60         1         8    0.2415

                               The GLM Procedure
                     Repeated Measures Analysis of Variance
                Tests of Hypotheses for Between Subjects Effects

 Source                     DF    Type III SS    Mean Square   F Value   Pr > F

 between                     1     4.80000000     4.80000000      4.27   0.0727
 Error                       8     9.00000000     1.12500000

                               The GLM Procedure
                     Repeated Measures Analysis of Variance
           Univariate Tests of Hypotheses for Within Subject Effects

 Source                     DF    Type III SS    Mean Square   F Value   Pr > F

 within                      1     0.53333333     0.53333333      0.40   0.5447
 within*between              1     2.13333333     2.13333333      1.60   0.2415
 Error(within)               8    10.66666667     1.33333333

----------- snip ------------

As you can see, these agree with Anova():

----------- snip ------------

Type III Repeated Measures MANOVA Tests: Pillai test statistic
               Df test stat approx F num Df den Df    Pr(>F)    
(Intercept)     1     0.963  209.067      1      8 5.121e-07 ***
between         1     0.348    4.267      1      8   0.07273 .  
within          1     0.048    0.400      1      8   0.54474    
between:within  1     0.167    1.600      1      8   0.24150    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                    SS num Df Error SS den Df        F    Pr(>F)    
(Intercept)    235.200      1    9.000      8 209.0667 5.121e-07 ***
between          4.800      1    9.000      8   4.2667   0.07273 .  
within           0.533      1   10.667      8   0.4000   0.54474    
between:within   2.133      1   10.667      8   1.6000   0.24150    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

----------- snip ------------
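With a single transformed response and a one-df hypothesis (SAS's S=1), the four multivariate criteria are all functions of one eigenvalue, lambda = SSH/SSE, so they must yield the same exact F, and that F equals the univariate one. A minimal base-R check, using only numbers copied from the tables above (the variable names are illustrative):

```r
# Univariate Type III SS for the within-subject terms, copied from the output above
ss_hyp <- c(within = 0.53333333, "between:within" = 2.13333333)
ss_err <- 10.66666667   # Error(within) SS
df_err <- 8             # error df

# With S = 1, H E^{-1} has a single nonzero eigenvalue lambda = SSH / SSE
lambda <- ss_hyp / ss_err

wilks     <- 1 / (1 + lambda)        # Wilks' Lambda
pillai    <- lambda / (1 + lambda)   # Pillai's trace
hotelling <- lambda                  # Hotelling-Lawley trace (= Roy's root here)

# One hypothesis df and one transformed response: F = lambda * error df
F_value <- lambda * df_err
p_value <- pf(F_value, 1, df_err, lower.tail = FALSE)

round(cbind(wilks, pillai, hotelling, F = F_value, p = p_value), 5)
```

Roy's greatest root equals the Hotelling-Lawley trace in this case because there is only one nonzero eigenvalue.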

So, unless Anova() and SAS are making the same error, I guess SPSS is doing
something strange (or perhaps you didn't do what you intended in SPSS). As I
said before, this problem is so simple that I find it hard to understand
where there's room for error, but I wanted to check against SAS to test my
sanity (a procedure that will likely get a rise out of some list members).
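As a further sanity check that needs neither SAS nor car, the Type III sums of squares can be reproduced by hand. With two measurements per subject, the between-subject tests concern s = w1 + w2 and the within-subject tests concern d = w1 - w2. Here the Type III intercept and within tests involve the equally weighted average of the two group means, and the between and between:within tests involve the difference of the two group means. A base-R sketch under those assumptions (helper names such as sse() are illustrative):

```r
# Data from the original post (group sizes 4 and 6)
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))
n <- as.vector(table(between))                  # c(4, 6)
df_err <- length(w1) - nlevels(between)         # 8

s <- w1 + w2   # transformed response for the between-subject terms
d <- w1 - w2   # transformed response for the within-subject terms

ms <- as.vector(tapply(s, between, mean))       # group means of s
md <- as.vector(tapply(d, between, mean))       # group means of d

# Pooled within-group (error) sum of squares of a transformed response
sse <- function(y) sum((y - ave(y, between))^2)

# Variance multipliers: difference of two group means, and their equally
# weighted average
k_diff <- sum(1 / n)
k_avg  <- sum(1 / n) / 4

# SS = estimate^2 / multiplier; the final / 2 converts the SSP for the
# unnormalized contrasts c(1, 1) and c(1, -1) into a univariate SS
ss <- c(
  "(Intercept)"    = mean(ms)^2 / k_avg / 2,
  between          = diff(ms)^2 / k_diff / 2,
  within           = mean(md)^2 / k_avg / 2,
  "between:within" = diff(md)^2 / k_diff / 2
)
err <- c(sse(s), sse(s), sse(d), sse(d)) / 2
F_value <- ss / (err / df_err)

round(cbind(SS = ss, "Error SS" = err, F = F_value), 4)
```

The SS, error SS, and F columns should match the univariate Type III table above.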

Maybe you should send a message to the SPSS help list.

Regards,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Skotara
> Sent: January-24-09 6:30 AM
> To: John Fox
> Cc: r-help at r-project.org
> Subject: Re: [R] Anova and unbalanced designs
> 
> Dear John,
> 
> Thank you for your answer. You are right; I would not have expected
> a divergent result either.
> I have double-checked it again: I did get type-III tests.
> When I use type II, I get the same results in SPSS as with 'Anova' (also
> using type-II tests).
> My guess was that the unweighted means SPSS shows (computed as if the
> groups were the same size) could be responsible for this difference,
> or that using 'Anova' is not correct for unequal group n's, although I
> don't think that is the case.
> Do you have any further ideas?
> 
> Thank you!
> Nils
> 
> John Fox schrieb:
> > Dear Nils,
> >
> > This is a pretty simple design, and I wouldn't have thought that there was
> > much room for getting different results. More generally, but not here (since
> > there's only one between-subject factor), one shouldn't use
> > contr.treatment() with "type-III" tests, as you did. Is it possible that you
> > got "type-II" tests from SPSS?
> >
> > ------ snip ----------
> >
> >
> >> summary(Anova(betweenanova, idata = with, idesign = ~within, type = "II"))
> >>
> >
> > Type II Repeated Measures MANOVA Tests:
> >
> > ------------------------------------------
> >
> > Term: between
> >
> >  Response transformation matrix:
> >    (Intercept)
> > w1           1
> > w2           1
> >
> > Sum of squares and products for the hypothesis:
> >             (Intercept)
> > (Intercept)         9.6
> >
> > Sum of squares and products for error:
> >             (Intercept)
> > (Intercept)          18
> >
> > Multivariate Tests: between
> >                  Df test stat approx F num Df den Df   Pr(>F)
> > Pillai            1  0.347826 4.266667      1      8 0.072726 .
> > Wilks             1  0.652174 4.266667      1      8 0.072726 .
> > Hotelling-Lawley  1  0.533333 4.266667      1      8 0.072726 .
> > Roy               1  0.533333 4.266667      1      8 0.072726 .
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > ------------------------------------------
> >
> > Term: within
> >
> >  Response transformation matrix:
> >    within1
> > w1       1
> > w2      -1
> >
> > Sum of squares and products for the hypothesis:
> >         within1
> > within1     0.4
> >
> > Sum of squares and products for error:
> >          within1
> > within1 21.33333
> >
> > Multivariate Tests: within
> >                  Df test stat  approx F num Df den Df  Pr(>F)
> > Pillai            1 0.0184049 0.1500000      1      8 0.70864
> > Wilks             1 0.9815951 0.1500000      1      8 0.70864
> > Hotelling-Lawley  1 0.0187500 0.1500000      1      8 0.70864
> > Roy               1 0.0187500 0.1500000      1      8 0.70864
> >
> > ------------------------------------------
> >
> > Term: between:within
> >
> >  Response transformation matrix:
> >    within1
> > w1       1
> > w2      -1
> >
> > Sum of squares and products for the hypothesis:
> >          within1
> > within1 4.266667
> >
> > Sum of squares and products for error:
> >          within1
> > within1 21.33333
> >
> > Multivariate Tests: between:within
> >                  Df test stat  approx F num Df den Df  Pr(>F)
> > Pillai            1 0.1666667 1.6000000      1      8 0.24150
> > Wilks             1 0.8333333 1.6000000      1      8 0.24150
> > Hotelling-Lawley  1 0.2000000 1.6000000      1      8 0.24150
> > Roy               1 0.2000000 1.6000000      1      8 0.24150
> >
> > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> >
> >                     SS num Df Error SS den Df      F  Pr(>F)
> > between         4.8000      1   9.0000      8 4.2667 0.07273 .
> > within          0.2000      1  10.6667      8 0.1500 0.70864
> > between:within  2.1333      1  10.6667      8 1.6000 0.24150
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > ------ snip ----------
> >
> > I hope this helps,
> >  John
> >
> > ------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> >> On Behalf Of Skotara
> >> Sent: January-23-09 12:16 PM
> >> To: r-help at r-project.org
> >> Subject: [R] Anova and unbalanced designs
> >>
> >> Dear R-list!
> >>
> >> My question is related to an Anova including within- and between-subject
> >> factors and unequal group sizes.
> >> Here is a minimal example of what I did:
> >>
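> >> library(car)
> >> # two repeated measurements (w1, w2) per subject
> >> within1 <- c(1,2,3,4,5,6,4,5,3,2); within2 <- c(3,4,3,4,3,4,3,4,5,4)
> >> values <- data.frame(w1 = within1, w2 = within2)
> >> values <- as.matrix(values)  # multivariate response matrix for lm()
> >> # between-subject factor with unequal group sizes (4 and 6)
> >> between <- factor(c(rep(1,4), rep(2,6)))
> >> betweenanova <- lm(values ~ between)
> >> # intra-subject design: one two-level within factor
> >> with <- expand.grid(within = factor(1:2))
> >> withinanova <- Anova(betweenanova, idata = with, idesign = ~within,
> >>                      type = "III")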
> >>
> >> I do not know if this is the appropriate method to deal with unbalanced
> >> designs.
> >>
> >> I observed that SPSS calculates everything identically except the main
> >> effect of the within factor; there, the SSQ and F-value are very
> >> different. If I select the option "show means", the means for the levels
> >> of the within factor in SPSS are the same as:
> >> mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"]))) and
> >> mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"]))).
> >> In other words, they are calculated as if both groups had the
> >> same size.
> >>
> >> I wonder whether this is a good solution and, if so, how I could do the
> >> same thing in R.
> >> However, if SPSS treats the groups as if their sizes were identical, why
> >> does it not do so for the interaction as well, which gives the same
> >> result as Anova()?
> >>
> >> Many thanks in advance for your time and help!
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
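Nils's "show means" observation can be checked directly. The sketch below (base R; it indexes the response matrix as values[, "w1"], since $ does not work on a matrix, and the variable names are illustrative) computes the ordinary, subject-weighted marginal means of w1 and w2 alongside the unweighted means he describes, plus the within-subject SS implied by each. The weighted version reproduces the Type II within SS of 0.2 in the quoted Type II output, and the unweighted version reproduces the Type III value of 0.533 reported by both SAS and Anova(). It says nothing about what SPSS computes internally.

```r
# Example data from the original post; values is a plain matrix
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
values <- cbind(w1 = w1, w2 = w2)
between <- factor(c(rep(1, 4), rep(2, 6)))
n <- as.vector(table(between))   # group sizes 4 and 6

# Weighted marginal means: each subject counts once, so the larger group dominates
weighted <- colMeans(values)

# Unweighted marginal means: average of the group means, as if n1 == n2
group_means <- apply(values, 2, function(y) tapply(y, between, mean))
unweighted <- colMeans(group_means)

# Within-subject main effect expressed through d = w1 - w2
d <- values[, "w1"] - values[, "w2"]

# Weighted target: mean(d) over all subjects, SS = N * mean(d)^2;
# halved because the contrast c(1, -1) has squared length 2
ss_weighted <- length(d) * mean(d)^2 / 2

# Unweighted target: equally weighted average of the two group means of d
md <- mean(tapply(d, between, mean))
ss_unweighted <- md^2 / (sum(1 / n) / 4) / 2

rbind(weighted, unweighted)
c(weighted = ss_weighted, unweighted = ss_unweighted)
```

The difference between the two rows of marginal means (-0.2 versus -1/3 for w1 minus w2) is exactly what separates the two within-subject SS values.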