[R] Anova and unbalanced designs

Sat Jan 24 23:19:01 CET 2009

Dear John, 

Thank you again! You replicated the type III result I got in SPSS. When I
run Anova() with type II, I get:

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                    SS num Df Error SS den Df      F  Pr(>F)  
between         4.8000      1   9.0000      8 4.2667 0.07273 .
within          0.2000      1  10.6667      8 0.1500 0.70864  
between:within  2.1333      1  10.6667      8 1.6000 0.24150  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

I see the exact same values as you had written. 
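
As a cross-check, the within row above can be recovered by hand in base R.
This is only a sketch under two assumptions: that the type II test for
within amounts to testing the overall mean of the difference w1 - w2 (each
subject counted once, so groups are weighted by size), and that sums of
squares are divided by 2, the squared length of the contrast (1, -1). The
names d, ss_within_II, ss_error and f_within are illustrative and do not
come from car.

```r
# Data from the original example
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

# Within-subject contrast: one difference score per subject
d <- w1 - w2

# Hypothesis SS for the overall mean of d, scaled by the contrast length
ss_within_II <- length(d) * mean(d)^2 / 2

# Error SS: residual variation of d around its group means, same scaling
ss_error <- sum(residuals(lm(d ~ between))^2) / 2

# F statistic on 1 and 8 degrees of freedom
f_within <- ss_within_II / (ss_error / 8)

c(SS = ss_within_II, ErrorSS = ss_error, F = f_within)
```

The three values correspond to 0.2000, 10.6667 and 0.1500 in the table.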
However, and now I am really lost, type III (with nothing else changed)
gives the following:

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                              SS num Df Error SS den Df       F    Pr(>F)    
(Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
between                    4.800      1    9.000      8  4.2667   0.07273 .  
as.factor(within)          2.000      1   10.667      8  1.5000   0.25551    
between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

How is this possible? 
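
A possible explanation, sketched in base R rather than taken from the car
source: the coding of the between factor may decide which within effect a
type III test looks at. The assumption is that with the default
contr.treatment coding the test concerns the w1 - w2 difference in the
reference group (between = 1) only, while a sum-to-zero coding tests the
unweighted average of the two group differences. All variable names below
are illustrative.

```r
# Data from the original example
w1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
w2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

d <- w1 - w2                              # within-subject contrast
n <- unname(tapply(d, between, length))   # group sizes: 4 and 6
m <- unname(tapply(d, between, mean))     # group means of d: -1 and 1/3

# Every SS is divided by 2, the squared length of the contrast (1, -1).

# (a) reference group only (treatment coding of between)
ss_reference <- n[1] * m[1]^2 / 2

# (b) unweighted mean of the group means (sum-to-zero coding);
#     the variance factor of mean(m) is sum(1 / n) / 4
ss_unweighted <- mean(m)^2 / (sum(1 / n) / 4) / 2

# (c) mean weighted by group size (the type II value)
ss_weighted <- sum(n) * weighted.mean(m, n)^2 / 2

round(c(reference = ss_reference,
        unweighted = ss_unweighted,
        weighted = ss_weighted), 3)
```

The three results are 2.000, 0.533 and 0.200, which match the type III value
above, the type III value from SPSS and SAS, and the type II value. If this
reading is right, refitting with
lm(values ~ between, contrasts = list(between = "contr.sum")) before calling
Anova() should give 0.533 for type III; that refit has not been run here.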

Best regards!
Nils

Quoting John Fox <jfox at mcmaster.ca>:

> Dear Nils,
> 
> I don't currently have a copy of SAS on my computer, so I asked Michael
> Friendly to run the problem in SAS and he kindly supplied the following
> results:
> 
> ----------- snip ------------
> 
>                                  The SAS System
>                                      12:32 Saturday, January 24, 2009
> 
>                                The GLM Procedure
> 
>                             Class Level Information
> 
>                          Class         Levels    Values
> 
>                          between            2    1 2
> 
> 
>                     Number of Observations Read          10
>                     Number of Observations Used          10
> 
>                                The GLM Procedure
>                      Repeated Measures Analysis of Variance
> 
>                       Repeated Measures Level Information
> 
>                     Dependent Variable          w1       w2
> 
>                        Level of within           1        2
> 
> 
>                   MANOVA Test Criteria and Exact F Statistics
>                      for the Hypothesis of no within Effect
>                       H = Type III SSCP Matrix for within
>                              E = Error SSCP Matrix
> 
>                               S=1    M=-0.5    N=3
> 
> Statistic                        Value    F Value    Num DF    Den DF    Pr > F
>
> Wilks' Lambda               0.95238095       0.40         1         8    0.5447
> Pillai's Trace              0.04761905       0.40         1         8    0.5447
> Hotelling-Lawley Trace      0.05000000       0.40         1         8    0.5447
> Roy's Greatest Root         0.05000000       0.40         1         8    0.5447
> 
> 
>                 MANOVA Test Criteria and Exact F Statistics for
>                    the Hypothesis of no within*between Effect
>                   H = Type III SSCP Matrix for within*between
>                              E = Error SSCP Matrix
> 
>                               S=1    M=-0.5    N=3
> 
> Statistic                        Value    F Value    Num DF    Den DF    Pr > F
>
> Wilks' Lambda               0.83333333       1.60         1         8    0.2415
> Pillai's Trace              0.16666667       1.60         1         8    0.2415
> Hotelling-Lawley Trace      0.20000000       1.60         1         8    0.2415
> Roy's Greatest Root         0.20000000       1.60         1         8    0.2415
> 
>                                The GLM Procedure
>                      Repeated Measures Analysis of Variance
>                 Tests of Hypotheses for Between Subjects Effects
> 
>  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
>
>  between                     1     4.80000000     4.80000000      4.27   0.0727
>  Error                       8     9.00000000     1.12500000
> 
>                                The GLM Procedure
>                      Repeated Measures Analysis of Variance
>            Univariate Tests of Hypotheses for Within Subject Effects
> 
>  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
>
>  within                      1     0.53333333     0.53333333      0.40   0.5447
>  within*between              1     2.13333333     2.13333333      1.60   0.2415
>  Error(within)               8    10.66666667     1.33333333
> 
> ----------- snip ------------
> 
> As you can see, these agree with Anova():
> 
> ----------- snip ------------
> 
> Type III Repeated Measures MANOVA Tests: Pillai test statistic
>                Df test stat approx F num Df den Df    Pr(>F)
> (Intercept)     1     0.963  209.067      1      8 5.121e-07 ***
> between         1     0.348    4.267      1      8   0.07273 .
> within          1     0.048    0.400      1      8   0.54474
> between:within  1     0.167    1.600      1      8   0.24150
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> 
> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> 
>                     SS num Df Error SS den Df        F    Pr(>F)
> (Intercept)    235.200      1    9.000      8 209.0667 5.121e-07 ***
> between          4.800      1    9.000      8   4.2667   0.07273 .
> within           0.533      1   10.667      8   0.4000   0.54474
> between:within   2.133      1   10.667      8   1.6000   0.24150
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> ----------- snip ------------
> 
> So, unless Anova() and SAS are making the same error, I guess SPSS is doing
> something strange (or perhaps you didn't do what you intended in SPSS). As I
> said before, this problem is so simple, that I find it hard to understand
> where there's room for error, but I wanted to check against SAS to test my
> sanity (a procedure that will likely get a rise out of some list members).
> 
> Maybe you should send a message to the SPSS help list.
> 
> Regards,
>  John
> 
> ------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> web: socserv.mcmaster.ca/jfox
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> > Behalf Of Skotara
> > Sent: January-24-09 6:30 AM
> > To: John Fox
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Anova and unbalanced designs
> >
> > Dear John,
> >
> > Thank you for your answer. You are right, I would not have expected
> > a divergent result either.
> > I have double-checked: no, I did get type-III tests.
> > When I use type II, I get the same results in SPSS as in 'Anova' (also
> > using type-II tests).
> > My guess was that the means SPSS shows, which are weighted in some way,
> > could be responsible for this difference.
> > Another possibility was that 'Anova' is not correct for unequal group
> > sizes, but I do not think that is the case.
> > Do you have any further ideas?
> >
> > Thank you!
> > Nils
> >
> > John Fox wrote:
> > > Dear Nils,
> > >
> > > This is a pretty simple design, and I wouldn't have thought that there was
> > > much room for getting different results. More generally, but not here (since
> > > there's only one between-subject factor), one shouldn't use
> > > contr.treatment() with "type-III" tests, as you did. Is it possible that you
> > > got "type-II" tests from SPSS:
> > >
> > > ------ snip ----------
> > >
> > >
> > > > summary(Anova(betweenanova, idata=with, idesign= ~within, type = "II"))
> > >
> > > Type II Repeated Measures MANOVA Tests:
> > >
> > > ------------------------------------------
> > >
> > > Term: between
> > >
> > >  Response transformation matrix:
> > >    (Intercept)
> > > w1           1
> > > w2           1
> > >
> > > Sum of squares and products for the hypothesis:
> > >             (Intercept)
> > > (Intercept)         9.6
> > >
> > > Sum of squares and products for error:
> > >             (Intercept)
> > > (Intercept)          18
> > >
> > > Multivariate Tests: between
> > >                  Df test stat approx F num Df den Df   Pr(>F)
> > > Pillai            1  0.347826 4.266667      1      8 0.072726 .
> > > Wilks             1  0.652174 4.266667      1      8 0.072726 .
> > > Hotelling-Lawley  1  0.533333 4.266667      1      8 0.072726 .
> > > Roy               1  0.533333 4.266667      1      8 0.072726 .
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > ------------------------------------------
> > >
> > > Term: within
> > >
> > >  Response transformation matrix:
> > >    within1
> > > w1       1
> > > w2      -1
> > >
> > > Sum of squares and products for the hypothesis:
> > >         within1
> > > within1     0.4
> > >
> > > Sum of squares and products for error:
> > >          within1
> > > within1 21.33333
> > >
> > > Multivariate Tests: within
> > >                  Df test stat  approx F num Df den Df  Pr(>F)
> > > Pillai            1 0.0184049 0.1500000      1      8 0.70864
> > > Wilks             1 0.9815951 0.1500000      1      8 0.70864
> > > Hotelling-Lawley  1 0.0187500 0.1500000      1      8 0.70864
> > > Roy               1 0.0187500 0.1500000      1      8 0.70864
> > >
> > > ------------------------------------------
> > >
> > > Term: between:within
> > >
> > >  Response transformation matrix:
> > >    within1
> > > w1       1
> > > w2      -1
> > >
> > > Sum of squares and products for the hypothesis:
> > >          within1
> > > within1 4.266667
> > >
> > > Sum of squares and products for error:
> > >          within1
> > > within1 21.33333
> > >
> > > Multivariate Tests: between:within
> > >                  Df test stat  approx F num Df den Df  Pr(>F)
> > > Pillai            1 0.1666667 1.6000000      1      8 0.24150
> > > Wilks             1 0.8333333 1.6000000      1      8 0.24150
> > > Hotelling-Lawley  1 0.2000000 1.6000000      1      8 0.24150
> > > Roy               1 0.2000000 1.6000000      1      8 0.24150
> > >
> > > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> > >
> > >                     SS num Df Error SS den Df      F  Pr(>F)
> > > between         4.8000      1   9.0000      8 4.2667 0.07273 .
> > > within          0.2000      1  10.6667      8 0.1500 0.70864
> > > between:within  2.1333      1  10.6667      8 1.6000 0.24150
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > ------ snip ----------
> > >
> > > I hope this helps,
> > >  John
> > >
> > > ------------------------------
> > > John Fox, Professor
> > > Department of Sociology
> > > McMaster University
> > > Hamilton, Ontario, Canada
> > > web: socserv.mcmaster.ca/jfox
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> > >> Behalf Of Skotara
> > >> Sent: January-23-09 12:16 PM
> > >> To: r-help at r-project.org
> > >> Subject: [R] Anova and unbalanced designs
> > >>
> > >> Dear R-list!
> > >>
> > >> My question is related to an Anova including within and between subject
> > >> factors and unequal group sizes.
> > >> Here is a minimal example of what I did:
> > >>
> > >> library(car)
> > >> within1 <- c(1,2,3,4,5,6,4,5,3,2); within2 <- c(3,4,3,4,3,4,3,4,5,4)
> > >> values <- data.frame(w1 = within1, w2 = within2)
> > >> values <- as.matrix(values)
> > >> between <- factor(c(rep(1,4), rep(2,6)))
> > >> betweenanova <- lm(values ~ between)
> > >> with <- expand.grid(within = factor(1:2))
> > >> withinanova <- Anova(betweenanova, idata=with, idesign=
> > >> ~as.factor(within), type = "III" )
> > >>
> > >> I do not know if this is the appropriate method to deal with unbalanced
> > >> designs.
> > >>
> > >> I observed that SPSS calculates everything identically except the main
> > >> effect of the within factor; here, the SSQ and F-value are very different.
> > >> When the option "show means" is selected, the means for the levels of the
> > >> within factor in SPSS are the same as:
> > >> mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"]))) and
> > >> mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"]))).
> > >> In other words, they are calculated as if both groups had the
> > >> same size.
> > >>
> > >> I wonder if this is a good solution and, if so, how I could do the same
> > >> thing in R.
> > >> However, if SPSS treats the main effect as if the group sizes were
> > >> identical, why does it not do the same for the interaction, which
> > >> yields the same result as Anova()?
> > >>
> > >> Many thanks in advance for your time and help!
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> >