# [R] two way ANOVA with unequal sample sizes

John Fox jfox at mcmaster.ca
Wed Oct 17 01:42:15 CEST 2001

```Dear Julien,

At 09:33 PM 16/10/2001 +0200, julien claude wrote:
>Hi,
>
>I am trying a two way anova with unequal sample sizes but results are not
>as expected:
>
>I take the example from Applied Linear Statistical Models (Neter et al.
>pp889-897, 1996)
>
>growth rate             gender  bone development
>1.4                     1               1
>2.4                     1               1
>2.2                     1               1
>2.4                     1               2
>2.1                     2               1
>1.7                     2               1
>2.5                     2               2
>1.8                     2               2
>2                       2               2
>0.7                     3               1
>1.1                     3               1
>0.5                     3               2
>0.9                     3               2
>1.3                     3               2

You've apparently reversed the data columns for gender and bone development.

>expected results are
>
>source of variation     SS      df      MS      F
>gender                  0.12    1       0.12    0.74
>bone development        4.1897  2       2.0949  12.89**
>interaction             0.0754  2       0.377   0.23
>Error                   1.3     8       0.1625
>
># I use
>aov (growrate ~ gender * bonedevelopment)->m
>summary(m)
>
>                                                   Df      Sum Sq    Mean
> Sq     F value   Pr(>F)
>as.factor(gender)                                       2       4.3063
>      2.1531          13.2501
>0.002891 **
>as.factor(bonedevlopment)                       1       0.0926
>0.0926                  0.5697
>0.472022
>as.factor(gender:bonedevlopment)                2       0.0754  0.0377
>      0.2321  0.798034
>Residuals                                               8       1.3000
>0.1625
>
>#if I change the order of factors, results are different
>aov (growrate ~ bonedevelopment * gender)->m
>summary(m)
>
>                                                 Df      Sum Sq Mean Sq  F
> value
>Pr(>F)
>as.factor(bonedevlopment)               1       0.0029  0.0029          0.0176
>0.897785
>as.factor(gender)                               2       4.3960  2.1980
>      13.5262 0.002713 **
>as.factor(gender:bonedevlopment)        2       0.0754  0.0377
>      0.2321   0.798034
>Residuals                                       8       1.3000  0.1625
>
>#In the both cases, results for main effects differ from those expected in
>Neter et al.
>However interaction and residuals are well estimated.
>Can anyone help, either I am wrong in the formula, or either is there an
>other problem? Is there a mean to conduct easily the  test as in it is in
>Neter et al. ?
>The same problems occurs with anova(lm(....))?

The problem is that the analysis in Neter et al. uses what are sometimes
called "type III" sums of squares -- that is, testing each term in the
model 'after' all others (including main effects 'after' interactions to
which the main effects are marginal). Some would argue that this *never*
makes sense, but it *certainly* doesn't make sense if you use
"contr.treatment" to code contrasts for factors, which is the default in R.

To get the results in Neter, you can use contr.sum or contr.helmert with
lm, along with the Anova function in the car package:

> library(car)
>
> Anova(lm(growth.rate  ~ gender * bone,
+     contrasts=list(gender='contr.sum', bone='contr.sum')),
+     type='III')
Anova Table (Type III tests)

Response: growth.rate
Sum Sq Df  F value    Pr(>F)
(Intercept) 34.680  1 213.4154 4.729e-07 ***
gender       0.120  1   0.7385  0.415160
bone         4.190  2  12.8914  0.003145 **
gender:bone  0.075  2   0.2321  0.798034
Residuals    1.300  8
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Anova(lm(growth.rate  ~ gender * bone,
+     contrasts=list(gender='contr.helmert', bone='contr.helmert')),
+     type='III')
Anova Table (Type III tests)

Response: growth.rate
Sum Sq Df  F value    Pr(>F)
(Intercept) 34.680  1 213.4154 4.729e-07 ***
gender       0.120  1   0.7385  0.415160
bone         4.190  2  12.8914  0.003145 **
gender:bone  0.075  2   0.2321  0.798034
Residuals    1.300  8
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>

You might think about whether you really prefer this analysis to one that
obeys marginality (producing what are sometimes called "type II" sums of
squares). The anova function in R computes so-called "sequential" (or "type
I") sums of squares which rarely correspond to hypotheses of interest.

How to calculate F-tests in unbalanced Anova models with interactions is a
subject that seems to produce a lot of heat, so you can expect conflicting

I hope that this is useful to you.
John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

```