[R] Sums of sq in car package Anova function
John Fox
jfox at mcmaster.ca
Sun Dec 19 18:25:32 CET 2004
Dear Karla,
If indeed one of your factors has levels "0" and "1", that wouldn't matter
at all, but if it is a numeric variable with values 0 and 1 (rather than a
factor) then that would make a difference to the linear model that's fit to
the data. The difference doesn't affect the sequential ("type-I") sums of
squares produced by anova() but it does affect some of the type-III sums of
squares produced by Anova().
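
To illustrate the point, here is a minimal sketch with made-up data. The names d, x01, f01, g and the helper type3_ss() are hypothetical, and the helper is a hand-rolled stand-in for a type-III calculation, not what Anova() does internally. It fits the same model once with a numeric 0/1 predictor and once with the corresponding factor, both under "contr.sum" coding:

```r
# Toy, unbalanced data; every name here is an illustrative assumption.
set.seed(42)
d <- expand.grid(x01 = c(0, 1), g = c("a", "b", "c"))
d <- d[rep(1:6, times = c(4, 6, 5, 7, 3, 8)), ]   # unequal cell sizes
d$g <- factor(d$g)
d$f01 <- factor(d$x01)            # same information, stored as a factor
d$y <- rnorm(nrow(d)) + d$x01 + as.numeric(d$g)

old <- options(contrasts = c("contr.sum", "contr.poly"))
fit_num <- lm(y ~ x01 * g, data = d)   # x01 enters as a 0/1 column
fit_fac <- lm(y ~ f01 * g, data = d)   # f01 enters as a +1/-1 column
options(old)                           # restore the previous setting

# Sequential ("type-I") sums of squares agree between the two fits.
seq_num <- anova(fit_num)[["Sum Sq"]]
seq_fac <- anova(fit_fac)[["Sum Sq"]]

# "Type-III"-style SS for one term: the increase in residual SS when that
# term's columns are dropped from the model matrix and all others are kept.
type3_ss <- function(fit, term) {
  X <- model.matrix(fit)
  idx <- match(term, attr(terms(fit), "term.labels"))
  keep <- attr(X, "assign") != idx
  y <- model.response(model.frame(fit))
  sum(lm.fit(X[, keep, drop = FALSE], y)$residuals^2) - deviance(fit)
}

g_num <- type3_ss(fit_num, "g")
g_fac <- type3_ss(fit_fac, "g")
print(c(numeric = g_num, factor = g_fac))
```

In this sketch the terms that involve the 0/1 variable give the same sum of squares either way; only the terms that do not involve it (here g) change. That looks consistent with the two Anova() tables quoted below, where the rows containing am are identical and the others differ.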
Anyway, I'm glad that you found the error.
Regards,
John
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Karla Sartor
> Sent: Sunday, December 19, 2004 11:32 AM
> To: John Fox
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Sums of sq in car package Anova function
>
> John,
>
> Thank you very much for your help. I think that I have figured
> out my problem. The levels of one of my factors are "1" and
> "0". While this didn't matter with the 'anova()' function,
> it does seem to alter the results with the 'Anova' function.
> When I changed the levels to letters, the tables matched my
> SPSS output. As for why the type III test in SPSS was nearly
> identical to the 'anova' function, my unequal sample sizes
> were not drastically different so changing to type III must
> not have changed the results very much? That was all I could
> come up with at the time.
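
The effect of imbalance on the two kinds of sums of squares can be checked directly. The following is only a sketch on simulated data (bal, unbal and compare_ss() are made-up names); it uses drop1() with the full scope as a stand-in for type-III tests, which is an assumption of the sketch and not necessarily how Anova() computes them. In a balanced design with "contr.sum" coding the sequential and type-III sums of squares coincide, and they drift apart as the cell sizes become unequal:

```r
# Made-up two-factor data; names are illustrative, not taken from GH.txt.
set.seed(1)
old <- options(contrasts = c("contr.sum", "contr.poly"))

bal <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"), rep = 1:5)
bal$A <- factor(bal$A)
bal$B <- factor(bal$B)
bal$y <- rnorm(nrow(bal)) + as.numeric(bal$A) + as.numeric(bal$B)
unbal <- bal[-c(1, 2, 3, 7, 9, 14, 20), ]   # drop rows to unbalance the cells

compare_ss <- function(dat) {
  fit <- lm(y ~ A * B, data = dat)
  rows <- c("A", "B", "A:B")
  seq_ss <- anova(fit)[rows, "Sum Sq"]
  # drop1() with scope . ~ . drops each term in turn, ignoring marginality
  t3_ss <- drop1(fit, scope = . ~ ., test = "F")[rows, "Sum of Sq"]
  rbind(sequential = seq_ss, type3 = t3_ss)
}

ss_bal <- compare_ss(bal)
ss_unbal <- compare_ss(unbal)
options(old)                                # restore the previous setting

print(ss_bal)
print(ss_unbal)
```

How far apart the two rows end up depends on how unequal the cell counts are, so a mildly unbalanced data set can give sequential and type-III tables that look very similar.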
>
> Here is the code I used:
>
> options(contrasts = c("contr.sum", "contr.poly"))
> require(car)
>
> GH = read.table("GH.txt", header = TRUE)
> GH.sub = subset(GH, sp == "C")
> attach(GH.sub)
>
> biomass= log10(GH.sub$tot.bio)
> GH.sub.fit = lm(biomass~am*nbr*barr, data=GH.sub)
> print(Anova(GH.sub.fit, type='III'))
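
Before fitting, it may be worth checking how each predictor was read in, since read.table() stores a column of 0/1 codes as a number. A minimal sketch, assuming a toy stand-in for GH.txt (the real file is not shown in this thread, so the values below are invented and only the column names follow the code above):

```r
# Hypothetical toy data standing in for GH.txt.
txt <- "sp am nbr barr tot.bio
C 0 n1 y 12.1
C 1 n2 n 30.5
C 0 n3 y 9.8
C 1 n4 n 41.0"
GH <- read.table(text = txt, header = TRUE)

print(sapply(GH, class))         # how each column was read
was_numeric <- is.numeric(GH$am) # the 0/1 column comes in as a number

GH$am <- factor(GH$am)           # make it a factor before calling lm()
print(levels(GH$am))
```

Once the column is a factor, the labels of its levels ("0"/"1" or letters) should not matter to the fit.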
>
> I get this with "1" and "0" factor levels:
>
> Anova Table (Type III tests)
>
> Response: biomass
>             Sum Sq  Df   F value    Pr(>F)
> (Intercept) 51.943   1 3725.4324 < 2.2e-16 ***
> am           2.403   1  172.3630 < 2.2e-16 ***
> nbr          0.779   3   18.6347 4.434e-10 ***
> barr         0.078   1    5.5803   0.01968 *
> am:nbr       0.018   3    0.4284   0.73296
> am:barr      0.039   1    2.7826   0.09775 .
> nbr:barr     0.044   3    1.0606   0.36834
> am:nbr:barr  0.022   3    0.5208   0.66873
> Residuals    1.771 127
>
>
> And this with letter factor levels:
>
> Anova Table (Type III tests)
>
> Response: biomass
>             Sum Sq  Df   F value  Pr(>F)
> (Intercept) 75.371   1 5405.7202 < 2e-16 ***
> am           2.403   1  172.3630 < 2e-16 ***
> nbr          1.482   3   35.4357 < 2e-16 ***
> barr         0.040   1    2.8410 0.09434 .
> am:nbr       0.018   3    0.4284 0.73296
> am:barr      0.039   1    2.7826 0.09775 .
> nbr:barr     0.051   3    1.2167 0.30643
> am:nbr:barr  0.022   3    0.5208 0.66873
> Residuals    1.771 127
> ---
>
> SPSS gives:
>
> Tests of Between-Subjects Effects
> Dependent Variable: lot10.tot.bio
>
> Source            Type III Sum of Squares    df   Mean Square          F   Sig.
> Corrected Model                  4.002(a)    15          .267     19.133   .000
> Intercept                          75.371     1        75.371   5405.720   .000
> am                                  2.403     1         2.403    172.363   .000
> nbr                                 1.482     3          .494     35.436   .000
> barr                                 .040     1          .040      2.841   .094
> am * nbr                             .018     3          .006       .428   .733
> am * barr                            .039     1          .039      2.783   .098
> nbr * barr                           .051     3          .017      1.217   .306
> am * nbr * barr                      .022     3          .007       .521   .669
> Error                               1.771   127          .014
> Total                              80.796   143
> Corrected Total                     5.772   142
> a R Squared = .693 (Adjusted R Squared = .657)
>
>
> Am I missing something else? I don't know the best way to
> post the data set, so I will send it to John and maybe he can
> post it if it is of interest.
>
> Thanks again!
>
> Karla
>
> Karla Sartor
> Montana State University - LRES
> ksartor at montana.edu
>
>
>
>
>
>
> John Fox wrote:
>
> >Dear Karla,
> >
> >I suggested last night that you send me further information, but
> >decided this morning to try out a reproducible example of my own:
> >
> >
> >
> >>set.seed(12345)
> >>A <- factor(sample(c("a1", "a2", "a3"), 100, replace=TRUE))
> >>B <- factor(sample(c("b1", "b2"), 100, replace=TRUE))
> >>C <- factor(sample(c("c1", "c2", "c3"), 100, replace=TRUE))
> >>mu <- array(1:18, c(3,2,3))
> >>a <- as.numeric(A)
> >>b <- as.numeric(B)
> >>c <- as.numeric(C)
> >>y <- mu[cbind(a,b,c)] + rnorm(100)
> >>mod <- lm(y ~ A*B*C)
> >>library(car)
> >>options(contrasts=c("contr.sum", "contr.poly"))
> >>Anova(mod, type="II")
> >>
> >>
> >Anova Table (Type II tests)
> >
> >Response: y
> >          Sum Sq Df   F value    Pr(>F)
> >A          65.88  2   38.4098 1.696e-12 ***
> >B         196.47  1  229.0775 < 2.2e-16 ***
> >C        2441.00  2 1423.0809 < 2.2e-16 ***
> >A:B         0.22  2    0.1259    0.8819
> >A:C         6.92  4    2.0174    0.0996 .
> >B:C         0.87  2    0.5095    0.6027
> >A:B:C       2.89  4    0.8432    0.5018
> >Residuals  70.33 82
> >---
> >Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> >
> >>Anova(mod, type="III")
> >>
> >>
> >Anova Table (Type III tests)
> >
> >Response: y
> >            Sum Sq Df   F value    Pr(>F)
> >(Intercept) 7830.2  1 9129.8959 < 2.2e-16 ***
> >A             55.7  2   32.4913 4.059e-11 ***
> >B            189.5  1  221.0076 < 2.2e-16 ***
> >C           2124.0  2 1238.2549 < 2.2e-16 ***
> >A:B            0.2  2    0.0942    0.9102
> >A:C            5.9  4    1.7323    0.1507
> >B:C            0.6  2    0.3417    0.7115
> >A:B:C          2.9  4    0.8432    0.5018
> >Residuals     70.3 82
> >---
> >Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> >
> >I don't have a working copy of SPSS anymore, but here's what SAS does
> >with this example:
> >
> >  Source    DF     Type II SS    Mean Square   F Value   Pr > F
> >
> >  A          2      65.884048      32.942024     38.41   <.0001
> >  B          1     196.467384     196.467384    229.08   <.0001
> >  A*B        2       0.215883       0.107942      0.13   0.8819
> >  C          2    2440.998718    1220.499359   1423.08   <.0001
> >  A*C        4       6.920872       1.730218      2.02   0.0996
> >  B*C        2       0.873945       0.436973      0.51   0.6027
> >  A*B*C      4       2.892820       0.723205      0.84   0.5018
> >
> >
> >  Source    DF    Type III SS    Mean Square   F Value   Pr > F
> >
> >  A          2      55.732128      27.866064     32.49   <.0001
> >  B          1     189.546201     189.546201    221.01   <.0001
> >  A*B        2       0.161608       0.080804      0.09   0.9102
> >  C          2    2123.968177    1061.984089   1238.25   <.0001
> >  A*C        4       5.942845       1.485711      1.73   0.1507
> >  B*C        2       0.586168       0.293084      0.34   0.7115
> >  A*B*C      4       2.892820       0.723205      0.84   0.5018
> >
> >So, as you can see, the results check.
> >
> >It's hard to know what to make of this without more information about
> >what you did. Much as I'm not an admirer of SPSS, I doubt whether it
> >computes type-III sums of squares incorrectly, so I suspect something
> >wrong with either your SPSS commands or your R commands.
> >
> >I hope this helps,
> > John
> >
> >--------------------------------
> >John Fox
> >Department of Sociology
> >McMaster University
> >Hamilton, Ontario
> >Canada L8S 4M4
> >905-525-9140x23604
> >http://socserv.mcmaster.ca/jfox
> >--------------------------------
> >
> >
> >
> >>-----Original Message-----
> >>From: r-help-bounces at stat.math.ethz.ch
> >>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Karla Sartor
> >>Sent: Saturday, December 18, 2004 6:43 PM
> >>To: r-help at stat.math.ethz.ch
> >>Subject: [R] Sums of sq in car package Anova function
> >>
> >>Hello R users,
> >>
> >>I am trying to run a three factor ANOVA on a data set with unequal
> >>sample sizes.
> >>
> >>I fit an 'lm' model to the data and used the Anova function from the
> >>'car' package with the 'type=III' option to get type III sums of
> >>squares. I also set the contrast coding option to
> >>'options(contrasts = c("contr.sum", "contr.poly"))' as cautioned in
> >>John Fox's book "An R and S-PLUS Companion to Applied Regression".
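
For readers unfamiliar with the contrast setting mentioned above, here is a small sketch of what it changes. The objects cs, ct, f and mm are illustrative names only. Under "contr.sum" each coding column sums to zero over the levels, whereas R's default "contr.treatment" uses 0/1 dummy columns relative to a baseline level:

```r
# Coding matrices for a three-level factor under the two schemes.
cs <- contr.sum(3)         # sum-to-zero ("deviation") coding
ct <- contr.treatment(3)   # R's default for unordered factors (dummy coding)
print(cs)
print(ct)
print(colSums(cs))         # each column sums to zero
print(colSums(ct))         # each column has a single 1

# The option is global and is read when the model matrix is built.
old <- options(contrasts = c("contr.sum", "contr.poly"))
f <- factor(c("a", "b", "c", "a", "b", "c"))
mm <- model.matrix(~ f)    # columns f1 and f2 use the +1/0/-1 coding
options(old)               # restore the previous setting
print(mm)
```

With dummy coding, a main-effect coefficient in a model that contains interactions describes the effect at the baseline level of the other factors, so a test that simply drops those columns is not a test of the averaged main effect. That is the usual reason given for switching to sum-to-zero coding before asking for type III tests.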
> >>
> >>Is there anything else that I need to consider when using the type III
> >>option with the Anova function?
> >>
> >>When I run the same data set in SPSS with General Linear Model and
> >>type III sums of squares, the sums of squares are different enough
> >>that one of the main effect terms is significant in the R table and
> >>not in the SPSS table. I found a similar discrepancy with a different
> >>data set, except that there SPSS showed a significant interaction
> >>effect while the 'Anova' function did not.
> >>
> >>I also compared the results from SPSS with those from the 'anova'
> >>function in the base package, and the results are nearly identical. I
> >>would expect the two methods with type III sums of squares to be more
> >>similar. Does anyone have any ideas as to why that was not the case?
> >>I am hoping to not go back to SPSS at this point, so am trying to
> >>decide which of the two R functions is most appropriate for me (and
> >>defensible, considering the unequal sample sizes).
> >>
> >>Thank you in advance for any ideas you may have!
> >>
> >>Karla
> >>
> >>Karla Sartor
> >>Montana State University - LRES
> >>ksartor at montana.edu
> >>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide!
> >>http://www.R-project.org/posting-guide.html
> >>
> >>
> >
> >
> >
> >
>