[BioC] about formula in ancova

Naomi Altman naomi at stat.psu.edu
Mon Mar 15 13:15:22 CET 2010


By default, R's ANOVA table reports sequential sums of squares, not
"Type III" or partial SS.  The sequential SS for each term are
adjusted only for the terms entered before it in the model formula,
which is why the order matters.  The partial SS for each term are
adjusted for all other terms in the model, so they do not depend on the order.
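
To make this concrete, here is a small sketch that only redoes the arithmetic on the sums of squares printed in the two additive-model tables quoted below. The variable names are mine and the values are the rounded numbers from that output, so treat it as an illustration rather than a re-analysis of the hotdog data.

```r
# Sums of squares copied from the two additive-model tables quoted below
ss_calories_first <- c(Calories = 106270, Type = 227386)  # Sodium ~ Calories + Type
ss_type_first     <- c(Type = 31739, Calories = 301917)   # Sodium ~ Type + Calories

# Both orders account for the same total model SS
sum(ss_calories_first)
sum(ss_type_first)

# The last row of each sequential table is adjusted for every other term,
# so it is also that term's partial SS
partial_ss <- c(Calories = ss_type_first[["Calories"]],
                Type     = ss_calories_first[["Type"]])
partial_ss
```

Both sums come to 333656, and the residual SS (153331) is identical in the two tables: the fitted model is the same, and only the way the explained variation is split between Calories and Type changes with the order.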

The SAS manual explains this more fully under 
PROC REG (sequential and partial) and PROC GLM 
(sequential and Type III) and probably has the most concise explanations.
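
If you would rather see both kinds of SS from within R, the sketch below may help. It makes some assumptions: it uses the built-in mtcars data as a stand-in for hotdog (wt plays the role of the covariate, cyl the factor), it fits the models with lm() rather than ancova(), and it uses drop1() from the stats package as one way to get partial SS for the model without an interaction.

```r
# Built-in data stand in for hotdog: a covariate (wt) and a factor (cyl)
d <- transform(mtcars, cyl = factor(cyl))

fit_xa <- lm(mpg ~ wt + cyl, data = d)   # covariate entered first
fit_ax <- lm(mpg ~ cyl + wt, data = d)   # factor entered first

# Sequential SS: each row is adjusted only for the rows above it,
# so the wt and cyl rows differ between the two tables
seq_xa <- anova(fit_xa)
seq_ax <- anova(fit_ax)
print(seq_xa)
print(seq_ax)

# Partial SS: each term is adjusted for all other terms in the model,
# so the two orders give the same values
par_xa <- drop1(fit_xa, test = "F")
par_ax <- drop1(fit_ax, test = "F")
print(par_xa)
print(par_ax)
```

In each sequential table the last row agrees with the corresponding drop1() row, because the term entered last has already been adjusted for everything else.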

--Naomi

At 09:38 AM 3/14/2010, Dejian Zhao wrote:
>To give more details about the confusing results mentioned in my
>previous email: some quantities (e.g. Sum Sq, Mean Sq, F value, Pr) for
>the two variables seem to depend on their order in the formula, and the
>change in the p-value (Pr) directly changes the significance. With
>my own data, changing the order of the variables in the formula changes
>one variable from significant to non-significant.
>
>Maybe this is a trivial question for a math major. But as a biology
>major, I hope someone can explain the formula and give some
>guidelines for writing one. Many thanks again.
>
>*Results of the first set of code:*
> > ancova(Sodium ~ Calories + Type, data=hotdog)
>Analysis of Variance Table
>
>Response: Sodium
>          Df Sum Sq Mean Sq F value    Pr(>F)
>Calories   1 106270  106270  34.654 3.281e-07 ***
>Type       2 227386  113693  37.074 1.336e-10 ***
>Residuals 50 153331    3067
>---
>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> > ancova(Sodium ~ Type + Calories, data=hotdog)
>Analysis of Variance Table
>
>Response: Sodium
>          Df Sum Sq Mean Sq F value    Pr(>F)
>Type       2  31739   15869  5.1749  0.009065 **
>Calories   1 301917  301917 98.4526 2.089e-13 ***
>Residuals 50 153331    3067
>---
>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>*Results of the second set of code:*
> > ancova(Sodium ~ Calories * Type, data=hotdog)
>Analysis of Variance Table
>
>Response: Sodium
>              Df Sum Sq Mean Sq F value    Pr(>F)
>Calories       1 106270  106270 35.6885 2.747e-07 ***
>Type           2 227386  113693 38.1815 1.195e-10 ***
>Calories:Type  2  10402    5201  1.7466    0.1853
>Residuals     48 142930    2978
>---
>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> > ancova(Sodium ~ Type * Calories, data=hotdog)
>Analysis of Variance Table
>
>Response: Sodium
>              Df Sum Sq Mean Sq  F value    Pr(>F)
>Type           2  31739   15869   5.3294  0.008124 **
>Calories       1 301917  301917 101.3927 2.019e-13 ***
>Type:Calories  2  10402    5201   1.7466  0.185267
>Residuals     48 142930    2978
>---
>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>Dejian Zhao wrote:
> > Dear list members,
> >
> > I have a question about the formula in ancova(), which is part of
> > the HH package. There are several examples in the ancova() help file,
> > which can be accessed by typing "?ancova" in the R console after loading
> > the HH package. Some of the code is as follows:
> >
> > hotdog <- read.table(hh("datasets/hotdog.dat"), header=TRUE)
> > ## y ~ x + a or y ~ a + x ## constant slope, different intercepts
> > ancova(Sodium ~ Calories + Type, data=hotdog)
> > ancova(Sodium ~ Type + Calories, data=hotdog)
> >
> > After running the code, I found that I got different results depending
> > on the formula, i.e. "ancova(Sodium ~ Calories + Type, data=hotdog)"
> > and "ancova(Sodium ~ Type + Calories, data=hotdog)" produced different
> > results.
> >
> > The same thing happens with the following example code:
> > ## y ~ x * a or y ~ a * x ## different slopes, and different intercepts
> > ancova(Sodium ~ Calories * Type, data=hotdog)
> > ancova(Sodium ~ Type * Calories, data=hotdog)
> >
> > Hence, I am confused about the difference between the formulas "y ~ x + a"
> > and "y ~ a + x", and likewise "y ~ x * a" and "y ~ a * x". I thought the
> > order of variables in the formula was arbitrary; however, here it seems the
> > order matters.
> >
> > Can someone explain the formula for me? And how should we choose the
> > formula, or arrange the order of variables in the formula, when
> > processing our own data?
> >
> > Many thanks!
> > Dejian
> >
> > -----
> > Dejian Zhao, PhD student
> > Group of Evolutionary Ecology
> > State Key Laboratory of Integrated Pest Management
> > Institute of Zoology, Chinese Academy of Sciences
> > 1 Beichen West Road, Chaoyang District, Beijing, P.R.China
> > Postal code: 100101
> > Tel (office): +86-10-64807217
> > Fax: +86-10-64807099
> > Email: zhaodj at ioz.ac.cn

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


