[BioC] About the formula in ancova()

Wed Mar 17 14:13:33 CET 2010

How can we easily find out which kind of sums of squares a given package
or function produces? It also seems that it is not easy to switch from
one kind to another.
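One practical way to tell which kind of sums of squares a function reports is to refit the model with the terms in reverse order and compare the tables. Sequential (Type I) SS change with the order whenever the predictors are correlated, while partial or Type III SS do not. The sketch below applies that check to base R's anova() on an lm fit. `order_sensitive_terms()` is a hypothetical helper written only for this illustration (it is not part of R or HH), it assumes an additive formula without interactions, and all data are simulated.

```r
# Hypothetical helper (not from R or HH): refit an additive linear model with
# its terms in reversed order and report, for each term, whether the "Sum Sq"
# column of anova() changes. TRUE means the SS depend on term order,
# i.e. they are sequential (Type I) rather than partial.
order_sensitive_terms <- function(formula, data, tol = 1e-8) {
  labels   <- attr(terms(formula), "term.labels")
  response <- deparse(formula[[2]])
  reversed <- reformulate(rev(labels), response = response)
  ss_fwd <- anova(lm(formula,  data = data))[labels, "Sum Sq"]
  ss_rev <- anova(lm(reversed, data = data))[labels, "Sum Sq"]
  setNames(abs(ss_fwd - ss_rev) > tol * pmax(1, abs(ss_fwd)), labels)
}

set.seed(1)
# Simulated ANCOVA-style data: covariate x is correlated with factor a,
# so the two predictors are not orthogonal.
a <- factor(rep(c("A", "B", "C"), each = 18))
x <- as.numeric(a) + rnorm(54)
dat <- data.frame(a, x, y = 2 * x + as.numeric(a) + rnorm(54))
order_sensitive_terms(y ~ x + a, dat)

# Balanced two-way layout (every f1:f2 cell has 2 observations): the factors
# are orthogonal, so even sequential SS do not depend on the order.
bal <- data.frame(f1 = gl(2, 6, 12), f2 = gl(3, 2, 12), y = rnorm(12))
order_sensitive_terms(y ~ f1 + f2, bal)
```

The balanced case is a reminder that an order-invariant table on one dataset does not prove a function uses partial SS; the check is only informative when the predictors are correlated, as a covariate and a factor usually are in ANCOVA.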

Thanks.
Dejian

Juan Pedro Steibel wrote:
> Good morning,
> I'd add that, given a fitted object of class "lm" in R, the package
> "car" (from CRAN, not BioC) can produce Type III sums of squares
> through its Anova() function.
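A hedged sketch of the Type III route: car::Anova(fit, type = "III") is the car function for this, and base R can produce the same kind of table by dropping each term in turn from the full model with drop1(fit, scope = . ~ ., test = "F"). Either way the factor should be coded with sum-to-zero contrasts (contr.sum); with R's default treatment contrasts, Type III tests of main effects in a model with an interaction generally test something other than what is usually intended. The data and variable names below are simulated placeholders, and the car call only runs if that package is installed.

```r
set.seed(2)
# Simulated data (hypothetical): factor a, covariate x whose mean depends on a
a <- factor(rep(c("A", "B", "C"), each = 18))
x <- rnorm(54, mean = as.numeric(a))
dat <- data.frame(a, x, y = 1 + 0.5 * x + as.numeric(a) + rnorm(54))

# Sum-to-zero contrasts for the factor, as Type III tests require
fit <- lm(y ~ x * a, data = dat, contrasts = list(a = "contr.sum"))

anova(fit)  # Type I (sequential) SS, for comparison

# Type III-style SS: drop each term from the full model in turn, ignoring
# marginality (scope = . ~ . keeps x and a as candidates even though x:a
# contains them)
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3

# The corresponding table from car, if that CRAN package is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, type = "III"))
}
```

The x:a row is the same in all three tables because the interaction is the last term, so its sequential SS is already adjusted for everything else.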
>
> Hope that helps,
> JP
>
>
> Naomi Altman wrote:
>> R's anova() produces sequential (Type I) sums of squares, not "Type III"
>> or partial SS. Each sequential SS is adjusted only for the variables
>> entered before it in the model formula, so it depends on the order of
>> the terms. The partial SS are adjusted for all other variables in the
>> model.
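A small simulated illustration of the difference: anova() gives the sequential SS, and drop1() gives, for each term, the SS lost by dropping that term from the full model, i.e. its partial SS (in an additive model like this one, drop1's default scope drops every term). A term entered last has the same sequential and partial SS, which is why a term can be tested "adjusted for the other" simply by putting it last in the formula. The data and variable names are made up for this sketch.

```r
set.seed(4)
# Simulated data (hypothetical): covariate x correlated with factor a
a <- factor(rep(c("A", "B", "C"), each = 18))
x <- as.numeric(a) + rnorm(54)
dat <- data.frame(a, x, y = 2 * x + as.numeric(a) + rnorm(54))

fit_xa <- lm(y ~ x + a, data = dat)  # x entered first
fit_ax <- lm(y ~ a + x, data = dat)  # a entered first

# Sequential (Type I) SS: each row is adjusted only for the rows above it
seq_xa <- anova(fit_xa)
seq_ax <- anova(fit_ax)

# Partial SS: each term adjusted for all the other terms
part <- drop1(fit_xa, test = "F")

# The term entered last has the same sequential and partial SS
c(partial = part["a", "Sum of Sq"], sequential_last = seq_xa["a", "Sum Sq"])
c(partial = part["x", "Sum of Sq"], sequential_last = seq_ax["x", "Sum Sq"])

# ...whereas the term entered first does not
c(x_first = seq_xa["x", "Sum Sq"], x_last = seq_ax["x", "Sum Sq"])
```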
>>
>> The SAS manual explains this more fully under PROC REG (sequential 
>> and partial) and PROC GLM (sequential and Type III) and probably has 
>> the most concise explanations.
>>
>> --Naomi
>>
>> At 09:38 AM 3/14/2010, Dejian Zhao wrote:
>>> Here are more details about the confusing results mentioned in my
>>> previous email. Several statistics for the two variables (e.g. Sum Sq,
>>> Mean Sq, F value, Pr(>F)) depend on the order of the terms in the
>>> formula, and the change in the p-value (Pr) directly changes the
>>> significance. With my own data, changing the order of the variables in
>>> the formula turns one variable from significant to non-significant.
>>>
>>> This may be a trivial question for a mathematics major, but as a
>>> biology major I would appreciate it if someone could explain the formula
>>> and give some guidelines for writing one. Many thanks again.
>>>
>>> *Results of the first set of commands:*
>>> > ancova(Sodium ~ Calories + Type, data=hotdog)
>>> Analysis of Variance Table
>>>
>>> Response: Sodium
>>>           Df Sum Sq Mean Sq F value    Pr(>F)
>>> Calories   1 106270  106270  34.654 3.281e-07 ***
>>> Type       2 227386  113693  37.074 1.336e-10 ***
>>> Residuals 50 153331    3067
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>> > ancova(Sodium ~ Type + Calories, data=hotdog)
>>> Analysis of Variance Table
>>>
>>> Response: Sodium
>>> Df Sum Sq Mean Sq F value Pr(>F)
>>> Type 2 31739 15869 5.1749 0.009065 **
>>> Calories 1 301917 301917 98.4526 2.089e-13 ***
>>> Residuals 50 153331 3067
>>> ---
>>> Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1
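A quick check with the numbers in these two tables shows that both orders split the same model sum of squares, only in different ways; the residual SS (153331 on 50 df) is identical in both:

```latex
\begin{aligned}
\mathrm{SS}(\text{Calories}) + \mathrm{SS}(\text{Type} \mid \text{Calories})
  &= 106270 + 227386 = 333656,\\
\mathrm{SS}(\text{Type}) + \mathrm{SS}(\text{Calories} \mid \text{Type})
  &= 31739 + 301917 = 333656.
\end{aligned}
```

So Type tested before Calories gets SS 31739 (F = 5.17), while Type tested after adjusting for Calories gets 227386 (F = 37.07). Which row answers the biological question depends on which adjustment is wanted.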
>>>
>>>
>>> *Results of the second set of commands:*
>>> > ancova(Sodium ~ Calories * Type, data=hotdog)
>>> Analysis of Variance Table
>>>
>>> Response: Sodium
>>>               Df Sum Sq Mean Sq F value    Pr(>F)
>>> Calories       1 106270  106270 35.6885 2.747e-07 ***
>>> Type           2 227386  113693 38.1815 1.195e-10 ***
>>> Calories:Type  2  10402    5201  1.7466    0.1853
>>> Residuals     48 142930    2978
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>> > ancova(Sodium ~ Type * Calories, data=hotdog)
>>> Analysis of Variance Table
>>>
>>> Response: Sodium
>>>               Df Sum Sq Mean Sq  F value    Pr(>F)
>>> Type           2  31739   15869   5.3294  0.008124 **
>>> Calories       1 301917  301917 101.3927 2.019e-13 ***
>>> Type:Calories  2  10402    5201   1.7466  0.185267
>>> Residuals     48 142930    2978
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
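The same bookkeeping applies to the interaction models. The Calories:Type (Type:Calories) row is identical in both tables because R enters the interaction after both main effects, so in either order it is adjusted for both. Its SS also matches the drop in residual SS relative to the additive model:

```latex
\begin{aligned}
106270 + 227386 + 10402 &= 31739 + 301917 + 10402 = 344058,\\
\mathrm{RSS}_{\text{additive}} - \mathrm{RSS}_{\text{interaction}}
  &= 153331 - 142930 = 10401 \approx 10402 .
\end{aligned}
```

The difference of 1 comes from rounding in the printed tables.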
>>>
>>>
>>> Dejian Zhao wrote:
>>> > Dear list members,
>>> >
>>> > I have a question about the formula in ancova(), which is provided by
>>> > the HH package. There are several examples in the ancova() help file,
>>> > which can be viewed by typing "?ancova" in the R console after loading
>>> > the HH package. Some of the code is as follows:
>>> >
>>> > library(HH)
>>> > hotdog <- read.table(hh("datasets/hotdog.dat"), header=TRUE)
>>> > ## y ~ x + a or y ~ a + x ## constant slope, different intercepts
>>> > ancova(Sodium ~ Calories + Type, data=hotdog)
>>> > ancova(Sodium ~ Type + Calories, data=hotdog)
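Both commands fit exactly the same model: one common slope for Calories and a separate intercept for each Type. The sketch below uses base R lm() and anova() on simulated stand-in data (these are not the real hotdog values, and the Type levels are placeholders) to show that the fitted values and coefficients are identical for the two orders and only the sequential ANOVA table changes. It assumes that the table ancova() prints is the sequential anova table of the corresponding linear model, which is consistent with the output in this thread.

```r
set.seed(3)
# Simulated stand-in for the hotdog data (NOT the real values): a 3-level
# factor Type with placeholder level names and a numeric covariate Calories
# whose mean differs by Type, so the two predictors are correlated.
Type <- factor(rep(c("A", "B", "C"), each = 18))
Calories <- rnorm(54, mean = c(160, 155, 120)[as.integer(Type)], sd = 25)
Sodium <- 100 + 3 * Calories + c(0, 20, 80)[as.integer(Type)] + rnorm(54, sd = 50)
sim <- data.frame(Sodium, Calories, Type)

fit_xa <- lm(Sodium ~ Calories + Type, data = sim)  # y ~ x + a
fit_ax <- lm(Sodium ~ Type + Calories, data = sim)  # y ~ a + x

# Same model: identical fitted values, one Calories slope, one intercept per Type
all.equal(fitted(fit_xa), fitted(fit_ax))
coef(fit_ax)

# Different sequential decompositions of the same model SS
anova(fit_xa)
anova(fit_ax)
```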
>>> >
>>> > After running the code, I found that the formulas
>>> > "ancova(Sodium ~ Calories + Type, data=hotdog)" and
>>> > "ancova(Sodium ~ Type + Calories, data=hotdog)" produced different
>>> > results.
>>> >
>>> > The same thing also happens to the following example codes:
>>> > ## y ~ x * a or y ~ a * x ## different slopes, and different intercepts
>>> > ancova(Sodium ~ Calories * Type, data=hotdog)
>>> > ancova(Sodium ~ Type * Calories, data=hotdog)
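In R's formula language, x * a is shorthand for x + a + x:a, so both `*` formulas contain the same three terms and differ only in the order of the two main effects. terms() shows the expansion without needing any data. By default it also moves interactions after main effects, which is why the interaction is always the last row of the sequential table:

```r
f_xa <- Sodium ~ Calories * Type
f_ax <- Sodium ~ Type * Calories

# Expanded term lists
attr(terms(f_xa), "term.labels")  # "Calories" "Type" "Calories:Type"
attr(terms(f_ax), "term.labels")  # "Type" "Calories" "Type:Calories"

# x * a is the same as x + a + x:a
identical(attr(terms(f_xa), "term.labels"),
          attr(terms(Sodium ~ Calories + Type + Calories:Type), "term.labels"))

# Interactions are placed after main effects (keep.order = FALSE by default)
attr(terms(Sodium ~ Calories:Type + Calories + Type), "term.labels")
```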
>>> >
>>> > Hence, I am confused about the difference between the formulas
>>> > "y ~ x + a" and "y ~ a + x", and likewise between "y ~ x * a" and
>>> > "y ~ a * x". I thought the order of variables in a formula was
>>> > arbitrary; however, here it seems that the order matters.
>>> >
>>> > Can someone explain the formula for me? And how should we choose the
>>> > formula, or arrange the order of variables in the formula, when
>>> > processing our own data?
>>> >
>>> > Many thanks!
>>> > Dejian
>>> >
>>> > -----
>>> > Dejian Zhao, PhD student
>>> > Group of Evolutionary Ecology
>>> > State Key Laboratory of Integrated Pest Management
>>> > Institute of Zoology, Chinese Academy of Sciences
>>> > 1 Beichen West Road, Chaoyang District, Beijing, P.R.China
>>> > Postal code: 100101
>>> > Tel (office): +86-10-64807217
>>> > Fax: +86-10-64807099
>>> > Email: zhaodj at ioz.ac.cn
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> Naomi S. Altman                                814-865-3791 (voice)
>> Associate Professor
>> Dept. of Statistics                              814-863-7114 (fax)
>> Penn State University                         814-865-1348 (Statistics)
>> University Park, PA 16802-2111