[R] Dominant factors in aov?

Thu Dec 2 17:51:23 CET 2004

Dear Rene,

At least from the part of the data.frame attached to your mail, I 
assumed that B, C and D changed in identical ways (but maybe I got this 
wrong).

With your following combination of factors:

A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)

And assuming independence of the treatment levels, you should get

3 d.f. for A
2 d.f. for B
1 d.f. for C
28 d.f. for D
3 d.f. for E
? residual d.f. (how big is the total number of Y values?)
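
As a rough check of these expected d.f., here is a minimal sketch in R. It assumes a hypothetical, fully crossed and balanced design with simulated Y values (the data frame `design` and its contents are made up for illustration and are not your data):

```r
# Hypothetical balanced design: every combination of A, B, C, D, E occurs once.
set.seed(1)
design <- expand.grid(A = factor(c(0, 250, 500, 1000)),
                      B = factor(1:3),
                      C = factor(1:2),
                      D = factor(1:29),
                      E = factor(1:4))
design$Y <- rnorm(nrow(design))   # simulated response, pure noise

fit <- aov(Y ~ A + B + C + D + E, data = design)

# Degrees of freedom per term, in the order A, B, C, D, E, Residuals
dfs <- summary(fit)[[1]]$Df
print(dfs)
```

When the factors really do vary independently like this, each term gets (number of levels - 1) d.f., whatever order the terms are entered in.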

The problem arises if parts of treatments B, D and E are applied to the same subjects, e.g.

B	D	E	Y	
1	1	1	400
2	2	2	300
2	2	3	420
2	2	4	350
(etc)

then you immediately run into problems because treatments B and D (in this case) change in an identical way, i.e. their effects on Y cannot be separated from each other; this is what causes the "singularities". The terms need to vary independently of each other, otherwise you will have order dependence in your analyses, i.e. the output of your aov model will change depending on the sequence in which the terms A, B, C, D, E are entered.
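
To illustrate the order dependence, here is a small sketch in R. It assumes a made-up toy dataset (`toy`, with simulated Y) in which each level of D occurs under only one level of B; the layout is hypothetical and only mimics the kind of confounding described above:

```r
# Hypothetical toy data: D levels 1-2 occur only with B = 1,
# D levels 3-4 only with B = 2, D levels 5-6 only with B = 3.
set.seed(42)
toy <- expand.grid(A = factor(c(0, 250, 500, 1000)),
                   D = factor(1:6),
                   E = factor(1:4))
toy$B <- factor(c(1, 1, 2, 2, 3, 3)[as.integer(toy$D)])
toy$Y <- rnorm(nrow(toy))          # simulated response, pure noise

fit_bd <- aov(Y ~ A + B + D + E, data = toy)   # B entered before D
fit_db <- aov(Y ~ A + D + B + E, data = toy)   # D entered before B

summary(fit_bd)   # D receives fewer d.f. than its 6 levels suggest
summary(fit_db)   # D receives all 5 d.f.; B no longer appears

# Coefficients that could not be estimated are stored as NA in the fit
n_aliased <- sum(is.na(fit_bd$coefficients))
print(n_aliased)

# alias() lists which terms are linear combinations of earlier ones
alias(fit_bd)
```

Whichever of the two confounded factors is entered first takes the shared d.f., and the later one is left with only what remains.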

Did I get this right? It would probably help to see the full dataset.

Best wishes
Christoph

Rene Eschen wrote:

>Dear Christoph,
>
>>The reason for the "singularities" is that B, C and D are not 
>>independent (in fact, they're identical in their factor levels, and 
>>hence in their effect on Y).
>
>I do not understand this. You gave the correct levels for A, B and E, but I
>do not see how they are identical. They have different levels and different
>codings, or is it because A has the same number of levels as E, and E shares
>some of the coding with B?
>
>René Eschen.
>
>---
>
>For this reason, only the effects of A, B and E can be estimated:
>
>           Df Sum Sq Mean Sq F value    Pr(>F)   
>A            3 302286  100762  7.9887  0.002396 **
>B            1 422869  422869 33.5263 4.683e-05 ***
>E            3  22281    7427  0.5888  0.632334   
>Residuals   14 176583   12613                     
>
>A has 4 levels so there should be 3 d.f. (that's correct in the table)
>B has 2 levels so there is only 1 d.f. (that's also correct)
>E has 4 levels so there should be 3 d.f. (also O.K.)
>
>In total, there are [(n=22)-(3)-(1)-(3)] -1 = 14 residual d.f., so 
>that's OK, too.
>
>Hope this helps,
>Christoph
>
>
>
>levels(A)
>[1] "0"    "250"  "500"  "1000"
> > levels(B)
>[1] "1" "2"
> > levels(E)
>[1] "1" "2" "3" "4"
>
>Rene Eschen wrote:
>
>  
>
>>Hi all,
>>
>>I'm using R 2.0.1 for Windows to analyze the influence of the following
>>factors on response Y:
>>
>>A (four levels)
>>B (three levels)
>>C (two levels)
>>D (29 levels) with
>>E (four replicates)
>>
>>The dataset looks like this:
>>A	B	C	D	E	Y
>>0	1	1	1	1	491.9
>>0	1	1	1	2	618.7
>>0	1	1	1	3	448.2
>>0	1	1	1	4	632.9
>>250	1	1	1	1	92.4
>>250	1	1	1	2	117
>>250	1	1	1	3	35.5
>>250	1	1	1	4	102.7
>>500	1	1	1	1	47
>>500	1	1	1	2	57.4
>>500	1	1	1	3	6.5
>>500	1	1	1	4	50.9
>>1000	1	1	1	1	0.7
>>1000	1	1	1	2	6.2
>>1000	1	1	1	3	0.5
>>1000	1	1	1	4	1.1
>>0	2	2	2	1	6
>>0	2	2	2	2	4.2
>>0	2	2	2	3	20.3
>>0	2	2	2	4	3.5
>>250	2	2	2	1	8.4
>>250	2	2	2	2	2.8
>>
>>etc.
>>
>>If I ask the following: summary(aov(Y~A+B+C+D+E))
>>
>>R gives me this answer:
>>
>>             Df  Sum Sq Mean Sq  F value Pr(>F)
>>A             3 135.602  45.201 310.2166 <2e-16 ***
>>B             2   0.553   0.276   1.8976 0.1512
>>C             1   0.281   0.281   1.9264 0.1659
>>D            25  92.848   3.714  25.4890 <2e-16 ***
>>E             3   0.231   0.077   0.5279 0.6634
>>Residuals   411  59.885   0.146
>>
>>Can someone explain to me why factor D has only 25 Df (instead of 28, which I
>>expected), and why this number changes when I leave out factors B or C (but
>>not A)? Why do factors B and C (but again: not A) not show up in the
>>calculation if they appear later in the formula than D?
>>
>>When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
>>were not defined because of "singularities" (what does this word mean?).
>>After checking and playing around with the dataset, I find no logical
>>reason for which levels are not defined. Even if I construct a "perfect"
>>dataset (balanced, no missing values) I never get the correct number of Df.
>>
>>My other datasets are analyzed as expected using the similar function calls
>>and similar datasets. Am I doing something wrong here?
>>
>>Many thanks,
>>
>>René Eschen.
>>
>>___
>>drs. René Eschen
>>CABI Bioscience Switzerland Centre
>>1 Rue des Grillons
>>CH-2800 Delémont
>>Switzerland
>>+41 32 421 48 87 (Direct)
>>+41 32 421 48 70 (Secretary)
>>+41 32 421 48 71 (Fax)
>>
>>http://www.unifr.ch/biol/ecology/muellerschaerer/group/eschen/eschen.html