# [R] Dominant factors in aov?

Christoph Scherber Christoph.Scherber at uni-jena.de
Thu Dec 2 16:54:20 CET 2004

```
Dear Rene,

First of all, note that A, B, C, D and E need to be declared as factors at
the beginning, using factor() (but I think you did this already). Also,
make sure that the data are read into R correctly (i.e. with "." as the
decimal separator).
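
# A minimal sketch of the two checks above, using a handful of the posted
# rows. The names txt, dat and fac are made up for illustration, and the
# data are read from an inline string rather than from your actual file.
txt <- "A B C D E Y
0 1 1 1 1 491.9
250 1 1 1 1 92.4
500 1 1 1 1 47
1000 1 1 1 1 0.7
0 2 2 2 1 6
250 2 2 2 1 8.4"
dat <- read.table(text = txt, header = TRUE, dec = ".")
str(dat)                 # before conversion, A to E are plain integers
fac <- c("A", "B", "C", "D", "E")
dat[fac] <- lapply(dat[fac], factor)   # declare the design variables as factors
sapply(dat, class)
levels(dat$A)
is.numeric(dat$Y)        # FALSE would point to a decimal-separator problem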

The reason for the "singularities" is that B, C and D are not
independent: in the rows you posted they have identical factor levels,
so their effects on Y cannot be told apart.

For this reason, only the effects of A, B and E can be estimated:

            Df Sum Sq Mean Sq F value    Pr(>F)
A            3 302286  100762  7.9887  0.002396 **
B            1 422869  422869 33.5263 4.683e-05 ***
E            3  22281    7427  0.5888  0.632334
Residuals   14 176583   12613
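
# A sketch of how the aliasing can be made visible, assuming the 22 rows
# quoted below are the data in question. The object name fit is made up;
# C and D are set equal to B because that is how they appear in those rows.
A <- factor(c(rep(c(0, 250, 500, 1000), each = 4), rep(c(0, 250), times = c(4, 2))))
B <- factor(rep(c(1, 2), times = c(16, 6)))
C <- B
D <- B
E <- factor(c(rep(1:4, 5), 1:2))
Y <- c(491.9, 618.7, 448.2, 632.9, 92.4, 117, 35.5, 102.7,
       47, 57.4, 6.5, 50.9, 0.7, 6.2, 0.5, 1.1,
       6, 4.2, 20.3, 3.5, 8.4, 2.8)
fit <- lm(Y ~ A + B + C + D + E)
coef(fit)          # coefficients for C and D are NA: they cannot be estimated
alias(fit)         # lists which terms are linear combinations of earlier ones
df.residual(fit)   # residual d.f. after fitting the estimable terms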

A has 4 levels so there should be 3 d.f. (that's correct in the table)
B has 2 levels so there is only 1 d.f. (that's also correct)
E has 4 levels so there should be 3 d.f. (also O.K.)

In total, there are [(n=22)-(3)-(1)-(3)] -1 = 14 residual d.f., so
that's OK, too.
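
# The same bookkeeping as a few lines of R, assuming n = 22 observations
# and the level counts listed above (all variable names are illustrative).
n_obs    <- 22
n_levels <- c(A = 4, B = 2, E = 4)
df_terms <- n_levels - 1                   # d.f. per factor: levels minus one
df_resid <- n_obs - 1 - sum(df_terms)      # the extra 1 is for the intercept
df_terms
df_resid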

Hope this helps,
Christoph

> levels(A)
[1] "0"    "250"  "500"  "1000"
> levels(B)
[1] "1" "2"
> levels(E)
[1] "1" "2" "3" "4"

Rene Eschen wrote:

>Hi all,
>
>I'm using R 2.0.1 for Windows to analyze the influence of the following
>factors on response Y:
>
>A (four levels)
>B (three levels)
>C (two levels)
>D (29 levels) with
>E (four replicates)
>
>The dataset looks like this:
>A	B	C	D	E	Y
>0	1	1	1	1	491.9
>0	1	1	1	2	618.7
>0	1	1	1	3	448.2
>0	1	1	1	4	632.9
>250	1	1	1	1	92.4
>250	1	1	1	2	117
>250	1	1	1	3	35.5
>250	1	1	1	4	102.7
>500	1	1	1	1	47
>500	1	1	1	2	57.4
>500	1	1	1	3	6.5
>500	1	1	1	4	50.9
>1000	1	1	1	1	0.7
>1000	1	1	1	2	6.2
>1000	1	1	1	3	0.5
>1000	1	1	1	4	1.1
>0	2	2	2	1	6
>0	2	2	2	2	4.2
>0	2	2	2	3	20.3
>0	2	2	2	4	3.5
>250	2	2	2	1	8.4
>250	2	2	2	2	2.8
>
>etc.
>
>If I ask the following: summary(aov(Y~A+B+C+D+E))
>
>
>            Df  Sum Sq Mean Sq  F value Pr(>F)
>A            3 135.602  45.201 310.2166 <2e-16 ***
>B            2   0.553   0.276   1.8976 0.1512
>C            1   0.281   0.281   1.9264 0.1659
>D           25  92.848   3.714  25.4890 <2e-16 ***
>E            3   0.231   0.077   0.5279 0.6634
>Residuals  411  59.885   0.146
>
>Can someone explain to me why factor D has only 25 Df (instead of the 28 I
>expected), and why this number changes when I leave out factors B or C (but
>not A)? Why do factors B and C (but again: not A) not show up in the
>calculation if they appear later in the formula than D?
>
>When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
>were not defined because of "singularities" (what does this word mean?).
>After checking and playing around with the dataset, I find no logical reason
>for which levels are not defined. Even if I construct a "perfect" dataset
>(balanced, no missing values) I never get the correct number of Df.
>
>My other datasets are analyzed as expected using similar function calls
>and similar datasets. Am I doing something wrong here?
>
>Many thanks,
>
>
>___
>CABI Bioscience Switzerland Centre
>1 Rue des Grillons
>Switzerland
>+41 32 421 48 87 (Direct)
>+41 32 421 48 70 (Secretary)
>+41 32 421 48 71 (Fax)
>
>http://www.unifr.ch/biol/ecology/muellerschaerer/group/eschen/eschen.html
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help