[R] Dominant factors in aov?
Christoph Scherber
Christoph.Scherber at uni-jena.de
Thu Dec 2 16:54:20 CET 2004
Dear Rene,
First of all, note that A, B, C, D and E need to be declared as factors
at the outset, using factor() (but I think you did this already). Also,
make sure that the data are read into R correctly (i.e. with "." as the
decimal separator).
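Both checks can be sketched in a few lines. This is only an illustration: the inline text `txt` is a hypothetical four-row excerpt, and the names `txt`, `dat` and `vars` are assumptions rather than anything from the original post. With a real file, the same call would take a file name in place of `text = txt`.

```r
# Hypothetical inline excerpt standing in for the data file
txt <- "A B C D E Y
0 1 1 1 1 491.9
0 1 1 1 2 618.7
250 1 1 1 1 92.4
250 1 1 1 2 117"

# dec = "." is the default; it is spelled out here to make the choice explicit
dat <- read.table(text = txt, header = TRUE, dec = ".")

# Y must be numeric; a wrong decimal separator usually leaves it as
# character or factor instead
stopifnot(is.numeric(dat$Y))

# Declare the design variables as factors
vars <- c("A", "B", "C", "D", "E")
dat[vars] <- lapply(dat[vars], factor)

str(dat)
```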
The reason for the "singularities" is that B, C and D are not
independent (in fact, they're identical in their factor levels, and
hence in their effect on Y).
For this reason, only the effects of A, B and E can be estimated (here
from the 22 rows you posted):
          Df Sum Sq Mean Sq F value    Pr(>F)
A          3 302286  100762  7.9887  0.002396 **
B          1 422869  422869 33.5263 4.683e-05 ***
E          3  22281    7427  0.5888  0.632334
Residuals 14 176583   12613
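The aliasing can be checked directly. The sketch below rebuilds the 22 rows quoted in the original message; the names `dat` and `fit` are assumptions, and C and D are simply copied from B because they take identical values in those rows. It is meant as a way to see the problem, not as the only way to diagnose it.

```r
# The 22 rows from the quoted message
dat <- data.frame(
  A = factor(c(rep(c(0, 250, 500, 1000), each = 4), rep(0, 4), rep(250, 2))),
  B = factor(rep(c(1, 2), times = c(16, 6))),
  E = factor(c(rep(1:4, 5), 1:2)),
  Y = c(491.9, 618.7, 448.2, 632.9, 92.4, 117, 35.5, 102.7,
        47, 57.4, 6.5, 50.9, 0.7, 6.2, 0.5, 1.1,
        6, 4.2, 20.3, 3.5, 8.4, 2.8)
)
dat$C <- dat$B   # identical to B in the posted rows
dat$D <- dat$B   # identical to B in the posted rows

# Cross-tabulation: every observation falls on the diagonal
table(B = dat$B, D = dat$D)

# Fit the full model with lm() so aliased coefficients show up as NA
fit <- lm(Y ~ A + B + C + D + E, data = dat)
coef(fit)    # the coefficients for C and D are NA
alias(fit)   # lists the terms that are linear combinations of others

# The reduced model contains everything that can be estimated
summary(aov(Y ~ A + B + E, data = dat))
```

Once B is in the model, C and D carry no further information, so R drops them and reports the dropped coefficients as not defined because of singularities.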
A has 4 levels, so there should be 3 d.f. (that's correct in the table).
B has 2 levels, so there is only 1 d.f. (that's also correct).
E has 4 levels, so there should be 3 d.f. (also O.K.).
In total, there are (n = 22) - 3 - 1 - 3 - 1 = 14 residual d.f. (the
final 1 is for the intercept), so that's OK, too.
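The same bookkeeping can be written out as a short calculation. The variable names here are illustrative assumptions; the numbers are the ones given above.

```r
n <- 22                               # number of observations
n_levels <- c(A = 4, B = 2, E = 4)    # levels of each estimable factor

# Each factor uses (number of levels - 1) degrees of freedom
df_terms <- n_levels - 1
df_terms

# Residual d.f.: observations minus term d.f. minus 1 for the intercept
df_resid <- n - sum(df_terms) - 1
df_resid
```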
Hope this helps,
Christoph
> levels(A)
[1] "0" "250" "500" "1000"
> levels(B)
[1] "1" "2"
> levels(E)
[1] "1" "2" "3" "4"
Rene Eschen wrote:
>Hi all,
>
>I'm using R 2.0.1 for Windows to analyze the influence of the following
>factors on response Y:
>
>A (four levels)
>B (three levels)
>C (two levels)
>D (29 levels) with
>E (four replicates)
>
>The dataset looks like this:
>A B C D E Y
>0 1 1 1 1 491.9
>0 1 1 1 2 618.7
>0 1 1 1 3 448.2
>0 1 1 1 4 632.9
>250 1 1 1 1 92.4
>250 1 1 1 2 117
>250 1 1 1 3 35.5
>250 1 1 1 4 102.7
>500 1 1 1 1 47
>500 1 1 1 2 57.4
>500 1 1 1 3 6.5
>500 1 1 1 4 50.9
>1000 1 1 1 1 0.7
>1000 1 1 1 2 6.2
>1000 1 1 1 3 0.5
>1000 1 1 1 4 1.1
>0 2 2 2 1 6
>0 2 2 2 2 4.2
>0 2 2 2 3 20.3
>0 2 2 2 4 3.5
>250 2 2 2 1 8.4
>250 2 2 2 2 2.8
>
>etc.
>
>If I ask the following: summary(aov(Y~A+B+C+D+E))
>
>R gives me this answer:
>
>           Df  Sum Sq Mean Sq  F value Pr(>F)
>A           3 135.602  45.201 310.2166 <2e-16 ***
>B           2   0.553   0.276   1.8976 0.1512
>C           1   0.281   0.281   1.9264 0.1659
>D          25  92.848   3.714  25.4890 <2e-16 ***
>E           3   0.231   0.077   0.5279 0.6634
>Residuals 411  59.885   0.146
>
>Can someone explain to me why factor D has only 25 Df (instead of 28, which
>I expected), and why this number changes when I leave out factors B or C (but
>not A)? Why do factors B and C (but again: not A) not show up in the
>calculation if they appear later in the formula than D?
>
>When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
>were not defined because of "singularities" (what does this word mean?).
>After checking and playing around with the dataset, I find no logical reason
>for which levels are not defined. Even if I construct a "perfect" dataset
>(balanced, no missing values) I never get the correct number of Df.
>
>My other datasets are analyzed as expected using similar function calls
>and similar datasets. Am I doing something wrong here?
>
>Many thanks,
>
>René Eschen.
>
>___
>drs. René Eschen
>CABI Bioscience Switzerland Centre
>1 Rue des Grillons
>CH-2800 Delémont
>Switzerland
>+41 32 421 48 87 (Direct)
>+41 32 421 48 70 (Secretary)
>+41 32 421 48 71 (Fax)
>
>http://www.unifr.ch/biol/ecology/muellerschaerer/group/eschen/eschen.html
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html