# [R] Dominant factors in aov?

Rene Eschen r.eschen at cabi.org
Thu Dec 2 15:29:12 CET 2004

```
Hi all,

I'm using R 2.0.1 for Windows to analyze the influence of the following
factors on response Y:

A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)

The dataset looks like this:
A	B	C	D	E	Y
0	1	1	1	1	491.9
0	1	1	1	2	618.7
0	1	1	1	3	448.2
0	1	1	1	4	632.9
250	1	1	1	1	92.4
250	1	1	1	2	117
250	1	1	1	3	35.5
250	1	1	1	4	102.7
500	1	1	1	1	47
500	1	1	1	2	57.4
500	1	1	1	3	6.5
500	1	1	1	4	50.9
1000	1	1	1	1	0.7
1000	1	1	1	2	6.2
1000	1	1	1	3	0.5
1000	1	1	1	4	1.1
0	2	2	2	1	6
0	2	2	2	2	4.2
0	2	2	2	3	20.3
0	2	2	2	4	3.5
250	2	2	2	1	8.4
250	2	2	2	2	2.8

etc.

If I run summary(aov(Y~A+B+C+D+E)), I get the following:

             Df  Sum Sq Mean Sq  F value Pr(>F)
A             3 135.602  45.201 310.2166 <2e-16 ***
B             2   0.553   0.276   1.8976 0.1512
C             1   0.281   0.281   1.9264 0.1659
D            25  92.848   3.714  25.4890 <2e-16 ***
E             3   0.231   0.077   0.5279 0.6634
Residuals   411  59.885   0.146

Can someone explain to me why factor D has only 25 Df (instead of the 28 I
expected), and why this number changes when I leave out factor B or C (but
not A)? Why do factors B and C (but again, not A) not show up in the
calculation if they appear later in the formula than D?

When I call summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
were not defined because of "singularities" (what does this word mean?).
After checking and playing around with the dataset, I can find no logic in
which levels are not defined. Even if I construct a "perfect" dataset
(balanced, no missing values), I never get the correct number of Df.

My other datasets, which are similar, are analyzed as expected using similar
function calls. Am I doing something wrong here?

Many thanks,

___
CABI Bioscience Switzerland Centre
1 Rue des Grillons