[R] nested factors different with/out brackets - is this a design feature?

Thu May 6 17:39:05 CEST 2010

Dear R Helpers,

I came across discrepancies in estimates for nested factors in linear models computed with brackets and also when a crossing factor is defined as a factor or as a dummy (0/1) vector. These are probably designed features that I do not understand. I wonder if someone would be so kind as to explain them to me.

First, if I do not bracket the nested pair of factors A/C, then (1) I obtain coefficients for all 3 levels (0/1/2) of the crossing factor B (compare models m0 and m1, below). Moreover, the values of some corresponding coefficients in the 2 models are not the same. So, in m1 B0:A1:D1 = -0.13112 and in m2, A1:C1 (the corresponding term in a model with no B0 coefficients) = -0.13112. But, the coefficients for B1:A1:C1 in m1 and m2 are 0.08909 and 0.22021. Why do they differ, here? Also, (2) if I bracket the nested pair of factors "(A/C)", then I obtain the B:C interaction - which I did not expect. I thought that if I bracket (A/C) - then this would constrain the model to generate only A, A:B and A:B:C as in m1 (compare models m1 and m2).

Second, if I use restrict the levels of the crossing factor B to be 0 or 1 (via subset=B!=2) and compare this model with a dummy vector b with identical 0/1 values, then I again obtain different outputs - compare m3 and m4, below. Again, there is no simple correspondence between m3 and m4, even though all(dat$B==dat$b) returns TRUE.

So, I do not understand what is happening here, either when comparing m1 with m2 or when comparing m3 with m4.

With many thanks, in anticipation of your help in explaining this,

Jonathan Williams

Here is a simple sample code to generate the discrepancies:-

set.seed(1)
A=factor(rep(c(1:5),600))
B=factor(rep(c(0:2),each=1000))
b=as.numeric(as.character(B))
C=factor(rbinom(3000,1,0.5))
set.seed(2)
y=rnorm(3000)
dat=data.frame(A,B,b,C,y)
m0=lm(y~B*A); summary(m0)
m1=lm(y~B*A/C,dat); summary(m1)
m2=lm(y~B*(A/C),dat); summary(m2)

m3=lm(y~B*A/C,dat,subset=B!=2); summary(m3)
m4=lm(y~b*A/C,dat,subset=B!=2); summary(m4)