[R] formula behaviour in model.matrix

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Fri Feb 11 17:07:28 CET 2005


Hi all,

Perhaps somebody can explain the following behaviour to me.

Take the following data.frame.

z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3])

Now, from ?formula we see:

<quote>
The '*' operator denotes factor crossing: 'a*b' interpreted as 'a+b+a:b'.
</quote>

So I would expect the following two calls to be equivalent:

ncol(model.matrix(~X*Y, z)) # returns 1 + 2 + 2 + 2 * 2 = 9

and

ncol(model.matrix(~X + Y + X:Y, z)) # returns 1 + 2 + 2 + 2 * 2 = 9

and indeed they are.
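
A minimal sketch of how the equivalence could be checked directly, by
comparing column names and entries rather than only the column counts.
The names m1 and m2 are introduced here purely for illustration, and
stringsAsFactors is set explicitly so that X and Y are factors:

```r
# Same data as above; stringsAsFactors = TRUE guarantees factor columns.
z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3], stringsAsFactors = TRUE)

m1 <- model.matrix(~X*Y, z)          # crossing operator
m2 <- model.matrix(~X + Y + X:Y, z)  # expanded form from ?formula

identical(colnames(m1), colnames(m2))  # same column names, same order?
all(m1 == m2)                          # same entries?
```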

However, I did not expect this:

ncol(model.matrix(~X:Y, z)) # returns 1 + 3 * 3 = 10

Why isn't this 5? In other words, why doesn't "~X:Y" denote only the
interaction term, so that all you would get is an intercept plus the
two-way interaction between X and Y (1 + 2 * 2 = 5 parameters)? Instead,
what is returned is an intercept plus the fully crossed effects (one
column for every level of X paired with every level of Y). Is there
something in the documentation I'm missing?
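
To see exactly which columns are produced, one could inspect the column
names and the rank of the matrix. This is only a sketch (m3 is a name
introduced here for illustration); it shows what is returned, not why:

```r
# Same data as above; stringsAsFactors = TRUE guarantees factor columns.
z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3], stringsAsFactors = TRUE)

m3 <- model.matrix(~X:Y, z)

colnames(m3)  # intercept plus one column per X level / Y level pair
ncol(m3)      # number of columns
qr(m3)$rank   # numerical rank of the design matrix

# Do the interaction columns add up to the intercept column in every row?
all(rowSums(m3[, -1]) == 1)
```

With this data the nine indicator columns add up to the intercept
column, so the matrix has 10 columns but rank 9.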

--sundar

P.S. This behaviour is identical in S-PLUS 6.2.



