[R] formula behaviour in model.matrix
Sundar Dorai-Raj
sundar.dorai-raj at pdf.com
Fri Feb 11 17:07:28 CET 2005
Hi all,
Perhaps somebody can explain the following behaviour to me.
Take the following data.frame.
z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3])
Now, from ?formula we see:
<quote>
The '*' operator denotes factor crossing: 'a*b' interpreted as 'a+b+a:b'.
</quote>
So I would expect the following:
ncol(model.matrix(~X*Y, z)) # returns 1 + 2 + 2 + 2 * 2 = 9
and
ncol(model.matrix(~X + Y + X:Y, z)) # returns 1 + 2 + 2 + 2 * 2 = 9
are equivalent.
However, I did not expect this:
ncol(model.matrix(~X:Y, z)) # returns 1 + 3 * 3 = 10
Why isn't this 5? In other words, why doesn't "~X:Y" just denote the
interaction term so that all you would get is an intercept plus the
two-way interaction between X and Y (1 + 2 * 2 = 5 parameters)? Instead,
what is returned is an intercept plus the fully crossed effects (every
level of X against every level of Y). Is there something in the
documentation I'm missing?
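
For anyone who wants to see the difference directly, the column names of the two design matrices can be put side by side. This is only a minimal sketch: the object names mm_cross and mm_inter are illustrative, and it assumes expand.grid() returns X and Y as factors (its default). The rank check is an extra diagnostic that is not part of the calls above.

```r
# Same data frame as above: all 9 combinations of X and Y.
z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3])

# Design matrix for main effects plus interaction.
mm_cross <- model.matrix(~ X * Y, z)

# Design matrix for the interaction term on its own.
mm_inter <- model.matrix(~ X:Y, z)

# Compare which columns each formula generates.
print(colnames(mm_cross))
print(colnames(mm_inter))

# Column counts: 9 for ~X*Y, 10 for ~X:Y.
print(ncol(mm_cross))
print(ncol(mm_inter))

# Numerical rank of the ~X:Y matrix. With only 9 rows it cannot
# exceed 9, so the 10 columns are linearly dependent.
print(qr(mm_inter)$rank)
```

With ~X:Y there is one indicator column for each of the 3 * 3 level combinations in addition to the intercept, which is where the 10 comes from.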
--sundar
P.S. This behaviour is identical in S-PLUS 6.2.