[R-pkg-devel] Problem in stats::model.matrix when omitting two-way interactions

Fox, John jfox at mcmaster.ca
Thu Mar 30 19:41:33 CEST 2017


Dear Paul,

Isn't this really a question for r-help rather than r-package-devel?

In any event, I think that your message is based on (an entirely understandable) misunderstanding of what the : operator on the right-hand side of a model formula does. The : operator is equivalent to %in%, and used as you have will produce a nested model structure. If you insure that all lower-order terms (i.e., all terms marginal to an interaction) are present, : may be used to create interactions, but as a general matter formulas do not like to create model matrices that violate marginality.

Consider the following simplified (equivalent) examples, using your "data":

---------- snip --------------

> cbind(dat[, 2:3], stats::model.matrix(y ~ x1 + x1:x2, dat))
  x1 x2 (Intercept) x11 x10:x21 x11:x21
1  0  0           1   0       0       0
2  0  0           1   0       0       0
3  0  1           1   0       1       0
4  0  1           1   0       1       0
5  1  0           1   1       0       0
6  1  0           1   1       0       0
7  1  1           1   1       0       1
8  1  1           1   1       0       1
> 
> cbind(dat[, 2:3], stats::model.matrix(y ~ x1 + x2 %in% x1, dat))
  x1 x2 (Intercept) x11 x10:x21 x11:x21
1  0  0           1   0       0       0
2  0  0           1   0       0       0
3  0  1           1   0       1       0
4  0  1           1   0       1       0
5  1  0           1   1       0       0
6  1  0           1   1       0       0
7  1  1           1   1       0       1
8  1  1           1   1       0       1
> 
> cbind(dat[, 2:3], stats::model.matrix(y ~ x1/x2, dat))
  x1 x2 (Intercept) x11 x10:x21 x11:x21
1  0  0           1   0       0       0
2  0  0           1   0       0       0
3  0  1           1   0       1       0
4  0  1           1   0       1       0
5  1  0           1   1       0       0
6  1  0           1   1       0       0
7  1  1           1   1       0       1
8  1  1           1   1       0       1

---------- snip --------------

In the first two cases, reversing the order of the operands, x2:x1 and x2 %in% x1, would also produce the same results.

I hope this helps,
 John

-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox



> -----Original Message-----
> From: R-package-devel [mailto:r-package-devel-bounces at r-project.org] On
> Behalf Of Paul Buerkner
> Sent: March 30, 2017 6:54 AM
> To: r-package-devel at r-project.org
> Subject: [R-pkg-devel] Problem in stats::model.matrix when omitting two-
> way interactions
> 
> Hi all,
> 
> recently I stumbled upen a problem in stats::model.matrix that I think is
> worth reporting.
> 
> When I run:
> 
> > dat <- data.frame(
> >   y = rnorm(8),
> >   x1 = factor(rep(0:1, each = 4)),
> >   x2 = factor(rep(rep(0:1, each = 2), 2)),
> >   x3 = factor(rep(0:1, 4))
> > )
> >
> > stats::model.matrix(y ~ x1+x2+x3 + x1:x2:x3, dat)
> 
> I get a matrix with 12 columns, which are linearily dependent and thus not
> identified in a linear model:
> 
> > summary(lm(y ~  x1+x2+x3 + x1:x2:x3, dat))
> 
> Of course, there is usually no need for such a formula that ignores the two-
> way interactions, but from my point of view, model.matrix should still return
> only 8 columns (or less) in order to produce identified models.
> 
> I wonder if this is some sort of intendend behavior or just a side effect of the
> way model.matrix handles factors.
> 
> Many thanks in advance.
> 
> Paul
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-package-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



More information about the R-package-devel mailing list