[Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing
Tyler
tylermw at gmail.com
Thu Oct 12 19:12:23 CEST 2017
Hi,
I recently ran into an inconsistency in the way model.matrix.default
handles factor encoding for higher level interactions with categorical
variables when the full hierarchy of effects is not present. Depending on
which lower level interactions are specified, the factor encoding changes
for a higher level interaction. Consider the following minimal reproducible
example:
--------------
> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
1 1 1 1 0 0 1 0 0 0 0
0 0
2 1 -1 1 0 0 -1 0 0 0 0
0 0
3 1 1 -1 0 0 -1 0 0 0 0
0 0
4 1 -1 -1 0 0 1 0 0 0 0
0 0
5 1 1 1 1 0 1 1 0 1 0
1 0
6 1 -1 1 1 0 -1 -1 0 1 0
-1 0
7 1 1 -1 1 0 -1 1 0 -1 0
-1 0
8 1 -1 -1 1 0 1 -1 0 -1 0
1 0
9 1 1 1 0 1 1 0 1 0 1
0 1
10 1 -1 1 0 1 -1 0 -1 0 1
0 -1
11 1 1 -1 0 1 -1 0 1 0 -1
0 -1
12 1 -1 -1 0 1 1 0 -1 0 -1
0 1
attr(,"assign")
[1] 0 1 2 3 3 4 5 5 6 6 7 7
attr(,"contrasts")
attr(,"contrasts")$X3
[1] "contr.treatment"
--------------
Specifying the full hierarchy gives us what we expect: the interaction
columns are simply calculated from the product of the main effect columns.
The interaction term X1:X2:X3 gives us two columns in the model matrix,
X1:X2:X3B and X1:X2:X3C, matching the products of the main effects.
If we remove either the X2:X3 interaction or the X1:X3 interaction, we get
what we would expect for the X1:X2:X3 interaction, but when we remove the
X1:X2 interaction the encoding for X1:X2:X3 changes completely:
--------------
> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) X1 X2 X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
1 1 1 1 0 0 1 0 0 0 0
2 1 -1 1 0 0 -1 0 0 0 0
3 1 1 -1 0 0 -1 0 0 0 0
4 1 -1 -1 0 0 1 0 0 0 0
5 1 1 1 1 0 1 1 0 1 0
6 1 -1 1 1 0 -1 1 0 -1 0
7 1 1 -1 1 0 -1 -1 0 -1 0
8 1 -1 -1 1 0 1 -1 0 1 0
9 1 1 1 0 1 1 0 1 0 1
10 1 -1 1 0 1 -1 0 1 0 -1
11 1 1 -1 0 1 -1 0 -1 0 -1
12 1 -1 -1 0 1 1 0 -1 0 1
attr(,"assign")
[1] 0 1 2 3 3 4 5 5 6 6
attr(,"contrasts")
attr(,"contrasts")$X3
[1] "contr.treatment"
> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C
1 1 1 1 0 0 1 0 0 0 0
2 1 -1 1 0 0 -1 0 0 0 0
3 1 1 -1 0 0 -1 0 0 0 0
4 1 -1 -1 0 0 1 0 0 0 0
5 1 1 1 1 0 1 1 0 1 0
6 1 -1 1 1 0 -1 -1 0 -1 0
7 1 1 -1 1 0 -1 1 0 -1 0
8 1 -1 -1 1 0 1 -1 0 1 0
9 1 1 1 0 1 1 0 1 0 1
10 1 -1 1 0 1 -1 0 -1 0 -1
11 1 1 -1 0 1 -1 0 1 0 -1
12 1 -1 -1 0 1 1 0 -1 0 1
attr(,"assign")
[1] 0 1 2 3 3 4 5 5 6 6
attr(,"contrasts")
attr(,"contrasts")$X3
[1] "contr.treatment"
> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) X1 X2 X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C
1 1 1 1 0 0 0 0 0 0 1
0 0
2 1 -1 1 0 0 0 0 0 0 -1
0 0
3 1 1 -1 0 0 0 0 0 0 -1
0 0
4 1 -1 -1 0 0 0 0 0 0 1
0 0
5 1 1 1 1 0 1 0 1 0 0
1 0
6 1 -1 1 1 0 -1 0 1 0 0
-1 0
7 1 1 -1 1 0 1 0 -1 0 0
-1 0
8 1 -1 -1 1 0 -1 0 -1 0 0
1 0
9 1 1 1 0 1 0 1 0 1 0
0 1
10 1 -1 1 0 1 0 -1 0 1 0
0 -1
11 1 1 -1 0 1 0 1 0 -1 0
0 -1
12 1 -1 -1 0 1 0 -1 0 -1 0
0 1
attr(,"assign")
[1] 0 1 2 3 3 4 4 5 5 6 6 6
attr(,"contrasts")
attr(,"contrasts")$X3
[1] "contr.treatment"
--------------
Here, we now see the encoding for the interaction X1:X2:X3 is now the
interaction of X1 and X2 with a new encoding for X3 where each factor level
is represented by its own column. I would expect, given the two column
dummy variable encoding for X3, that the X1:X2:X3 column would also be two
columns regardless of what two-factor interactions we also specified, but
in this case it switches to three. If other two factor interactions are
missing in addition to X1:X2, this issue still occurs. This also happens
regardless of the contrast specified in contrasts.arg for X3. I don't see
any reasoning for this behavior given in the documentation, so I suspect it
is a bug.
Best regards,
Tyler Morgan-Wall
[[alternative HTML version deleted]]
More information about the R-devel
mailing list