[Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

Fri Oct 27 20:18:41 CEST 2017

Hello Tyler,

I want to bring to your attention the following document: "What
happens if you omit the main effect in a regression model with an
interaction?" (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction).
This gives a useful review of the problem. Your example is Case 2: a
continuous and a categorical regressor.

The numerical examples are coded in Stata, and they give the same
result as in R. Hence, if this is a bug in R then it is also a bug in
Stata. That seems very unlikely.

Here is a simulation in R of the above mentioned Case 2 in Stata:

df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4"))
print("Full model")
print(model.matrix(~(socst+grp)^2 ,data=df))
print("Example 2.1: drop socst")
print(model.matrix(~(socst+grp)^2 -socst ,data=df))
print("Example 2.2: drop grp")
print(model.matrix(~(socst+grp)^2 -grp ,data=df))

This gives indeed the following regressors:

"Full model"
(Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4
"Example 2.1: drop socst"
(Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4
"Example 2.2: drop grp"
(Intercept) socst socst:grp2 socst:grp3 socst:grp4

There is a little bit of R documentation about this, based on the
concept of marginality, which typically forbids a model having an
interaction but not the corresponding main effects. (You might see the
references in https://en.wikipedia.org/wiki/Principle_of_marginality )
    See "An Introduction to R", by Venables and Smith and the R Core
Team. At the bottom of page 52 (PDF: 57) it says: "Although the
details are complicated, model formulae in R will normally generate
the models that an expert statistician would expect, provided that
marginality is preserved. Fitting, for [a contrary] example, a model
with an interaction but not the corresponding main effects will in
general lead to surprising results ....".
    The Reference Manual states that the R functions dropterm() and
addterm() resp. drop or add only terms such that marginality is
preserved.

Finally, about your singular matrix t(mm)%*%mm. This is in fact
Example 2.1 in Case 2 discussed above. As discussed there, in Stata
and in R the drop of the continuous variable has no effect on the
degrees of freedom here: it is just a reparameterisation of the full
model, protecting you against losing marginality... Hence the
model.matrix 'mm' is still square and nonsingular after the drop of
X1, unless of course when a row is removed from the matrix 'design'
when before creating 'mm'.

    Arie

On Sun, Oct 15, 2017 at 7:05 PM, Tyler <tylermw at gmail.com> wrote:
> You could possibly try to explain away the behavior for a missing main
> effects term, since without the main effects term we don't have main effect
> columns in the model matrix used to compute the interaction columns (At
> best this is undocumented behavior--I still think it's a bug, as we know
> how we would encode the categorical factors if they were in fact present.
> It's either specified in contrasts.arg or using the default set in
> options). However, when all the main effects are present, why would the
> three-factor interaction column not simply be the product of the main
> effect columns? In my example: we know X1, we know X2, and we know X3. Why
> does the encoding of X1:X2:X3 depend on whether we specified a two-factor
> interaction, AND only changes for specific missing interactions?
>
> In addition, I can use a two-term example similar to yours to show how this
> behavior results in a singular covariance matrix when, given the desired
> factor encoding, it should not be singular.
>
> We start with a full factorial design for a two-level continuous factor and
> a three-level categorical factor, and remove a single row. This design
> matrix does not leave enough degrees of freedom to determine
> goodness-of-fit, but should allow us to obtain parameter estimates.
>
>> design = expand.grid(X1=c(1,-1),X2=c("A","B","C"))
>> design = design[-1,]
>> design
>   X1 X2
> 2 -1  A
> 3  1  B
> 4 -1  B
> 5  1  C
> 6 -1  C
>
> Here, we first calculate the model matrix for the full model, and then
> manually remove the X1 column from the model matrix. This gives us the
> model matrix one would expect if X1 were removed from the model. We then
> successfully calculate the covariance matrix.
>
>> mm = model.matrix(~(X1+X2)^2,data=design)
>> mm
>   (Intercept) X1 X2B X2C X1:X2B X1:X2C
> 2           1 -1   0   0      0      0
> 3           1  1   1   0      1      0
> 4           1 -1   1   0     -1      0
> 5           1  1   0   1      0      1
> 6           1 -1   0   1      0     -1
>
>> mm = mm[,-2]
>> solve(t(mm) %*% mm)
>             (Intercept)  X2B  X2C X1:X2B X1:X2C
> (Intercept)           1 -1.0 -1.0    0.0    0.0
> X2B                  -1  1.5  1.0    0.0    0.0
> X2C                  -1  1.0  1.5    0.0    0.0
> X1:X2B                0  0.0  0.0    0.5    0.0
> X1:X2C                0  0.0  0.0    0.0    0.5
>
> Here, we see the actual behavior for model.matrix. The undesired re-coding
> of the model matrix interaction term makes the information matrix singular.
>
>> mm = model.matrix(~(X1+X2)^2-X1,data=design)
>> mm
>   (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C
> 2           1   0   0     -1      0      0
> 3           1   1   0      0      1      0
> 4           1   1   0      0     -1      0
> 5           1   0   1      0      0      1
> 6           1   0   1      0      0     -1
>
>> solve(t(mm) %*% mm)
> Error in solve.default(t(mm) %*% mm) : system is computationally singular:
> reciprocal condition number = 5.55112e-18
>
> I still believe this is a bug.
>
> Best regards,
> Tyler Morgan-Wall
>
> On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate <arietencate at gmail.com>
> wrote:
>
>> I think it is not a bug. It is a general property of interactions.
>> This property is best observed if all variables are factors
>> (qualitative).
>>
>> For example, you have three variables (factors). You ask for as many
>> interactions as possible, except an interaction term between two
>> particular variables. When this interaction is not a constant, it is
>> different for different values of the remaining variable. More
>> precisely: for all values of that variable. In other words: you have a
>> three-way interaction, with all values of that variable.
>>
>> An even smaller example is the following script with only two
>> variables, each being a factor:
>>
>>  df <- expand.grid(X1=c("p","q"), X2=c("A","B","C"))
>>  print(model.matrix(~(X1+X2)^2    ,data=df))
>>  print(model.matrix(~(X1+X2)^2 -X1,data=df))
>>  print(model.matrix(~(X1+X2)^2 -X2,data=df))
>>
>> The result is:
>>
>>   (Intercept) X1q X2B X2C X1q:X2B X1q:X2C
>> 1           1   0   0   0       0       0
>> 2           1   1   0   0       0       0
>> 3           1   0   1   0       0       0
>> 4           1   1   1   0       1       0
>> 5           1   0   0   1       0       0
>> 6           1   1   0   1       0       1
>>
>>   (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C
>> 1           1   0   0       0       0       0
>> 2           1   0   0       1       0       0
>> 3           1   1   0       0       0       0
>> 4           1   1   0       0       1       0
>> 5           1   0   1       0       0       0
>> 6           1   0   1       0       0       1
>>
>>   (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C
>> 1           1   0       0       0       0       0
>> 2           1   1       0       0       0       0
>> 3           1   0       1       0       0       0
>> 4           1   1       0       1       0       0
>> 5           1   0       0       0       1       0
>> 6           1   1       0       0       0       1
>>
>> Thus, in the second result, we have no main effect of X1. Instead, the
>> effect of X1 depends on the value of X2; either A or B or C. In fact,
>> this is a two-way interaction, including all three values of X2. In
>> the third result, we have no main effect of X2, The effect of X2
>> depends on the value of X1; either p or q.
>>
>> A complicating element with your example seems to be that your X1 and
>> X2 are not factors.
>>
>>    Arie
>>
>> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <tylermw at gmail.com> wrote:
>> > Hi,
>> >
>> > I recently ran into an inconsistency in the way model.matrix.default
>> > handles factor encoding for higher level interactions with categorical
>> > variables when the full hierarchy of effects is not present. Depending on
>> > which lower level interactions are specified, the factor encoding changes
>> > for a higher level interaction. Consider the following minimal
>> reproducible
>> > example:
>> >
>> > --------------
>> >
>> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))>
>> model.matrix(~(X1+X2+X3)^3,data=runmatrix)   (Intercept) X1 X2 X3B X3C
>> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
>> > 1            1  1  1   0   0     1      0      0      0      0
>> > 0         0
>> > 2            1 -1  1   0   0    -1      0      0      0      0
>> > 0         0
>> > 3            1  1 -1   0   0    -1      0      0      0      0
>> > 0         0
>> > 4            1 -1 -1   0   0     1      0      0      0      0
>> > 0         0
>> > 5            1  1  1   1   0     1      1      0      1      0
>> > 1         0
>> > 6            1 -1  1   1   0    -1     -1      0      1      0
>> > -1         0
>> > 7            1  1 -1   1   0    -1      1      0     -1      0
>> > -1         0
>> > 8            1 -1 -1   1   0     1     -1      0     -1      0
>> > 1         0
>> > 9            1  1  1   0   1     1      0      1      0      1
>> > 0         1
>> > 10           1 -1  1   0   1    -1      0     -1      0      1
>> > 0        -1
>> > 11           1  1 -1   0   1    -1      0      1      0     -1
>> > 0        -1
>> > 12           1 -1 -1   0   1     1      0     -1      0     -1
>> > 0         1
>> > attr(,"assign")
>> >  [1] 0 1 2 3 3 4 5 5 6 6 7 7
>> > attr(,"contrasts")
>> > attr(,"contrasts")$X3
>> > [1] "contr.treatment"
>> >
>> > --------------
>> >
>> > Specifying the full hierarchy gives us what we expect: the interaction
>> > columns are simply calculated from the product of the main effect
>> columns.
>> > The interaction term X1:X2:X3 gives us two columns in the model matrix,
>> > X1:X2:X3B and X1:X2:X3C, matching the products of the main effects.
>> >
>> > If we remove either the X2:X3 interaction or the X1:X3 interaction, we
>> get
>> > what we would expect for the X1:X2:X3 interaction, but when we remove the
>> > X1:X2 interaction the encoding for X1:X2:X3 changes completely:
>> >
>> > --------------
>> >
>> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix)   (Intercept) X1 X2
>> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
>> > 1            1  1  1   0   0     1      0      0         0         0
>> > 2            1 -1  1   0   0    -1      0      0         0         0
>> > 3            1  1 -1   0   0    -1      0      0         0         0
>> > 4            1 -1 -1   0   0     1      0      0         0         0
>> > 5            1  1  1   1   0     1      1      0         1         0
>> > 6            1 -1  1   1   0    -1      1      0        -1         0
>> > 7            1  1 -1   1   0    -1     -1      0        -1         0
>> > 8            1 -1 -1   1   0     1     -1      0         1         0
>> > 9            1  1  1   0   1     1      0      1         0         1
>> > 10           1 -1  1   0   1    -1      0      1         0        -1
>> > 11           1  1 -1   0   1    -1      0     -1         0        -1
>> > 12           1 -1 -1   0   1     1      0     -1         0         1
>> > attr(,"assign")
>> >  [1] 0 1 2 3 3 4 5 5 6 6
>> > attr(,"contrasts")
>> > attr(,"contrasts")$X3
>> > [1] "contr.treatment"
>> >
>> >
>> >
>> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix)   (Intercept) X1 X2
>> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C
>> > 1            1  1  1   0   0     1      0      0         0         0
>> > 2            1 -1  1   0   0    -1      0      0         0         0
>> > 3            1  1 -1   0   0    -1      0      0         0         0
>> > 4            1 -1 -1   0   0     1      0      0         0         0
>> > 5            1  1  1   1   0     1      1      0         1         0
>> > 6            1 -1  1   1   0    -1     -1      0        -1         0
>> > 7            1  1 -1   1   0    -1      1      0        -1         0
>> > 8            1 -1 -1   1   0     1     -1      0         1         0
>> > 9            1  1  1   0   1     1      0      1         0         1
>> > 10           1 -1  1   0   1    -1      0     -1         0        -1
>> > 11           1  1 -1   0   1    -1      0      1         0        -1
>> > 12           1 -1 -1   0   1     1      0     -1         0         1
>> > attr(,"assign")
>> >  [1] 0 1 2 3 3 4 5 5 6 6
>> > attr(,"contrasts")
>> > attr(,"contrasts")$X3
>> > [1] "contr.treatment"
>> >
>> >
>> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix)   (Intercept) X1 X2
>> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C
>> > 1            1  1  1   0   0      0      0      0      0         1
>> >     0         0
>> > 2            1 -1  1   0   0      0      0      0      0        -1
>> >     0         0
>> > 3            1  1 -1   0   0      0      0      0      0        -1
>> >     0         0
>> > 4            1 -1 -1   0   0      0      0      0      0         1
>> >     0         0
>> > 5            1  1  1   1   0      1      0      1      0         0
>> >     1         0
>> > 6            1 -1  1   1   0     -1      0      1      0         0
>> >    -1         0
>> > 7            1  1 -1   1   0      1      0     -1      0         0
>> >    -1         0
>> > 8            1 -1 -1   1   0     -1      0     -1      0         0
>> >     1         0
>> > 9            1  1  1   0   1      0      1      0      1         0
>> >     0         1
>> > 10           1 -1  1   0   1      0     -1      0      1         0
>> >     0        -1
>> > 11           1  1 -1   0   1      0      1      0     -1         0
>> >     0        -1
>> > 12           1 -1 -1   0   1      0     -1      0     -1         0
>> >     0         1
>> > attr(,"assign")
>> >  [1] 0 1 2 3 3 4 4 5 5 6 6 6
>> > attr(,"contrasts")
>> > attr(,"contrasts")$X3
>> > [1] "contr.treatment"
>> >
>> > --------------
>> >
>> > Here, we now see the encoding for the interaction X1:X2:X3 is now the
>> > interaction of X1 and X2 with a new encoding for X3 where each factor
>> level
>> > is represented by its own column. I would expect, given the two column
>> > dummy variable encoding for X3, that the X1:X2:X3 column would also be
>> two
>> > columns regardless of what two-factor interactions we also specified, but
>> > in this case it switches to three. If other two factor interactions are
>> > missing in addition to X1:X2, this issue still occurs. This also happens
>> > regardless of the contrast specified in contrasts.arg for X3. I don't see
>> > any reasoning for this behavior given in the documentation, so I suspect
>> it
>> > is a bug.
>> >
>> > Best regards,
>> > Tyler Morgan-Wall
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel