[R] Different behavior of model.matrix between R 3.2 and R3.1.1
Frank Harrell
f.harrell at Vanderbilt.Edu
Tue Jun 16 00:36:57 CEST 2015
Thank you very much Terry. I'm still puzzled at why this worked a year
ago. What changed? I'd very much like to reverse the change by setting
an argument somewhere or manipulating the terms object.
I echo your sentiments about the general approach.
Frank
On 06/15/2015 09:05 AM, Therneau, Terry M., Ph.D. wrote:
> Frank,
> I don't think there is any way to "fix" your problem except the way
> that I did it.
>
> library(survival)
> tdata <- data.frame(y=c(1,3,3,5, 5,7, 7,9, 9,13),
> x1=factor(letters[c(1,1,1,1,1,2,2,2,2,2)]),
> x2= c(1,2,1,2,1,2,1,2,1,2))
>
> fit1 <- lm( y ~ x1 * strata(x2) - strata(x2), tdata)
> coef(fit1)
> (Intercept) x1b x1a:strata(x2)x2=2
> x1b:strata(x2)x2=2
> 3.000000 5.000000 1.000000 1.666667
>
> Your code is calling model.matrix with the same model frame and terms
> structure as the lm call above (I checked). In your case you know
> that the underlying model has 2 intercepts (strata), one for the group
> with x2=1 and another for the group with x2=2, but how is the
> model.matrix routine supposed to guess that? It can't, so
> model.matrix returns the proper result for the lm call. As seen above
> the result is not singular, while for the Cox model it is singular due
> to the extra intercept.
>
> This is simply an extension of leaving the "intercept" term in the
> model and then removing that column from the returned X matrix, which
> is necessary to have the correct coding for ordinary factor variables,
> something we've both done since day 1. In order for model.matrix to
> do the right thing with interactions, it has to know how many
> intercepts there actually are.
>
> I've come to the conclusion that the entire thrust of 'contrasts' in S
> was wrong headed, i.e., the "remove redundant columns from the X
> matrix ahead of time" logic. It is simply not possible for the
> model.matrix routine to guess correctly for all y and x combinations,
> something that been acknowledged in R by changing the default for
> "singular.ok" to TRUE. Dealing with this after the fact via a good
> contrast function (a la SAS -- heresy!) would have been a much better
> design choice. But as long as I'm in R the coxph routine tries to be
> a good citizen.
>
> Terry T.
--
------------------------------------------------------------------------
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of *Biostatistics* *Vanderbilt University*
[[alternative HTML version deleted]]
More information about the R-help
mailing list