[R] Different behavior of model.matrix between R 3.2 and R3.1.1

Frank Harrell f.harrell at Vanderbilt.Edu
Tue Jun 16 00:36:57 CEST 2015


Thank you very much Terry.  I'm still puzzled at why this worked a year 
ago.  What changed?  I'd very much like to reverse the change by setting 
an argument somewhere or manipulating the terms object.

I echo your sentiments about the general approach.

Frank


On 06/15/2015 09:05 AM, Therneau, Terry M., Ph.D. wrote:
> Frank,
>   I don't think there is any way to "fix" your problem except the way 
> that I did it.
>
> library(survival)
> tdata <- data.frame(y=c(1,3,3,5, 5,7, 7,9, 9,13),
>                     x1=factor(letters[c(1,1,1,1,1,2,2,2,2,2)]),
>                     x2= c(1,2,1,2,1,2,1,2,1,2))
>
> fit1 <- lm( y ~ x1 * strata(x2) - strata(x2), tdata)
> coef(fit1)
>        (Intercept)                x1b x1a:strata(x2)x2=2 
> x1b:strata(x2)x2=2
>           3.000000           5.000000           1.000000 1.666667
>
> Your code is calling model.matrix with the same model frame and terms 
> structure as the lm call above (I checked).  In your case you know 
> that the underlying model has 2 intercepts (strata), one for the group 
> with x2=1 and another for the group with x2=2, but how is the 
> model.matrix routine supposed to guess that?  It can't, so 
> model.matrix returns the proper result for the lm call.  As seen above 
> the result is not singular, while for the Cox model it is singular due 
> to the extra intercept.
>
> This is simply an extension of leaving the "intercept" term in the 
> model and then removing that column from the returned X matrix, which 
> is necessary to have the correct coding for ordinary factor variables, 
> something we've both done since day 1.  In order for model.matrix to 
> do the right thing with interactions, it has to know how many 
> intercepts there actually are.
>
> I've come to the conclusion that the entire thrust of 'contrasts' in S 
> was wrong headed, i.e., the "remove redundant columns from the X 
> matrix ahead of time" logic.  It is simply not possible for the 
> model.matrix routine to guess correctly for all y and x combinations, 
> something that been acknowledged in R by changing the default for 
> "singular.ok" to TRUE. Dealing with this after the fact via a good 
> contrast function (a la SAS -- heresy!) would have been a much better 
> design choice.  But as long as I'm in R the coxph routine tries to be 
> a good citizen.
>
> Terry T.

-- 
------------------------------------------------------------------------
Frank E Harrell Jr 	Professor and Chairman 	School of Medicine

	Department of *Biostatistics* 	*Vanderbilt University*


	[[alternative HTML version deleted]]



More information about the R-help mailing list