[R] Different behavior of model.matrix between R 3.2 and R3.1.1
Frank Harrell
f.harrell at Vanderbilt.Edu
Mon Jun 15 04:22:07 CEST 2015
Terry - your example didn't demonstrate the problem because the variable
that interacted with strata (zed) was not a factor variable.
But I had stated the problem incorrectly. It's not that there are too
many strata terms; there are too many non-strata terms when the variable
interacting with the stratification factor is a factor variable. Here
is a simple example, where I have attached no packages other than the
basic startup packages.
strat <- function(x) x
d <- expand.grid(a=c('a1','a2'), b=c('b1','b2'))
d$y <- c(1,3,2,4)
f <- y ~ a * strat(b)
m <- model.frame(f, data=d)
Terms <- terms(f, specials='strat', data=d)
specials <- attr(Terms, 'specials')
temp <- survival:::untangle.specials(Terms, 'strat', 1)
Terms <- Terms[- temp$terms]
model.matrix(Terms, m)
(Intercept) aa2 aa1:strat(b)b2 aa2:strat(b)b2
1 1 0 0 0
2 1 1 0 0
3 1 0 1 0
4 1 1 0 1
. . .
The column corresponding to a='a1' b='b2' should not be there
(aa1:strat(b)b2).
This does seem to be a change in R. Any help appreciated.
Note that after subsetting out strat terms using Terms[ - temp$terms],
Terms attributes factor and term.labels are:
attr(,"factors")
a a:strat(b)
y 0 0
a 1 2
strat(b) 0 1
attr(,"term.labels")
[1] "a" "a:strat(b)"
Frank
On 06/11/2015 08:44 AM, Therneau, Terry M., Ph.D. wrote:
> Frank,
> I'm not sure what is going on. The following test function works for
> me in both 3.1.1 and 3.2, i.e, the second model matrix has fewer
> columns. As I indicated to you earlier, the coxph code removes the
> strata() columns after creating X because I found it easier to correctly
> create the assign attribute.
>
> Can you create a worked example?
>
> require(survival)
> testfun <- function(formula, data) {
> tform <- terms(formula, specials="strata")
> mf <- model.frame(tform, data)
>
> terms2 <- terms(mf)
> strat <- untangle.specials(terms2, "strata")
> if (length(strat$terms)) terms2 <- terms2[-strat$terms]
> X <- model.matrix(terms2, mf)
> X
> }
>
> tdata <- data.frame(y= 1:10, zed = 1:10, grp =
> factor(c(1,1,1,2,2,2,1,1,3,3)))
>
> testfun(y ~ zed*grp, tdata)
>
> testfun(y ~ strata(grp)*zed, tdata)
>
>
> Terry T.
>
> ----- original message --
>
> For building design matrices for Cox proportional hazards models in the
> cph function in the rms package I have always used this construct:
>
> Terms <- terms(formula, specials=c("strat", "cluster", "strata"),
> data=data)
> specials <- attr(Terms, 'specials')
> stra <- specials$strat
> Terms.ns <- Terms
> if(length(stra)) {
> temp <- untangle.specials(Terms.ns, "strat", 1)
> Terms.ns <- Terms.ns[- temp$terms] #uses [.terms function
> }
> X <- model.matrix(Terms.ns, X)[, -1, drop=FALSE]
>
> The Terms.ns logic removes stratification factor "main effects" so that
> if a stratification factor interacts with a non-stratification factor,
> only the interaction terms are included, not the strat. factor main
> effects. [In a Cox PH model stratification goes into the nonparametric
> survival curve part of the model].
>
> Lately this logic quit working; model.matrix keeps the unneeded main
> effects in the design matrix. Does anyone know what changed in R that
> could have caused this, and possibly a workaround?
>
>
> -------
>
>
--
------------------------------------------------------------------------
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of *Biostatistics* *Vanderbilt University*
More information about the R-help
mailing list