[Rd] further notes on model.frame issue
Terry Therneau
therneau at mayo.edu
Mon Jan 19 19:29:37 CET 2009
This is a follow-up on my note of Saturday. Let me start with two important
clarifications:
- I think this would be a nice addition, but I've had exactly one use for it
in the 15+ years of developing the survival package.
- I have a workaround for the current case.
Prioritize accordingly.
The ideal would be to change survexp as follows:
    fit <- survexp( ~ gender, data=mydata, ratetable=survexp.us,
                   ratevar=list(sex=gender, year=enroll.dt, age=age*365.25))
The model statement says that I want separate curves by gender, and is similar
to other model statements.
The ratevar option gives the mapping between my variable names and the dimnames
of the survexp.us rate table. It wants age in days, enrollment date to be some
sort of date object, and sex to be a factor. Then the heading of the R code
would be
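    m <- match.call()
    # keep the function name plus only the arguments meant for model.frame
    m <- m[c(1, match(c('data', 'formula', 'na.action', 'subset',
                        'weights', 'ratevar'), names(m), nomatch=0))]
    # turn the trimmed call into a model.frame call, evaluated in the caller
    m[[1]] <- as.name('model.frame')
    m <- eval.parent(m)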
That is, the variables enroll.dt and age are searched for in the data= arg.
This is like the start option in glm, but with a more complex result than a vector.
The model.frame function can't handle this. (Splus fails too, same spot, less
useful error message).
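For readers unfamiliar with the idiom, here is a minimal base-R sketch of the
match.call rewrite, using only arguments that model.frame already accepts. The
function name toy_frame and the data frame d are made up for illustration; this
is not code from the survival package.

```r
# toy_frame is a hypothetical function, used only to show the idiom
toy_frame <- function(formula, data, subset, na.action, ...) {
    m <- match.call()
    # positions of the arguments to pass on; 0 for any that were not supplied
    keep <- match(c("formula", "data", "subset", "na.action"),
                  names(m), nomatch = 0)
    m <- m[c(1, keep)]                  # function name plus the kept arguments
    m[[1]] <- as.name("model.frame")    # turn it into a model.frame call
    eval.parent(m)                      # evaluate in the caller's frame
}

d <- data.frame(y = c(1, 2, 3, 4),
                x = c(10, 20, NA, 40),
                g = c("a", "b", "a", "b"))

# 'extra' is dropped by the rewrite; 'subset' is evaluated inside d
mf <- toy_frame(y ~ x, data = d, subset = g == "a", extra = 99)
```

Because the rewritten call is evaluated by model.frame, g is looked up in d
rather than in the caller's workspace, which is the behavior wanted for the
variables inside ratevar.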
-----
The current code uses
    fit <- survexp(~ gender + ratetable(sex=gender, year=enroll.dt,
                                        age=age*365.25),
                   data=mydata, ratetable=survexp.us)
The ratetable function creates a matrix with extra attributes. The matrix
contains as.numeric of the factors with the levels remembered as an extra
attribute, and also looks out for dates. So the result is like ns() in the eyes
of model.frame, and it works. But having to write gender twice on the rhs is
confusing to users.
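A rough base-R sketch of why the matrix trick works: model.frame accepts a
term whose value is a numeric matrix, so a function that packs several
variables into one matrix passes through as a single term. The function
toy_ratevars and the attribute name "levlist" are invented stand-ins, not the
actual ratetable() code, and date handling is left out.

```r
# toy_ratevars is a hypothetical stand-in for the idea behind ratetable()
toy_ratevars <- function(...) {
    args <- list(...)
    # remember factor levels, since as.numeric() keeps only the integer codes
    levs <- lapply(args, function(z) if (is.factor(z)) levels(z) else NULL)
    # one numeric column per named argument
    out <- do.call(cbind, lapply(args, as.numeric))
    attr(out, "levlist") <- levs
    out
}

d <- data.frame(gender = factor(c("m", "f", "f", "m")),
                age    = c(50, 61, 47, 70))

mf <- model.frame(~ gender + toy_ratevars(sex = gender, age = age * 365.25),
                  data = d)
rv <- mf[[2]]    # the matrix term, with columns "sex" and "age"
```

Note that gender still has to appear twice in the formula, once as a grouping
term and once inside the mapping, which is the wart described above.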
Thanks in advance for any comments.
Terry Therneau