[R] wrapper for coxph with a subset argument
Erik Iverson
iverson at biostat.wisc.edu
Fri Nov 9 18:04:52 CET 2007
Dear R-help -
Thanks to those who replied yesterday (Christos H. and Thomas L.)
regarding my question on coxph and model formulas; the answers worked
perfectly.
My new question involves the following.
I want to run several coxph models (package survival) with the same
dataset, but different subsets of that dataset.
I have found a way to do this, described below in functions subwrap1 and
subwrap2. These do not use the coxph "subset" argument, however, as you
will see.
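When every subset of interest is just one level of a single factor (here `sex`), one option that avoids building subset expressions at all is to split the data with `split()` and fit one model per piece with `lapply()`. This is a minimal sketch, assuming the survival package is installed; the simulated data mimic what `makeTestDF(200)` produces, and the seed and the name `fits_by_sex` are illustrative only:

```r
library(survival)  # coxph() and Surv()

# Simulated data in the same shape as makeTestDF(200)
set.seed(1)
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.1),
  trt   = factor(rep(c("A", "B"), each = n / 2)),
  sex   = factor(rep(c("M", "F"), times = n / 2))
)

# One data frame per level of sex, then one Cox model per data frame
fits_by_sex <- lapply(split(testdf, testdf$sex), function(d) {
  coxph(Surv(times, event) ~ trt, data = d)
})
names(fits_by_sex)
```

This only covers subsets that partition the data by a factor; arbitrary conditions such as `times > 50 & sex == 'F'` still need one of the expression- or string-based wrappers.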
My three main questions are:
1) When writing a wrapper like this, should I be using the subset
argument in coxph(), or alternatively, doing what I am doing in subwrap1
and subwrap2 below? Is the subset argument in coxph more of a
convenience tool for interactive use rather than programs?
2) If the approach in subwrap1 and subwrap2 is fine, is there a
preference for using 'expressions' or 'strings'? Eventually, my program
will create these subset conditions programmatically, so I think strings
will be the way I have to go, even though I've seen warnings on this
list about using the eval(parse()) construct.
3) Is there some approach to do this that I'm overlooking? My goal will
be to produce a list of subset conditions (probably a character vector),
and then use lapply to run the various cox regressions.
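For questions 2 and 3, one way to keep strings as the input while avoiding `eval(parse())` inside the model call is to parse each condition once into a call object and splice it into the `coxph()` call with `bquote()`, so that coxph's own `subset` argument receives an ordinary expression. A sketch under these assumptions: the survival package is installed, and the wrapper `subwrap_str` and the vector `conds` are hypothetical names:

```r
library(survival)  # coxph() and Surv()

set.seed(2)
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.1),
  trt   = factor(rep(c("A", "B"), each = n / 2)),
  sex   = factor(rep(c("M", "F"), times = n / 2))
)

# Hypothetical wrapper: 'sb' is a character string such as "sex == 'F'"
subwrap_str <- function(x, sb) {
  cond <- parse(text = sb)[[1]]  # string -> unevaluated call
  # .(cond) splices the call in, so coxph sees subset = sex == "F"
  fit_call <- bquote(coxph(Surv(times, event) ~ trt, data = x,
                           subset = .(cond)))
  eval(fit_call)  # evaluated in this function's frame, where x exists
}

# A named character vector of subset conditions, one model per condition
conds <- c(all = "times > 0", female = "sex == 'F'", male = "sex == 'M'")
fits <- lapply(conds, function(s) subwrap_str(testdf, s))
sapply(fits, function(f) f$n)  # number of observations used by each fit
```

A side benefit of this design is that each fit's stored call shows the actual condition (for example `subset = sex == "F"`) rather than an opaque `eval(...)`.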
I can already achieve my goal; I would just like to know more about
how others handle problems like this.
I've simplified my code below to focus on where I feel I'm confused.
Here is some code along with comments:
#### BEGIN R SAMPLE CODE
library(survival)  # provides coxph() and Surv()
# Function for producing test data (n should be even)
makeTestDF <- function(n) {
  times <- sample(1:200, n, replace = TRUE)
  event <- rbinom(n, 1, prob = .1)
  trt <- rep(c("A", "B"), each = n/2)
  sex <- factor(c("M", "F"))
  sex <- rep(sex, times = n/2)
  # return the data frame explicitly rather than via an invisible assignment
  data.frame(times, event, trt, sex)
}
# Make test data, n = 200
testdf <- makeTestDF(200)
# Cox wrapper function with subset; this one works.
# Takes subset as an expression
subwrap1 <- function(x, sb) {
  sb <- eval(substitute(sb), x)  # evaluate the caller's expression inside the data
  x <- x[sb, ]                   # keep only the matching rows
  coxph(Surv(times, event) ~ trt, data = x)
}
subwrap1(testdf, sex == 'F')
# This next one also works, but uses a character string
# instead of an expression as the subset argument
subwrap2 <- function(x, sb) {
  sb <- eval(parse(text = sb), x)  # parse the string, then evaluate it inside the data
  x <- x[sb, ]
  coxph(Surv(times, event) ~ trt, data = x)
}
subwrap2(testdf, "sex == 'F'")
# Neither of the above uses the coxph subset argument.
# If I try using that, I get stuck with expressions: I've tried many
# different things in the subset argument, but none seem to do the
# trick. Is using this argument in a program even advisable?
subwrap3 <- function(x, sb) {
  coxph(Surv(times, event) ~ trt, data = x,
        subset = eval(substitute(sb), x))
}
subwrap3(testdf, sex == 'F') # does not work
# Using a string, this works, however.
subwrap4 <- function(x, sb) {
  coxph(Surv(times, event) ~ trt, data = x, subset = eval(parse(text = sb)))
}
subwrap4(testdf, "sex == 'F'")
### END R SAMPLE CODE
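On question 1, a likely reason subwrap3 fails is that `coxph()` hands its `subset` argument on to `model.frame()`, which evaluates it non-standardly inside the data; at that point `substitute(sb)` is no longer running in subwrap3's own frame, so it does not recover `sex == 'F'`. One common idiom (a sketch, not necessarily the recommended approach) is to capture the expression in the wrapper and splice it into the whole `coxph()` call, so the subset reaches coxph unevaluated. This assumes the survival package is installed; `subwrap5` is a hypothetical name:

```r
library(survival)  # coxph() and Surv()

set.seed(3)
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.1),
  trt   = factor(rep(c("A", "B"), each = n / 2)),
  sex   = factor(rep(c("M", "F"), times = n / 2))
)

# Hypothetical wrapper: 'sb' is an unevaluated expression such as sex == 'F'
subwrap5 <- function(x, sb) {
  cond <- substitute(sb)  # capture the caller's expression, e.g. sex == "F"
  # Build coxph(..., subset = sex == "F") and evaluate it here,
  # so coxph's own subset argument sees a plain expression
  fit_call <- bquote(coxph(Surv(times, event) ~ trt, data = x,
                           subset = .(cond)))
  eval(fit_call)
}

fit5 <- subwrap5(testdf, sex == 'F')
```

Note that `substitute()` only looks one call level up: if subwrap5 is called from another function that itself received the condition as an argument, it captures that argument's name rather than the condition. The string-based wrappers avoid that problem, which may make them the more practical choice when conditions are generated programmatically.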
Thanks so much,
Erik Iverson
iverson at biostat.wisc.edu
> sessionInfo()
R version 2.5.1 (2007-06-27)
i686-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "grDevices" "datasets" "tcltk" "splines" "graphics" "utils"
[7] "stats" "methods" "base"
other attached packages:
       debug     mvbutils SPLOTS_1.2-6        Hmisc        chron     survival
     "1.1.0"      "1.1.1"      "1.2-6"      "3.4-2"     "2.3-13"       "2.32"