[R] wrapper for coxph with a subset argument
Christos Hatzis
christos at nuverabio.com
Fri Nov 9 18:34:57 CET 2007
The following variation on what you proposed lets you either subset the
dataset before calling coxph or use its subset argument. The subset is passed
as an ordinary logical vector, so no substitute() or parse() is needed:
subwrap5 <- function(x, sb = NULL) {
  coxph(Surv(times, event) ~ trt, data = x, subset = sb)
}
subwrap5(testdf, testdf$sex == 'F')
subwrap5(testdf[testdf$sex == 'F', ])
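As a quick check, here is a minimal sketch showing that the two calls fit the same model. It assumes the survival package is installed and rebuilds test data along the lines of makeTestDF from the original message; the seed and the names fit_arg and fit_pre are arbitrary choices added only for illustration.

```r
library(survival)

set.seed(1)  # arbitrary seed, an assumption made only for reproducibility
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.1),
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = rep(factor(c("M", "F")), times = n / 2)
)

subwrap5 <- function(x, sb = NULL) {
  coxph(Surv(times, event) ~ trt, data = x, subset = sb)
}

# Same model, two routes: coxph's subset argument vs. subsetting beforehand
fit_arg <- subwrap5(testdf, testdf$sex == "F")
fit_pre <- subwrap5(testdf[testdf$sex == "F", ])

# Compare the estimated coefficients from the two fits
all.equal(coef(fit_arg), coef(fit_pre))
```

Both fits use the same rows in the same order, so the coefficients should agree.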
I'm not sure whether either of these would be more efficient in terms of
memory usage. If that is an issue, you could test both with large datasets.
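For the goal described in the original message (a list of subset conditions run through lapply), one possible sketch stores the conditions as quoted expressions and evaluates each against the data frame before handing it to the wrapper, which avoids eval(parse()). The names conds and fits, the seed, and the rebuilt test data are illustrative assumptions, not part of the original code.

```r
library(survival)

set.seed(1)  # arbitrary seed, an assumption made only for reproducibility
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.1),
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = rep(factor(c("M", "F")), times = n / 2)
)

subwrap5 <- function(x, sb = NULL) {
  coxph(Surv(times, event) ~ trt, data = x, subset = sb)
}

# Hypothetical list of subset conditions, kept as unevaluated expressions
conds <- list(
  female = quote(sex == "F"),
  male   = quote(sex == "M")
)

# Evaluate each condition in the data frame to get a logical vector,
# then pass that vector through to coxph's subset argument
fits <- lapply(conds, function(cond) {
  subwrap5(testdf, eval(cond, testdf))
})

names(fits)
```

If the conditions have to start out as strings, str2lang() (R >= 3.6.0) or parse(text = s)[[1]] turns each string into an expression of this kind.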
-Christos
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Erik Iverson
> Sent: Friday, November 09, 2007 12:05 PM
> To: 'r-help at stat.math.ethz.ch'
> Subject: [R] wrapper for coxph with a subset argument
>
> Dear R-help -
>
> Thanks to those who replied yesterday (Christos H. and Thomas
> L.) regarding my question on coxph and model formula, the
> answers worked perfectly.
>
> My new question involves the following.
>
> I want to run several coxph models (package survival) with
> the same dataset, but different subsets of that dataset.
>
> I have found a way to do this, described below in functions
> subwrap1 and subwrap2. These do not use the coxph "subset"
> argument, however, as you will see.
>
> My three main questions are:
>
> 1) When writing a wrapper like this, should I be using the
> subset argument in coxph(), or alternatively, doing what I am
> doing in subwrap1 and subwrap2 below? Is the subset argument
> in coxph more of a convenience tool for interactive use
> rather than programs?
>
> 2) If the approach in subwrap1 and subwrap2 is fine, is there
> a preference for using 'expressions' or 'strings'?
> Eventually, my program will create these subset conditions
> programmatically, so I think strings will be the way I have
> to go, even though I've seen warnings on this list about
> using the eval(parse()) construct.
>
> 3) Is there some approach to do this that I'm overlooking?
> My goal will be to produce a list of subset conditions
> (probably a character vector), and then use lapply to run the
> various cox regressions.
>
> I can already achieve my goal, I just would like to know more
> details about how others do things like this.
>
> I've simplified my code below to focus on where I feel I'm confused.
> Here is some code along with comments:
>
> #### BEGIN R SAMPLE CODE
>
> #Function for producing test data
> makeTestDF <- function(n) {
> times <- sample(1:200, n, replace = TRUE)
> event <- rbinom(n, 1, prob = .1)
> trt <- rep(c("A","B"), each = n/2)
> sex <- factor(c("M","F"))
> sex <- rep(sex, times = n/2)
> testdf <- data.frame(times,event,trt,sex) }
>
> # Make test data, n = 200
> testdf <- makeTestDF(200)
>
> # Cox wrapper function with subset, this one works
> # Takes subset as expression
> subwrap1 <- function(x, sb) {
> sb <- eval(substitute(sb), x)
> x <- x[sb,]
> coxph(Surv(times,event)~trt, data = x) }
>
> subwrap1(testdf, sex == 'F')
>
> # This next one also works, but uses a character variable
> # instead of an expression as the subset argument
>
> subwrap2 <- function(x, sb) {
> sb <- eval(parse(text = sb), x)
> x <- x[sb,]
> coxph(Surv(times,event)~trt, data = x) }
>
> subwrap2(testdf, "sex == 'F'")
>
> # Neither of the above use the coxph subset argument
> # If I try using that, I get stuck with expressions,
> # I've tried many different things in the subset argument,
> # but none seem to do the trick. Is using this argument in a
> # program even advisable?
>
> subwrap3 <- function(x, sb) {
> coxph(Surv(times,event)~trt, data = x,
> subset = eval(substitute(sb), x))
> }
>
> subwrap3(testdf, sex == 'F') #does not work
>
> # Using a string, this works, however.
>
> subwrap4 <- function(x, sb) {
> coxph(Surv(times,event)~trt, data = x, subset =
> eval(parse(text=sb))) }
>
> subwrap4(testdf, "sex == 'F'")
>
> ### END R SAMPLE CODE
>
> Thanks so much,
> Erik Iverson
> iverson at biostat.wisc.edu
>
> > sessionInfo()
> R version 2.5.1 (2007-06-27)
> i686-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
> LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;
> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] "grDevices" "datasets" "tcltk" "splines" "graphics" "utils"
> [7] "stats" "methods" "base"
>
> other attached packages:
> debug mvbutils SPLOTS_1.2-6 Hmisc chron survival
> "1.1.0" "1.1.1" "1.2-6" "3.4-2" "2.3-13" "2.32"