[R] wrapper for coxph with a subset argument

Fri Nov 9 18:34:57 CET 2007

The following variation on what you proposed lets you either subset the
dataset outside coxph or use the subset argument:

# sb is a logical vector selecting rows; the default NULL keeps all rows
subwrap5 <- function(x, sb = NULL) {
   coxph(Surv(times, event) ~ trt, data = x, subset = sb)
}

subwrap5(testdf, testdf$sex == 'F')     # via the subset argument
subwrap5(testdf[testdf$sex == 'F', ])   # subsetting beforehand
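
If you end up with a character vector of conditions, as you mention in your
question 3, something along these lines might work. This is only a sketch:
it reuses makeTestDF from your message (lightly condensed), the names conds,
fits and keep are placeholders I made up, and the seed is arbitrary.

```r
library(survival)

# Test data generator, condensed from the original message
makeTestDF <- function(n) {
   times <- sample(1:200, n, replace = TRUE)
   event <- rbinom(n, 1, prob = .1)
   trt   <- rep(c("A", "B"), each = n/2)
   sex   <- rep(factor(c("M", "F")), times = n/2)
   data.frame(times, event, trt, sex)
}

subwrap5 <- function(x, sb = NULL) {
   coxph(Surv(times, event) ~ trt, data = x, subset = sb)
}

set.seed(1)   # arbitrary seed, only for reproducibility
testdf <- makeTestDF(200)

# Hypothetical subset conditions, stored as strings
conds <- c(females = "sex == 'F'", males = "sex == 'M'")

# Turn each string into a logical vector, then hand it to the wrapper
fits <- lapply(conds, function(cond) {
   keep <- eval(parse(text = cond), testdf)
   subwrap5(testdf, keep)
})

names(fits)   # one fitted model per condition
```

The eval(parse()) step is kept outside the wrapper, so subwrap5 itself only
ever sees a logical vector. If the conditions can be built as logical vectors
directly, the parse step could presumably be dropped altogether.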

I'm not sure whether either of these approaches would be more efficient in
terms of memory usage. If that is an issue, you could test with large datasets.
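
One rough way to run such a test is sketched below. The helper peak_vcells
is hypothetical (my own name, not part of any package); it relies on
gc(reset = TRUE) to reset R's "max used" statistics and reads them back
afterwards. The data frame bigdf, its size and the seed are also just
assumptions for illustration, and the numbers should be treated as
approximate.

```r
library(survival)

subwrap5 <- function(x, sb = NULL) {
   coxph(Surv(times, event) ~ trt, data = x, subset = sb)
}

# Hypothetical helper: peak vector-heap use (in Vcells, 8 bytes each)
# reached while evaluating expr
peak_vcells <- function(expr) {
   gc(reset = TRUE)             # reset the "max used" statistics
   force(expr)                  # the argument is only evaluated here
   gc()["Vcells", "max used"]
}

set.seed(1)                     # arbitrary seed
n <- 20000                      # arbitrary "large" size; increase as needed
bigdf <- data.frame(
   times = sample(1:200, n, replace = TRUE),
   event = rbinom(n, 1, prob = .1),
   trt   = rep(c("A", "B"), each = n/2),
   sex   = rep(c("M", "F"), times = n/2)
)

mem_arg <- peak_vcells(subwrap5(bigdf, bigdf$sex == 'F'))
mem_pre <- peak_vcells(subwrap5(bigdf[bigdf$sex == 'F', ]))

c(subset_argument = mem_arg, subset_beforehand = mem_pre)
```

Both figures include whatever was already allocated before the call (bigdf
itself, for instance), so only the difference between them is informative.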

-Christos

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Erik Iverson
> Sent: Friday, November 09, 2007 12:05 PM
> To: 'r-help at stat.math.ethz.ch'
> Subject: [R] wrapper for coxph with a subset argument
> 
> Dear R-help -
> 
> Thanks to those who replied yesterday (Christos H. and Thomas 
> L.) regarding my question on coxph and model formula; the 
> answers worked perfectly.
> 
> My new question involves the following.
> 
> I want to run several coxph models (package survival) with 
> the same dataset, but different subsets of that dataset.
> 
> I have found a way to do this, described below in functions 
> subwrap1 and subwrap2.  These do not use the coxph "subset" 
> argument, however, as you will see.
> 
> My three main questions are:
> 
> 1) When writing a wrapper like this, should I be using the 
> subset argument in coxph(), or alternatively, doing what I am 
> doing in subwrap1 and subwrap2 below?  Is the subset argument 
> in coxph more of a convenience tool for interactive use 
> rather than programs?
> 
> 2) If the approach in subwrap1 and subwrap2 is fine, is there 
> a preference for using 'expressions' or 'strings'?  
> Eventually, my program will create these subset conditions 
> programmatically, so I think strings will be the way I have 
> to go, even though I've seen warnings on this list about 
> using the eval(parse()) construct.
> 
> 3) Is there some approach to do this that I'm overlooking?  
> My goal will be to produce a list of subset conditions 
> (probably a character vector), and then use lapply to run the 
> various cox regressions.
> 
> I can already achieve my goal, I just would like to know more 
> details about how others do things like this.
> 
> I've simplified my code below to focus on where I feel I'm confused. 
> Here is some code along with comments:
> 
> #### BEGIN R SAMPLE CODE
> 
> #Function for producing test data
> makeTestDF <- function(n) {
>    times  <- sample(1:200, n, replace = TRUE)
>    event  <- rbinom(n, 1, prob = .1)
>    trt    <- rep(c("A","B"), each = n/2)
>    sex    <- factor(c("M","F"))
>    sex    <- rep(sex, times = n/2)
>    testdf <- data.frame(times,event,trt,sex) }
> 
> # Make test data, n = 200
> testdf <- makeTestDF(200)
> 
> # Cox wrapper function with subset, this one works
> # Takes subset as expression
> subwrap1 <- function(x, sb) {
>    sb <- eval(substitute(sb), x)
>    x <- x[sb,]
>    coxph(Surv(times,event)~trt, data = x) }
> 
> subwrap1(testdf, sex == 'F')
> 
> # This next one also works, but uses a character variable
> # instead of an expression as the subset argument
> 
> subwrap2 <- function(x, sb) {
>    sb <- eval(parse(text = sb), x)
>    x <- x[sb,]
>    coxph(Surv(times,event)~trt, data = x) }
> 
> subwrap2(testdf, "sex == 'F'")
> 
> # Neither of the above use the coxph subset argument
> # If I try using that, I get stuck with expressions,
> # I've tried many different things in the subset argument,
> # but none seem to do the trick.  Is using this argument in a
> # program even advisable?
> 
> subwrap3 <- function(x, sb) {
>    coxph(Surv(times,event)~trt, data = x,
>    subset = eval(substitute(sb), x))
> }
> 
> subwrap3(testdf, sex == 'F') #does not work
> 
> # Using a string, this works, however.
> 
> subwrap4 <- function(x, sb) {
>    coxph(Surv(times,event)~trt, data = x,
>          subset = eval(parse(text=sb))) }
> 
> subwrap4(testdf, "sex == 'F'")
> 
> ### END R SAMPLE CODE
> 
> Thanks so much,
> Erik Iverson
> iverson at biostat.wisc.edu
> 
>  > sessionInfo()
> R version 2.5.1 (2007-06-27)
> i686-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
> LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;
> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] "grDevices" "datasets"  "tcltk"     "splines"   "graphics"  "utils"
> [7] "stats"     "methods"   "base"
> 
> other attached packages:
>         debug     mvbutils SPLOTS_1.2-6        Hmisc        chron     survival
>       "1.1.0"      "1.1.1"      "1.2-6"      "3.4-2"     "2.3-13"       "2.32"
> 