[R] Slow survfit -- is there a faster alternative?
David Winsemius
dwinsemius at comcast.net
Tue Dec 22 02:59:25 CET 2009
On Dec 21, 2009, at 8:20 PM, <gregory.bronner at barclayscapital.com> wrote:
> Using R 2.10 on Windows:
>
> I have a filtered database of 650k event observations in a data frame
> with 20+ variables.
>
> I'd like to be able to quickly estimate and plot survival
> curves. However, the survfit() and cph() functions are extremely slow.
>
>
> As an example: I tried
>
> results.cox <- coxph(Surv(duration, success) ~ start_time + factor1 +
>     factor2 + variable3, data = filteredData)  # took a few seconds
>
> plot(results.cox)  # never finished in an hour
Something is wrong here. I use cph (from the Design package) on
datasets numbering in the millions with crossed spline terms and the
plots are virtually immediate.
>
> I also tried the cph() function, with similar results.
The plot.Design function needs more than just the fit as an argument,
so you are not providing enough information for good advice. When I
try to plot with an object produced by coxph with no further
arguments, I get an error.
> plot(survHb)
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
What happens when you use predict or survfit to process the fit objects:
?survfit.coxph
E.g.:
> survHb <- coxph(Surv(surv.yr, death) ~ age + nsmkr + sexMF + HbPr2 +
+     GGT, data = hisub)
> plot(Hb)
Error: object 'Hb' not found
Error in plot(Hb) :
error in evaluating the argument 'x' in selecting a method for
function 'plot'
# This, however, succeeds:
> sfit <- survfit(survHb)
> plot(sfit)
So you need to supply a processed form of the fit, such as the survfit object, rather than the coxph object itself.
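A minimal sketch of that approach, assuming the current survival package: pass survfit a small data frame of covariate profiles through its newdata argument (see ?survfit.coxph), so the plot contains one curve per profile of interest rather than anything tied to the full data set. The simulated data and the variable names (time, status, age, grp) are hypothetical stand-ins, not the original poster's data.

```r
library(survival)

set.seed(1)
n <- 5000
# Hypothetical simulated data standing in for a large event data set
dat <- data.frame(
  age = rnorm(n, 60, 10),
  grp = factor(sample(c("a", "b"), n, replace = TRUE), levels = c("a", "b"))
)
# Exponential event times whose hazard rises with age and with group "b"
dat$time <- rexp(n, rate = 0.01 * exp(0.02 * (dat$age - 60) + 0.3 * (dat$grp == "b")))
dat$status <- rbinom(n, 1, 0.8)  # 1 = event observed, 0 = censored

fit <- coxph(Surv(time, status) ~ age + grp, data = dat)

# Two covariate profiles; survfit returns one predicted curve per row
profiles <- data.frame(age = c(60, 60),
                       grp = factor(c("a", "b"), levels = c("a", "b")))
sfit <- survfit(fit, newdata = profiles)

# Plot the processed survfit object, not the coxph fit
plot(sfit, col = 1:2, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = c("grp a", "grp b"), col = 1:2, lty = 1)
```

Keeping newdata to a handful of rows is the design choice here: the cost of drawing the plot then depends on the number of distinct event times and profiles, not on how many covariate combinations the data contain.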
>
>
> Is there some easier quick-and-dirty way of producing and plotting
> survival curves for large data sets? I've seen some references on this
> list that suggest that the underlying algorithm is O(numObs *
> numSuccesses) and could be sped up. Has this been done?
>
> Thanks,
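For the quick-and-dirty question: a plain (unadjusted) Kaplan-Meier curve needs only the number of events and the number at risk at each distinct time. One sort plus a few linear passes (tabulate, cumsum, cumprod) supply these, so the risk set is never rescanned for every event. The function km_fast below is a hypothetical helper written for illustration; it is not part of the survival package, where survfit(Surv(time, status) ~ 1) computes the same estimate.

```r
# Hypothetical helper: Kaplan-Meier estimate from tabulated counts.
# time: follow-up times; status: 1 = event, 0 = censored.
km_fast <- function(time, status) {
  ut <- sort(unique(time))          # distinct times, sorted once
  idx <- match(time, ut)            # position of each observation's time
  # Events, and all exits (events + censorings), at each distinct time
  d <- tabulate(idx[status == 1], nbins = length(ut))
  exits <- tabulate(idx, nbins = length(ut))
  # Number at risk just before each time: everyone minus earlier exits
  n_risk <- length(time) - c(0, cumsum(exits)[-length(ut)])
  # Product-limit estimate S(t) = prod over t_j <= t of (1 - d_j / n_j)
  surv <- cumprod(1 - d / n_risk)
  data.frame(time = ut, n.risk = n_risk, n.event = d, surv = surv)
}

# Small worked example with two censored observations
tt <- c(1, 2, 2, 3, 4, 5)
ss <- c(1, 1, 0, 1, 0, 1)
km <- km_fast(tt, ss)
print(km)

# Step plot starting from S(0) = 1
plot(c(0, km$time), c(1, km$surv), type = "s", ylim = c(0, 1),
     xlab = "Time", ylab = "Survival probability")
```

For this example the estimates are 5/6, 2/3, 4/9, 4/9 and 0 at times 1 through 5, with 6, 5, 3, 2 and 1 subjects at risk.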
David Winsemius, MD
Heritage Laboratories
West Hartford, CT