[R] plotting survival curves (multiple curves on single graph)

Dennis Murphy djmuser at gmail.com
Wed Jul 6 02:29:18 CEST 2011


Here's one way to put all the plots in one graph using ggplot2 and a
couple of tricks using the plyr package. You could take the data frame
I generate below and use it as input to lattice graphics if you
prefer. For groupwise plots, either as an ensemble or as separate
panels, these packages are usually less work than an equivalent base
graphic. OTOH, if you want a graphic to be 'just so', it may be better
to use base graphics since both ggplot2 and lattice make certain
design decisions that are not always trivial to override. I concur
with Bert's comments re lattice and ggplot2, though.

The first step is to produce a suitable data frame. To generate the
Siler curves for each group, one can use the mapply() function in base
R or the mdply() function in the plyr package. I chose the latter, but
some post-processing is necessary. Once the curves are obtained, the
next step is to generate a data frame that contains the curves which
will subsequently be used as input to render the curves by group.

library(ggplot2)    # also loads package plyr
time <- 0L:90L
# Function to generate a Siler survival curve
siler <- function(a1, b1, a2, a3, b3) {
  exp(-a1/b1 * (1 - exp(-b1 * time))- a2 * time +
               a3/b3 * (1 - exp(b3 * time)))
# See explanation below
silerCurves <- as.vector(t(mdply(hazanth[, 3:7],  siler)[, 6:96]))

# Generate a data frame with the populations, times and curve values
silerdat <- data.frame(pop = rep(hazanth[['pop']], each = length(time)),
                       time = rep(time, nrow(hazanth)),
                       surv = silerCurves)

# Basic form of graph:
ggplot(silerdat, aes(x = time, y = surv, colour = pop)) +
     geom_line(size = 1) + xlab('Years') + ylab('S(t)')

To get the Siler curves, I used that part of the hazanth data frame
containing the parameters that determine a curve. The mdply() function
operates rowwise, and substitutes the values of the parameters into
the calling function (siler) to generate each curve.
(The order of the parameters in the input data frame have to be
exactly the same as those in the calling function.) The result is a 6
x 96 data frame, but the first five columns are the input parameters
which are no longer needed; moreover, we want to transpose the result
(t) and then stack the columns one below the other (i.e., the first
set of curve values is on top, the last on the bottom), which is what
as.vector() does. The new data frame silerdat then associates a
population with each curve and matches the times corresponding to the
curve values. This data frame is then input into ggplot().

If ggplot2 is new to you, the basic idea is that a plot is assembled
incrementally in 'layers'. The '+' sign indicates that a new layer is
being added to the existing plot. 'Geoms' represent different
geometric elements of a plot; in this case,  geom_smooth() is a
reasonable choice to fit a smooth curve through the (time, surv)
pairs. Color is an 'aesthetic' that is used here to distinguish among
the various populations. A default legend is produced with the plot,
which can be optionally repositioned or modified.

The on-line help pages for ggplot2 are found at
http://had.co.nz/ggplot2/  Scroll down to 'Reference manual'. A freely
available chapter of the ggplot2 book is a useful way to get started
with the package; see the 'ggplot book' heading on the same page and
click on the book website to gain access to it.

Similar graphs using the lattice package are obtained as follows, with
a couple of choices of legend:

# Legend inside the plot region
mykey <- list( corner = c(1, 1),
               title = 'Population',
               cex.title = 1.1,
               text = list(levels(silerdat$pop), cex = 0.8),
               lines = list(col = 1:6, lwd = 2))
xyplot(surv ~ time, data = silerdat, groups = pop, type = 'l', lwd = 2,
        col.line = 1:6, key = mykey, xlab = 'Years', ylab = 'S(t)')

# Legend outside the plot region (similar to the ggplot() above)
mykey2 <- list( space = 'right',
               title = 'Population',
               cex.title = 1.1,
               text = list(levels(silerdat$pop), cex = 0.8),
               lines = list(col = 1:6, lwd = 2))
xyplot(surv ~ time, data = silerdat, groups = pop, type = 'l', lwd = 2,
        col.line = 1:6, key = mykey2, xlab = 'Years', ylab = 'S(t)')


On Tue, Jul 5, 2011 at 3:08 PM, Trey Batey <ekt.batey at gmail.com> wrote:
> Hello.
> This is a follow-up to a question I posted last week.  With some
> previous suggestions from the R-help community, I have been able to
> plot survival (, hazard, and density) curves using published data for
> Siler hazard parameters from a number of ethnographic populations.
> Can the function below be modified, perhaps with a "for" statement, so
> that multiple curves (different line types---one for each population)
> are plotted on a single graph for comparison?  Thanks so much.
> --Trey
> The function and calls below use the data in this Excel file (feel
> free to access):
> https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US
> ## - plot Siler survival curve
> ##############################
> silsurv<-function(a1,b1,a2,a3,b3)
>  {
>    sil=function(t)
>      {
>        h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t)
>        S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t)))
>        d.t<-S.t*h.t
>        #return(d.t)
>        return(S.t)
>        #return(h.t)
>      }
>    t<-seq(0,90,1)
> plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age
> (years)')
>  }
> with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9)
>  # plot for Hadza
> with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9)
>  # plot for Ache
> with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9)
>  # plot for Hiwi
> with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9)
>  # plot for !Kung
> with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9)
>  # plot for Yanomamo
> with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9)
>  # plot for Tsimane
> ###############################
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list