LTRCtrees package is designed to fit survival trees for left-truncated and right censored (LTRC) data as described in
Fu and Simonoff (2017a), and interval-censored survival data as described in
Fu and Simonoff (2017b). For right censored data, the tree algorithms in
partykit, respectively, can be used to fit a survival tree. The
LTRCtrees package extends the survival tree algorithms in these two packages to fit trees for LTRC data, as well as interval-censored survival data. Since the package uses the fitting functions in
partykit, these packages will be installed if they have not already been installed on the user’s machine, but they are not loaded into the session when the
LTRCtrees package is loaded. In addition to fitting trees to LTRC data, a major benefit of these extensions is that they can be used to fit survival trees for data with time-varying covariates.
This document gives several examples of how to use the LTRC trees and ICtree (interval-censored tree) implemented in the
The assay of serum free light chain data for 7874 subjects in the R package
survival is used as a data example to show how to fit survival trees for LTRC data. The original data were obtained from the residents of Olmsted County aged 50 or greater, and therefore the data are left-truncated at age 50. The original analysis of this data set was based on the Cox model with age as a covariate and time after enrollment in the study as response. However, in a mortality study such as this one, since greater age is almost always associated with higher risk of death, it is not that meaningful (or surprising) to have age as a (significant) covariate. Also, it is often the case that the real response of interest should be the subject’s life length, not the time from enrollment in the study to death/censoring.
In this example, we use age as a left-truncation point and death/censored time as the event time.
## Adjust data & clean data library(survival) set.seed(0) ## Since LTRCART uses cross-validation to prune the tree, specifying the seed ## guarantees that the results given here will be duplicated in other analyses Data <- flchain Data <- Data[!is.na(Data$creatinine),] Data$End <- Data$age + Data$futime/365 DATA <- Data[Data$End > Data$age,] names(DATA) <- "FLC"
The key here is to have a
(start, stop, event) triplet in the data, where
start is the left-truncation time,
stop is the right censored/event time and
event is the event indicator, where 1 means death and 0 means censored. In our case, the triplet is
(age, End, death).
Here we set up a training set and a test set to illustrate predictions using the fitted trees.
## Loading required namespace: inum
## Loading required package: rpart
Note that the terminal node of the LTRCART tree (left panel) displays the relative risk on that node (relative to an estimated baseline hazard), while the terminal node of the LTRCIT tree (right panel) displays the fitted Kaplan-Meier curve for the observations in that node. One can also plot Kaplan-Meier curves on terminal nodes of the LTRCART tree by converting it to
## Loading required package: grid
## Loading required package: libcoin
## Loading required package: mvtnorm
Since the class of an LTRCIT object is
party, the prediction method on an LTRCIT object is the same as those for any trees of class
party using the
## 1057 1058 1059 1060 1061 1062 ## 90.47123 90.47123 90.47123 90.47123 90.47123 83.41644
## $`1057` ## Call: survfit(formula = y ~ 1, weights = w, subset = w > 0) ## ## records n.max n.start events median 0.95LCL 0.95UCL ## 248.0 144.0 36.0 189.0 90.5 88.4 91.4 ## ## $`1058` ## Call: survfit(formula = y ~ 1, weights = w, subset = w > 0) ## ## records n.max n.start events median 0.95LCL 0.95UCL ## 248.0 144.0 36.0 189.0 90.5 88.4 91.4
The prediction function on a survival
rpart object can only predict the relative risk, so we can only predict the relative risk on an LTRCART object.
## 1057 1058 1059 1060 1061 1062 ## 0.7212083 0.7212083 0.7212083 0.7212083 0.7212083 1.5257057
In order to predict the median survival time and Kaplan-Meier curves based on an LTRCART object, one can use the
Pred.rpart function available in the
## [] ## Call: survfit(formula = Formula, data = subset) ## ## records n.max n.start events median 0.95LCL 0.95UCL ## 248.0 144.0 36.0 189.0 90.5 88.4 91.4 ## ## [] ## Call: survfit(formula = Formula, data = subset) ## ## records n.max n.start events median 0.95LCL 0.95UCL ## 248.0 144.0 36.0 189.0 90.5 88.4 91.4
##  90.5 90.5 90.5 90.5 90.5 83.8
Note that in order to use the
Pred.rpart function one needs to pass both the Training and Test sets into the function. If the formula passed to
Pred.rpart has the form Surv(time1, time2, event), then the function recognizes this as LTRC data and LTRCART is called internally; if the formula has the form Surv(time, event), then the function recognizes this as ordinary right censored data, and the
rpart function in the
rpart package is called internally.
To fit time-varying covariates survival trees using
LTRCIT, data need to be prepared/transformed into the so-called Andersen-Gill style. This is exactly the same data format needed to fit the Cox model with time-varying covariates.
This table gives an example of the form of Andersen-Gill style data. Information for two patients is presented in this table, with Sex being a time-independent covariate and Blood pressure being a time-varying covariate. For the first patient (patient.ID=1), the event-death is observed at time 27, while the 3 measurements of Sex and blood pressure are recorded at the beginning and time 10 and 20 respectively. Information for the second patient has a similar interpretation except he is censored at time 19.
Note that each row represents a time interval, where within each interval all covariates are assumed to have constant values. The first row means the first patient is a female with blood pressure equal to 100 during the interval (0,10]. Of course, blood pressure is unlikely to keep at a fixed level during this time interval (since it is time-varying), but in practice a patient’s blood pressure is measured only at the time he/she visits the clinic, so it is common practice to treat it as constant between the two visits.
Note that as a result each row has the form of LTRC data – left-truncated at Start and right-censored or experience event at End, with Death being the event indicator.
pbcseq dataset in the
survival package is used as an example to illustrate fitting survival trees with time-varying covariates. These data were obtained from 312 patients with primary biliary cirrhosis (PBC) enrolled in a double-blind, placebo-controlled, randomized trial conducted between January, 1974 and May, 1984 at the Mayo Clinic to evaluate the use of D-penicillamine for treating PBC. A comprehensive clinical and laboratory database was established on each patient. Follow-up was extended to April, 1988, by which time 140 of the patients had died and 29 had undergone orthotopic liver transplantation. These patients generated 1,945 patient visits that enable us to study the change in the prognostic variables of PBC.
set.seed(0) library(survival) ## Create the start-stop-event triplet needed for coxph and LTRC trees first <- with(pbcseq, c(TRUE, diff(id) !=0)) #first id for each subject last <- c(first[-1], TRUE) #last id time1 <- with(pbcseq, ifelse(first, 0, day)) time2 <- with(pbcseq, ifelse(last, futime, c(day[-1], 0))) event <- with(pbcseq, ifelse(last, status, 0)) event <- 1*(event==2) pbcseq$time1 <- time1 pbcseq$time2 <- time2 pbcseq$event <- event ## Fit the Cox model and LTRC trees with time-varying covariates fit.cox <- coxph(Surv(time1, time2, event) ~ age + sex + log(bili), pbcseq) LTRCIT.fit <- LTRCIT(Surv(time1, time2, event) ~ age + sex + log(bili), pbcseq) LTRCART.fit <- LTRCART(Surv(time1, time2, event) ~ age + sex + log(bili), pbcseq) ## Result of the Cox model with time-varying covariates fit.cox
## Call: ## coxph(formula = Surv(time1, time2, event) ~ age + sex + log(bili), ## data = pbcseq) ## ## coef exp(coef) se(coef) z p ## age 0.068285 1.070670 0.008624 7.918 2.42e-15 ## sexf 0.182856 1.200641 0.234190 0.781 0.435 ## log(bili) 1.465486 4.329646 0.094157 15.564 < 2e-16 ## ## Likelihood ratio test=352.8 on 3 df, p=< 2.2e-16 ## n= 1945, number of events= 140
Note that both survival trees fitted using LTRCART and LTRCIT, respectively, identify
log(bili) as risk factors, which is consistent with the Cox model result.
In the previous example, the
pbcseq data are in the long format – each row represents a start-stop-event triplet, which contains the information in the time interval (start, stop]. Note that all covariates have constant values inside each such interval, while each subject could have multiple rows (multiple intervals) to represent the time-varying covariates effect, i.e. different rows (intervals) of the same subject contain different covariates’ value. The long format data can be directly used to fit survival trees with time-varying covariates using either LTRCART or LTRCIT.
It is common to have time-varying covariates survival data that is in wide format – each row contains all of the information of one subject. To use the tree algorithms in
LTRCtrees, we first need to transform the wide format into the long format. The
tmerge function in the
survival package can be used to serve this purpose.
library(survival) ### transform the wide format data into the long format data using tmerge function ### from survival package on Stanford Heart Transplant data jasa$subject <- 1:nrow(jasa) tdata <- with(jasa, data.frame(subject = subject, futime= pmax(.5, fu.date - accept.dt), txtime= ifelse(tx.date== fu.date, (tx.date -accept.dt) -.5, (tx.date - accept.dt)), fustat = fustat)) sdata <- tmerge(jasa, tdata, id=subject,death = event(futime, fustat),trt = tdc(txtime), options= list(idname="subject")) sdata$age <- sdata$age - 48 sdata$year <- as.numeric(sdata$accept.dt - as.Date("1967-10-01"))/365.25 Cox.fit <- coxph(Surv(tstart, tstop, death) ~ age+ surgery, data= sdata) LTRCART.fit <- LTRCART(Surv(tstart, tstop, death) ~ age + transplant, data = sdata) LTRCIT.fit <- LTRCIT(Surv(tstart, tstop, death) ~ age + transplant, data = sdata) ## results Cox.fit
## Call: ## coxph(formula = Surv(tstart, tstop, death) ~ age + surgery, data = sdata) ## ## coef exp(coef) se(coef) z p ## age 0.03068 1.03115 0.01364 2.25 0.0245 ## surgery -0.77284 0.46170 0.35953 -2.15 0.0316 ## ## Likelihood ratio test=10.72 on 2 df, p=0.00471 ## n= 170, number of events= 75
## Loading required package: perm
## Loading required package: Icens
## Loading required package: MLEcens
## Depends on Icens package available on bioconductor. ## To install use for example: ## install.packages('BiocManager') ## BiocManager::install('Icens')
## ans < 0 || ans > 1. t_diff = 12.000000, pStep = 0.055206, intLen = 12.000000, ind = 19, k = 20
 Fu, W. and Simonoff, J.S. (2017a). Survival trees for left-truncated and right-censored data, with application to time-varying covariate data. Biostatistics 18 (2), 352-369
 Fu, W. and Simonoff, J.S. (2017b). Survival trees for Interval Censored Survival data. Statistics in medicine 36 (30), 4831-4842
 Therneau, T., Atkinson, B., Ripley, B. and Ripley, M.B., (2015). rpart: Recursive Partitioning and Regression Trees.
 Hothorn, T. and Zeileis, A., (2015). partykit: A Modular Toolkit for Recursive Partytioning in R.
 Therneau, T., (2015). survival: A Package for Survival Analysis in S. version 2.38.
 Milborrow, S., (2011). rpart.plot: Plot ‘rpart’ models: An Enhanced Version of ‘plot.rpart’.