[R] survSplit: further exploration and related topics

Tue Nov 9 15:01:36 CET 2004

To Danardonos concern of splitting time for records with delayed entry:

This can fairly easily be accomodated, by simply splitting time in small
intervals of time since entry into the study, and then compute the value
of the other timescales for each of these e.g.:

current.age <- time.from.entry + age.at.entry

but the cut on the other timescales will not be exactly where you may
want 
them to be, but this should be of minor importance in real life.

This approach will also make it clearer what really is going on.

The effect of each of the timescales can then be modelled using the
usual
regression tools available in R.

David Clayton har written an R-function that does it correctly, you can
find it in:
http://biostat.ku.dk/~bxc/Lexis/ along with its .Rd file.

It is also included a Lexis-package which is a first shot at an
epidemiology package 
for R available at http://biostat.ku.dk/~bxc/SPE/library/, but built
under 1.9.
I recently tried to build it under 2.0, but it crashed for me and I was
advised to
wait till 2.1 before I had another go at it.

A little further exploration of what goes on in survSplit gives:
> data(aml)
> aml$id <- letters[1:dim(aml)[1]]
> aml3<-survSplit(aml,cut=c(5,10,50),end="time",start="start",
+                     event="status",episode="i")
> # Sort the rows sensibly
> aml3 <- aml3[order(aml3$id,aml3$start),]
> aml3$expand <- 1:(dim(aml3)[1])
> aml3[aml3$id=="k",c("id","expand","start","time","status","x")]
   id expand start time status          x
11  k     30     0    5      0 Maintained
34  k     31     5   10      0 Maintained
57  k     32    10   50      0 Maintained
80  k     33    50  161      0 Maintained
> 
> # All records are taken as if entry were at time 0:
> aml4<-survSplit(aml3,cut=c(9,12,40),end="time",start="entry",
+                      event="status",episode="i",id="id2")
> aml4 <- aml4[order(aml4$id,aml4$expand,aml4$start),]
>
aml4[aml4$id=="k",c("id","expand","start","entry","time","status","x")]
    id expand start entry time status          x
30   k     30     0     0    5      0 Maintained
31   k     31     5     0    9      0 Maintained
94   k     31     5     9   10      0 Maintained
32   k     32    10     0    9      0 Maintained
95   k     32    10     9   12      0 Maintained
158  k     32    10    12   40      0 Maintained
221  k     32    10    40   50      0 Maintained
33   k     33    50     0    9      0 Maintained
96   k     33    50     9   12      0 Maintained
159  k     33    50    12   40      0 Maintained
222  k     33    50    40  161      0 Maintained
> 
> # entry time is is "start", but it seems that it is more or less
> # assumed that entry is at the first split point given.
> aml5<-survSplit(aml3,cut=c(9,12,40),end="time",start="start",
+                      event="status",episode="i",id="id2")
> aml5 <- aml5[order(aml5$id,aml5$expand,aml5$start),]
> aml5[aml5$id=="k",c("id","expand","start","time","status","x")]
    id expand start time status          x
30   k     30     0    5      0 Maintained
31   k     31     5    9      0 Maintained
94   k     31     9   10      0 Maintained
95   k     32     9   12      0 Maintained
158  k     32    12   40      0 Maintained
221  k     32    40   50      0 Maintained
96   k     33     9   12      0 Maintained
159  k     33    12   40      0 Maintained
222  k     33    40  161      0 Maintained
> 

It appears that the intention has been to support counting process
input, 
but not quite succeeded.

Bendix Carstensen
----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
----------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Thomas Lumley
> Sent: Monday, November 08, 2004 4:28 PM
> To: Danardono
> Cc: R-help at stat.math.ethz.ch
> Subject: Re: [R] survSplit
> 
> 
> On Mon, 8 Nov 2004, Danardono wrote:
> 
> > I am just realized that  survival has the facility to do 
> survival time
> > splitting survSplit
> > after read some postings about  time dependency  in the list.
> > Is it survSplit only for the survival data input 
> (time,status)  and not for 
> > the 'counting process' input (start,stop,status)?
> >
> > I take one example modified from the survSplit help:
> >> data(aml)
> >> 
> aml3<-survSplit(aml,cut=c(5,10,50),end="time",start="start",event="st
> >> atus",episode="i",id="id")
> >
> >> coxph(Surv(time,status)~x,data=aml)
> >
> >               coef exp(coef) se(coef)    z     p
> > xNonmaintained 0.916       2.5    0.512 1.79 0.074
> >
> > Likelihood ratio test=3.38  on 1 df, p=0.0658  n= 23
> >
> > #the same
> >> coxph(Surv(time,status)~x,data=aml3)
> >
> >               coef exp(coef) se(coef)    z     p
> > xNonmaintained 0.916       2.5    0.512 1.79 0.074
> >
> > Likelihood ratio test=3.38  on 1 df, p=0.0658  n= 63
> 
> This should NOT be the same, and on my computer is not.  You 
> should have 
> to use the counting-process syntax.  I get
> 
> > coxph(Surv(time,status)~x,data=aml3)
> Call:
> coxph(formula = Surv(time, status) ~ x, data = aml3)
> 
> 
>                 coef exp(coef) se(coef)    z     p
> xNonmaintained 1.07      2.92    0.514 2.08 0.037
> 
> Likelihood ratio test=4.61  on 1 df, p=0.0317  n= 63
> > coxph(Surv(start,time,status)~x,data=aml3)
> Call:
> coxph(formula = Surv(start, time, status) ~ x, data = aml3)
> 
> 
>                  coef exp(coef) se(coef)    z     p
> xNonmaintained 0.916       2.5    0.512 1.79 0.074
> 
> Likelihood ratio test=3.38  on 1 df, p=0.0658  n= 63
> 
> 
> 
> > BUT If I split aml3 further:
> >
> >> 
> aml4<-survSplit(aml3,cut=c(9,12,40),end="time",start="start",event="s
> >> tatus",episode="i",id="id2")
> > #not the same!
> >> coxph(Surv(start,time,status)~x,data=aml4)
> >              coef exp(coef) se(coef)    z     p
> > xNonmaintained 1.05      2.85    0.515 2.03 0.042
> >
> > Likelihood ratio test=4.38  on 1 df, p=0.0363  n= 105
> >
> 
> Hmm.  I suspect this is a <= vs < bug of some sort in handling start 
> times.
> 
>  	-thomas
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read 
> the posting guide! http://www.R-project.org/posting-guide.html
>