[R] coxph data format

Ehsan Karim wildscop at hotmail.com
Tue May 8 05:26:09 CEST 2012


Dear List,

Here is an example of survival data in counting-process format
(one record per day of follow-up):

> data[data$Id == 11,]
# extracted one person's record
    Id Event Fup Start Stop sex Drug1
601 11     0   6     0    1   0     0
602 11     0   6     1    2   0     0
603 11     0   6     2    3   0     0
604 11     0   6     3    4   0     0
605 11     0   6     4    5   0     1
606 11     1   6     5    6   0     1

which is compressed into the following format (consecutive records
with unchanged drug exposure merged):

> compressed.data[compressed.data$Id ==11,]
# compressed same person's record
   Id Event Fup Start Stop sex Drug1
21 11     0   6     0    4   0     0
22 11     1   6     4    6   0     1
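
For reference, a minimal sketch of how such a merge could be done in base R. The helper name compress_one is hypothetical, and it assumes that each person's rows are contiguous in time, that only Drug1 varies over time, and that an event can only occur in the last row of a person's record. The data frame below just retypes the six daily rows shown above.

```r
# Daily records for the one person shown above (Id 11).
daily <- data.frame(
  Id    = 11,
  Event = c(0, 0, 0, 0, 0, 1),
  Fup   = 6,
  Start = 0:5,
  Stop  = 1:6,
  sex   = 0,
  Drug1 = c(0, 0, 0, 0, 1, 1)
)

# Hypothetical helper: merge consecutive rows of one person whose
# drug exposure is unchanged. Assumes contiguous intervals and that
# an event, if any, falls in the person's last row.
compress_one <- function(d) {
  d <- d[order(d$Start), ]
  # A new run starts whenever Drug1 differs from the previous row.
  run <- cumsum(c(TRUE, diff(d$Drug1) != 0))
  pieces <- lapply(split(d, run), function(r) {
    data.frame(
      Id    = r$Id[1],
      Event = max(r$Event),  # 1 only if the run contains the event row
      Fup   = r$Fup[1],
      Start = min(r$Start),
      Stop  = max(r$Stop),
      sex   = r$sex[1],
      Drug1 = r$Drug1[1]
    )
  })
  do.call(rbind, pieces)
}

# Apply the helper person by person and stack the results.
compressed <- do.call(rbind, lapply(split(daily, daily$Id), compress_one))
rownames(compressed) <- NULL
print(compressed)
```

For this person the sketch yields two rows, (0, 4] with Drug1 = 0 and (4, 6] with Drug1 = 1 and Event = 1, matching the compressed record above.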

My question is: since both formats carry the same information, should I
expect numerically identical results from the following coxph fits?
If not, which format is recommended?

> library(survival)
> data <- read.csv("http://stat.ubc.ca/~e.karim/dd.csv")
> compressed.data <- read.csv("http://stat.ubc.ca/~e.karim/cd.csv")
> head(data)
> head(compressed.data)
> coef(coxph(Surv(Start, Stop, Event) ~ sex + Drug1 + cluster(Id), data = data, robust = TRUE))
      sex     Drug1
0.8696213 3.1755854
> coef(coxph(Surv(Start, Stop, Event) ~ sex + Drug1 + cluster(Id), data = compressed.data, robust = TRUE))
      sex     Drug1
0.8674742 2.7147013
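
One way to check the premise that splitting follow-up into shorter intervals should not, by itself, change the fit is to try it on a dataset where the split is known to be lossless. The sketch below is only an illustration under assumptions: it uses the veteran data shipped with the survival package (not the data above), arbitrary cut points, and survSplit to do the splitting.

```r
library(survival)

# Reference fit: one row per subject.
fit_whole <- coxph(Surv(time, status) ~ trt + karno, data = veteran)

# Split each subject's follow-up at days 30, 90 and 180 (arbitrary
# cut points); covariates are copied unchanged to every piece.
vet_split <- survSplit(Surv(time, status) ~ ., data = veteran,
                       cut = c(30, 90, 180), episode = "piece")

# Same model on the split data, now in (start, stop] form.
fit_split <- coxph(Surv(tstart, time, status) ~ trt + karno,
                   data = vet_split)

# Compare the two sets of coefficients up to numerical tolerance.
print(all.equal(coef(fit_whole), coef(fit_split), tolerance = 1e-6))
```

If the coefficients agree here, as one would expect when the risk sets at each event time are unchanged, then a difference like the one above would suggest that the two files do not encode exactly the same records.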

PS: The discrete-time analogue of the Cox model (using a cloglog link)
gives results similar to the coxph fit for whichever dataset is used.

Any suggestions, references, or pointers to an R package would be highly appreciated.

Thanks,

Ehsan


