[R] Discrepancy in the PBC data set
Terry Therneau
therneau at mayo.edu
Mon Nov 24 14:39:49 CET 2008
The data set in R is wrong. I've found mistakes on 2 lines in a quick look.
I don't know whether the data are also incorrect in the Appendix of Fleming
and Harrington (someone seems to have borrowed my copy), which is where the
data set appears to have been taken from, given all the "-9" codes in it. (Note:
Tom Fleming originally got the data from me, so I'm fairly confident in calling
my Mayo version the authoritative one.) I'll make sure this gets fixed.
You can grab a correct data set from our department web page. Code is below.
Terry Therneau
pbcurl <- "http://mayoresearch.mayo.edu/mayo/research/biostat/upload/therneau_upload/pbc.dat"
pbc <- read.table(pbcurl, header=FALSE,
                  col.names=c('id', 'time', 'status', 'trt', 'age', 'sex',
                              'ascites', 'hepato', 'spiders', 'edema',
                              'bili', 'chol', 'albumin', 'copper',
                              'alk.phos', 'ast', 'trig', 'platelet',
                              'protime', 'stage'),
                  na.strings='.')
pbc$age <- pbc$age/365.25   # convert age from days to years
library(survival)   # provides coxph() and Surv()
newfit <- coxph(Surv(time, status==2) ~ age + edema + log(bili) +
                log(protime) + log(albumin), data=pbc)
newfit
                coef exp(coef) se(coef)     z       p
age           0.0396    1.0404  0.00767  5.16 2.4e-07
edema         0.8963    2.4505  0.27141  3.30 9.6e-04
log(bili)     0.8636    2.3716  0.08294 10.41 0.0e+00
log(protime)  2.3868   10.8791  0.76851  3.11 1.9e-03
log(albumin) -2.5069    0.0815  0.65292 -3.84 1.2e-04

Likelihood ratio test=231 on 5 df, p=0 n=416 (2 observations deleted due to missingness)
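
To see which lines differ between two versions of a data set, something along the following lines might serve. This is only a sketch: diff_rows is a hypothetical helper, not part of any package, and it assumes both data frames hold the same subjects in the same order with numeric or character columns. The toy values are made up and are not PBC data.

```r
# Hypothetical helper: report the row numbers on which two data frames disagree.
# Assumes equal row counts, identical row order, and numeric/character columns.
diff_rows <- function(a, b, cols = intersect(names(a), names(b))) {
  stopifnot(nrow(a) == nrow(b))
  differs <- rep(FALSE, nrow(a))
  for (v in cols) {
    x <- a[[v]]
    y <- b[[v]]
    # Two entries agree if both are NA, or both are non-NA and exactly equal
    same <- (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
    differs <- differs | !same
  }
  which(differs)
}

# Toy illustration with made-up values
a <- data.frame(id = 1:3, bili = c(1.4, 1.1, NA))
b <- data.frame(id = 1:3, bili = c(1.4, 1.8, NA))
diff_rows(a, b)
```

The comparison is exact, so columns that were rounded differently in the two sources would be flagged as well; a tolerance could be added if that matters.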