[R] Nelson-Aalen estimator of cumulative hazard

Mon May 4 15:47:06 CEST 2009

Hi,

I figure out the reason for the difference.  There are ties in failure times
in the data set.  Consequently, it matters which method is used to handle
ties in "coxph". The default is "efron".  If I use method="breslow", there
is no difference between the 2 different ways of computing the Nelson-Aalen
estimte.

require(foreign)

gb <- read.dta("GB.dta")  # Green & Byar data; N = 483

# Method 1 (note the use of Breslow estimator)

fit1 <- coxph( Surv(time, status=="Cancer" | status=="CVD" |
status=="Other") ~ 1, method="breslow", data=gb) 

h1 <- basehaz(fit1)

# Method 2

fit2 <- survfit(Surv(time, status=="Cancer" | status=="CVD" |
status=="Other") ~ 1, data=gb)

jump <- fit2$n.event > 0

h2 <-  cumsum(fit2$n.event[jump]/fit2$n.risk[jump])

	> min(abs(h1$hazard - h2))
	[1] 1.387779e-17

Best,
Ravi. 

----------------------------------------------------------------------------
-------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan at jhmi.edu

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

----------------------------------------------------------------------------
--------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Ravi Varadhan
Sent: Monday, May 04, 2009 12:33 AM
To: r-help at r-project.org
Subject: [R] Nelson-Aalen estimator of cumulative hazard

Hi,

I am computing the Nelson-Aalen (NA) estimate of baseline cumulative hazard
in two different ways using the "survival" package.  I am expecting that
they should be identical.  However, they are not. Their difference is a
monotonically increasing with time.  This difference is probably not large
to make any impact in the application, but is annoyingly non-trivial for me
to just ignore it.  

This is a competing risks problem, with the Green & Byar (1980) data set
(the STATA data set is attached).

Can anyone explain to me the reason for the discrepancy?

require(foreign)

gb <- read.dta("GB.dta")  # Green & Byar data; N = 483

# Method 1

fit1 <- coxph( Surv(time, status=="Cancer" | status=="CVD" |
status=="Other") ~ 1, data=gb) 

h1 <- basehaz(fit1)

# Method 2

fit2 <- survfit(Surv(time, status=="Cancer" | status=="CVD" |
status=="Other") ~ 1, data=gb)

jump <- fit2$n.event > 0

h2 <-  cumsum(fit2$n.event[jump]/fit2$n.risk[jump])

plot(h1$time, h1$hazard - h2)

Thank you,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu