[R] smooth non-cumulative baseline hazard in Cox model

Mon Jul 5 03:22:07 CEST 2004

Thank you all for your quick answers.

Regarding my question on a smooth non-cumulative baseline Cox hazard, I
followed Prof. Brian Ripley's advice and used the following:

library(survival)
library(pspline)
bh <- basehaz(coxfinal2)     # column 1: cumulative baseline hazard, column 2: time in days
yr <- bh[,2]/365.25 + 1945   # time in years; my start value was 1 January 1945
plot(yr, bh[,1], type = "l")
xx <- seq(min(yr), max(yr), length.out = 100)
# first derivative (nderiv = 1) of a smoothing spline fitted to the
# cumulative hazard, evaluated on the grid xx
lines(xx, predict(sm.spline(x = yr, y = bh[,1], norder = 2), xarg = xx, nderiv = 1))

It might seem that computing the derivative with time expressed in years
gives the annual probability of an event.
The previous commands produce exactly the same graph as:

bh <- basehaz(coxfinal2)     # time kept in days
plot(bh[,2], bh[,1], type = "l")
xx <- seq(min(bh[,2]), max(bh[,2]), length.out = 100)
# derivative per day, rescaled by 365.25 days per year
lines(xx, 365.25 * predict(sm.spline(x = bh[,2], y = bh[,1], norder = 2),
                           xarg = xx, nderiv = 1))   # [second command]
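A self-contained sketch for checking the 365.25 factor, under stated assumptions: it uses the lung data shipped with survival in place of the data behind coxfinal2 (not shown here), and base R's smooth.spline() with a fixed spar = 0.8 in place of pspline's sm.spline(), so the smoothing itself differs from the commands above. The 1945 offset only mirrors the year axis used above.

```r
library(survival)

# Assumption: 'lung' (time in days) stands in for the data behind coxfinal2.
fit <- coxph(Surv(time, status) ~ age, data = lung)
bh  <- basehaz(fit)                   # hazard = cumulative baseline hazard
days  <- bh$time
years <- days / 365.25 + 1945         # same change of time scale as above

# Assumption: smooth.spline() replaces pspline::sm.spline(). spar is fixed so
# both fits apply the same smoothing on their internally rescaled x axis.
sp_year <- smooth.spline(years, bh$hazard, spar = 0.8)
sp_day  <- smooth.spline(days,  bh$hazard, spar = 0.8)

xx_day  <- seq(min(days), max(days), length.out = 100)
xx_year <- xx_day / 365.25 + 1945

# First derivative of the smoothed cumulative hazard = smoothed hazard rate
h_year <- predict(sp_year, xx_year, deriv = 1)$y   # rate per year at risk
h_day  <- predict(sp_day,  xx_day,  deriv = 1)$y   # rate per day at risk

# Chain rule: d/d(years) = 365.25 * d/d(days), so the two curves coincide
all.equal(h_year, 365.25 * h_day, tolerance = 1e-6)

if (interactive()) {
  plot(xx_year, h_year, type = "l", xlab = "year",
       ylab = "smoothed baseline hazard (per year)")
}
```

Fixing spar matters: smooth.spline() rescales x to [0, 1] before fitting, so with the same spar the day-scale and year-scale fits are the same curve and only the derivative's units change.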

However, if p is the probability of an event on the first day of a given
year, it is not obvious to me that the probability of an event during that
year equals 365*p.
Am I mistaken? If not, what does the second command compute?

So, could someone tell me the time unit of the risk shown by
lines(xx, predict(sm.spline(x = basehaz(coxfinal2)[,2]/365.25 + 1945,
                            y = basehaz(coxfinal2)[,1], norder = 2),
                  xarg = xx, nderiv = 1))
...
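Assuming t is time in days since 1 January 1945 and s = t/365.25 + 1945 is calendar time in years (as in the commands above), the chain rule accounts for the factor 365.25. The derivative of the cumulative baseline hazard is a hazard rate (expected events per unit of time at risk, at the reference covariate values), so with time in years it is a rate per year rather than a probability. The link to a yearly probability in the last line assumes a constant hazard within the year:

```latex
s = \frac{t}{365.25} + 1945, \qquad
\Lambda_{\mathrm{yr}}(s) = \Lambda_{\mathrm{day}}\bigl(365.25\,(s - 1945)\bigr)

h_{\mathrm{yr}}(s) = \frac{d\Lambda_{\mathrm{yr}}}{ds}
  = 365.25\,\frac{d\Lambda_{\mathrm{day}}}{dt} = 365.25\,h_{\mathrm{day}}(t)

% constant hazard h_day over the year; p = probability of an event on one day
p = 1 - e^{-h_{\mathrm{day}}}, \qquad
\Pr(\text{at least one event in the year}) = 1 - e^{-365.25\,h_{\mathrm{day}}}
  = 1 - (1 - p)^{365.25} \approx 365.25\,p \quad (p \ll 1)
```

Under that assumption the second command plots 365.25 times the daily rate, i.e. the expected number of events per year at risk, which is close to the yearly probability only when that number is small.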

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

With respect to censoring, I think we all agree:

Peter Dalgaard wrote:
> Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>
> > > I'm doing the same job as Hegre et al. (studying civil wars) but with the
> > > counting process formulation of the Cox model. (I use intervals, my formula
> > > looks like Surv(start,stop,status)~  etc.).
> >
> > Careful, that is left- and right- censored, not intervals.  Surv has a
> > type= argument.
>
> Nitpick: That's left-*truncated* and right-censored (the status refers
> to the condition at the right end, people who die before the start are
> not registered at all).

I use the following dataset:
id  start  stop  status  ... covariates
1       1   365       0  ...
1     365   400       1  ...  [the war starts at 400 and ends at 550]
1     550   730       0  ...  [repeated events are possible, so the country re-enters the study]
2       1   365       0  ...
2     365   730       0  ...
etc.
where each country has one id and therefore several rows (each row
representing an "interval" of time).
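A minimal sketch of this layout in R, using only the example rows above (the covariates are omitted). It shows how Surv() reads the rows when time2 is given and type= is left out, and how much time each country spends at risk.

```r
library(survival)

# The example rows from the table above (covariates omitted).
wars <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  start  = c(1, 365, 550, 1, 365),
  stop   = c(365, 400, 730, 365, 730),
  status = c(0, 1, 0, 0, 0)
)

# time2 is given and type= is omitted, so Surv() builds counting-process data;
# each row prints as a (start, stop] interval, with "+" marking censoring.
y <- with(wars, Surv(start, stop, status))
print(y)
attr(y, "type")

# Time at risk per country: sum of interval lengths. Country 1 is not at
# risk between 400 and 550, while the war is going on.
risk <- tapply(wars$stop - wars$start, wars$id, sum)
risk
```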

with
coxph(Surv(start, stop, status) ~ x1 + ... + cluster(id))

I did not mean interval censoring (although I think it is present here for
country 1 from time 400 to 550); I meant "interval" in the same sense as in
the R help page for Surv:
"time2: ending time of the interval for interval censored or counting process
data only. Intervals are assumed to be open on the left and closed on the
right, (start, end]. For counting process data, event indicates whether an
event occurred at the end of the interval."
"Surv has a type= argument." Yes, and the help says "The default is "right"
or "counting" depending on whether the time2 argument is absent or present,
respectively." Here I omitted the type, which means I used the counting
process form.

Thus, the union of all intervals for country 2 (here, rows 4 and 5) leads to
one big interval, which is left-truncated and right-censored.
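A small sketch of that union, using a hypothetical helper merge_intervals() (not part of survival) that joins one country's consecutive (start, stop] rows whenever a row starts exactly where the previous one stopped.

```r
# Hypothetical helper (not part of survival): merge one country's
# (start, stop] rows into maximal contiguous at-risk spells.
merge_intervals <- function(start, stop) {
  o <- order(start)
  start <- start[o]
  stop  <- stop[o]
  # a new spell begins wherever a row does not start where the previous one stopped
  spell <- cumsum(c(TRUE, start[-1] != stop[-length(stop)]))
  data.frame(start = as.vector(tapply(start, spell, min)),
             stop  = as.vector(tapply(stop,  spell, max)))
}

merge_intervals(c(1, 365), c(365, 730))             # country 2: one spell (1, 730]
merge_intervals(c(1, 365, 550), c(365, 400, 730))   # country 1: (1, 400] and (550, 730]
```

For country 2 the two rows collapse into a single spell; for country 1 the gap between 400 and 550 keeps two separate spells.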

Anyway, I think there is no ambiguity, since trying type = "interval" gives:
Error in coxph(Surv(start, stop, status, type = "interval") ~ ....
 Cox model doesn't support "interval" survival data
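A minimal reproduction, assuming the example rows from the table plus a hypothetical covariate x1 (cluster(id) is dropped to keep it short). It checks the default Surv() types and catches the coxph() error instead of stopping.

```r
library(survival)

d <- data.frame(id     = c(1, 1, 1, 2, 2),
                start  = c(1, 365, 550, 1, 365),
                stop   = c(365, 400, 730, 365, 730),
                status = c(0, 1, 0, 0, 0),
                x1     = c(0.2, 0.5, 0.1, 0.4, 0.3))  # hypothetical covariate

# Default type: "counting" when time2 is present, "right" when it is absent
type_counting <- attr(with(d, Surv(start, stop, status)), "type")
type_right    <- attr(with(d, Surv(stop, status)), "type")

# Forcing type = "interval" makes coxph() stop before any fitting
msg <- tryCatch(
  coxph(Surv(start, stop, status, type = "interval") ~ x1, data = d),
  error = function(e) conditionMessage(e)
)
c(type_counting, type_right)
msg
```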

Still, thanks to Prof. Ripley for the comment, as I am not fully familiar with
the exact terminology in English.

Regards,

Mayeul KAUFFMANN