[R] Estimate of baseline hazard in survival

Fri Jun 10 20:47:46 CEST 2005

On Fri, 10 Jun 2005, Hanke, Alex wrote:

> Dear All,
> I'm having just a little terminology problem, relating the language used in
> the Hosmer and Lemeshow text on Applied Survival Analysis to that of the
> help that comes with the survival package.
>
> I am trying to back out the values for the baseline hazard, h_o(t_i), for
> each event time or observation time.
> Now survfit(fit)$surv gives me the value of the  survival function,
> S(t_i|X_i,B), using mean values of the covariates and the coxph() object
> provides me with the estimate of the linear predictors, exp(X'B).
> If S(t_i|X_i,B)=S_o(t_i)^exp(X_iB) is the expression for the survival
> function
> And
> -ln(S_o(t_i) ) is the expression for the cumulative baseline hazard
> function, H_o(t_i)
> Then by rearranging the expression for the survival function I get the
> following:
> -ln(S_o(t_i) ) = -ln( S(t_i|X_i,B) ) / exp(X_iB)
>                   = basehaz(fit)/exp(fit$linear.predictors)
> Am I right so far and is there an easier way?

No, and yes.

You are dividing the centered baseline hazard at each time point by 
exp(linear predictor) for the person who happened to die at that time, 
rather than by exp(linear predictor) at the mean covariates.

basehaz(fit, centered=FALSE) will get you the baseline hazard at zero 
covariates.

You don't even need that.  The baseline hazard at zero covariates is 
constant if and only if the centered baseline hazard is constant, so you 
could also work with basehaz(fit), which is often more numerically stable.
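
A minimal sketch of that comparison, on simulated data. The data set, the 
covariate x and all object names here are made up for illustration and are 
not from the original question. With a single continuous covariate, the two 
cumulative hazards should differ only by the constant factor 
exp(coef * mean(x)), which is why one is linear in time exactly when the 
other is.

```r
library(survival)

set.seed(1)
n <- 200
x <- rnorm(n, mean = 2)                      # one continuous covariate (hypothetical)
time <- rexp(n, rate = 0.1 * exp(0.5 * x))   # constant baseline hazard of 0.1
status <- rep(1, n)                          # every subject has an event (no censoring)
d <- data.frame(time, status, x)

fit <- coxph(Surv(time, status) ~ x, data = d)

H_centered <- basehaz(fit)                    # cumulative hazard at the mean of x
H_zero     <- basehaz(fit, centered = FALSE)  # cumulative hazard at x = 0

# The ratio of the two curves at each time point
ratio <- H_centered$hazard / H_zero$hazard

# The constant it is expected to equal
scale_factor <- exp(sum(coef(fit) * fit$means))

range(ratio)
scale_factor
```

If x is far from zero, H_zero can be very small or very large, which is 
the numerical-stability reason for preferring the centered version.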

> The plot of the cumulative baseline hazard function, H_o(t_i), should be
> linear across time. Once I have H_o(t_i), to get at h_o(t_i) I then need
> to reverse the cumsum operation. The corresponding plot should have a
> constant baseline hazard over time.

No. Not at all.

Unless you smooth the h_0(t_i) they are completely useless for what you 
want.

Suppose the hazard rate is constant and you have no covariates in the 
model and not even any censoring. In that case the increments of the 
cumulative baseline hazard are 1/n, 1/(n-1), 1/(n-2),..., 1/2, 1, where n 
is the sample size.  So in this simplest possible case a constant baseline 
hazard rate leads to h_0(t_i) increasing with t.
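
A small sketch of this, assuming untied exponential event times and no 
censoring (the sample size and object names are arbitrary). Each increment 
is taken as events divided by number at risk, i.e. the Nelson-Aalen 
increment, computed from the n.event and n.risk components of a survfit 
object.

```r
library(survival)

set.seed(42)
n <- 10
time <- rexp(n, rate = 1)     # constant hazard rate of 1
status <- rep(1, n)           # no censoring

sf <- survfit(Surv(time, status) ~ 1)

# Increment of the cumulative hazard at each event time
increments <- sf$n.event / sf$n.risk

# What the argument above predicts: 1/n, 1/(n-1), ..., 1/2, 1
expected <- 1 / (n:1)

cbind(time = sf$time, increments, expected)
```

The increments grow steadily even though the true hazard rate is flat, 
because they are not divided by the (also growing) gaps between event times.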

The proper smoothing is a little tricky, because the failure distribution 
is skewed and has a boundary at zero, and because of censoring.  That's 
why textbooks often recommend graphing the cumulative hazard to see if it 
is linear rather than the increments in the cumulative hazard to see if 
they are constant.
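
One way to draw that graph, as a rough sketch on simulated data with 
censoring. The data, the censoring rate and the object names are 
assumptions for illustration only. The dashed line is a least-squares line 
through the origin, used purely as a visual reference and not as a formal 
test of linearity.

```r
library(survival)

set.seed(7)
n <- 300
x    <- rnorm(n)
time <- rexp(n, rate = 0.1 * exp(0.5 * x))   # true baseline hazard is 0.1
cens <- rexp(n, rate = 0.03)                 # independent censoring times
d <- data.frame(time   = pmin(time, cens),
                status = as.numeric(time <= cens),
                x      = x)

fit <- coxph(Surv(time, status) ~ x, data = d)
bh  <- basehaz(fit, centered = FALSE)   # columns: hazard (cumulative) and time

# Slope of a line through the origin; roughly the hazard rate if H_0 is linear
slope <- unname(coef(lm(hazard ~ 0 + time, data = bh)))

if (interactive()) {
  plot(bh$time, bh$hazard, type = "s",
       xlab = "time", ylab = "cumulative baseline hazard")
  abline(0, slope, lty = 2)
}
```

Expect the curve to wander in the right tail, where few subjects remain at 
risk.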

 	-thomas