[R] How to use the whole dataset (including between events) in Cox model (time-varying covariates) ?

Mayeul KAUFFMANN mayeul.kauffmann at tiscali.fr
Fri Aug 13 16:00:22 CEST 2004

> > coxph does not use any information that are in the dataset between
> > times (or "death times") , since computation only occurs at event
> This is the consequence of the use of partial likelihood in the Cox
> You need to make more assumptions, such as a
> smooth baseline hazard, and you can always use parametric models and a
> full likelihood (but you may have to program them yourself).
> Brian D. Ripley

If I'm not wrong, another alternative might be to explicitly use a
Poisson model, following Dickman et al., who propose another method of
fitting a model close to Cox's model (reference at the end).

So my question is on R syntax (sorry if it is a naive question):
does the following do the job I think it does (full likelihood - using
full dataset - for a model close to Cox's) ?

summary(glm(status ~ x1 + x2 + offset(log((stop - start) / 365.25)),
            family = poisson(link = "log"), data = Xstep2,
            na.action = na.omit,
            control = glm.control(epsilon = 0.001, maxit = 50,
                                  trace = FALSE)))

Dickman et al. recommend the offset ln(yj), where yj is person-time at
risk for the observation. start and stop are days in my dataset. About two
thirds of my observations are one-year long. status= 1 for event, 0 for
censored observations.
The ten "coef" values estimated with coxph are nearly the same as the ten
"Estimate" values from glm, and the p-values are close. (The one covariate
for which this is not the case changes very quickly, and thus may be
badly measured with partial likelihood, with computation only at death
times.)
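
The comparison described above can be tried on simulated data. This is only
a sketch under stated assumptions: the data frame sim, the constant baseline
hazard of 0.1 per year and the true coefficients are all made up for
illustration, and there is one record per subject (no time-varying
covariates). With a truly constant baseline hazard, the Poisson glm with a
log person-time offset and coxph should give similar slope estimates.

```r
library(survival)

set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)

# Hypothetical truth: constant baseline hazard of 0.1 per year,
# log hazard ratios 0.5 (x1) and -0.3 (x2); times are in days.
rate_per_day <- 0.1 * exp(0.5 * x1 - 0.3 * x2) / 365.25
event_time   <- rexp(n, rate_per_day)
cens_time    <- runif(n, 365, 3650)   # censoring between 1 and 10 years

sim <- data.frame(start  = 0,
                  stop   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x1 = x1, x2 = x2)

# Partial likelihood (Cox)
cox_fit <- coxph(Surv(start, stop, status) ~ x1 + x2, data = sim)

# Full likelihood with constant hazard: Poisson glm, offset = log person-years
pois_fit <- glm(status ~ x1 + x2 + offset(log((stop - start) / 365.25)),
                family = poisson(link = "log"), data = sim)

# Side-by-side slopes; the glm intercept estimates the log baseline hazard
cbind(cox = coef(cox_fit), poisson = coef(pois_fit)[c("x1", "x2")])
```

The glm intercept is the log of the (assumed constant) baseline hazard per
year, which coxph does not estimate directly.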

I think the baseline hazard is constant here. Dickman et al. use the link
ln(muj - d*j), where d*j is the known baseline hazard for observation/at
time j. They say:
"d*j is the expected number of deaths (due to causes other than the
cancer of interest and estimated from general population mortality
rates) [...] Fitting the model requires software which supports the
estimation of generalized linear models with the so-called user-defined
link functions. Most general purpose statistical software packages
support this feature, including SAS (from version 6.10), Stata (from
version 7), S-plus, R and GLIM."
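
One possible way to express the link ln(muj - d*j) quoted above in R is to
build an object of class "link-glm" and pass it to poisson(). The sketch
below is an assumption-laden illustration, not the method of Dickman et al.:
the helper make_excess_link(), the simulated data and all rates are
hypothetical. The link closes over the vector of expected deaths, so that
vector must be in the same row order as the data and no rows may be dropped
by na.action or subset.

```r
# Hypothetical helper: link g(mu) = log(mu - dstar), one dstar per row
make_excess_link <- function(dstar) {
  structure(list(
    linkfun  = function(mu)  log(mu - dstar),
    linkinv  = function(eta) exp(eta) + dstar,
    mu.eta   = function(eta) exp(eta),       # d mu / d eta
    valideta = function(eta) TRUE,
    name     = "log(mu - dstar)"
  ), class = "link-glm")
}

set.seed(3)
n     <- 20000
x     <- rbinom(n, 1, 0.5)
y     <- runif(n, 0.5, 1)            # person-years at risk (made up)
dstar <- 0.02 * y                    # expected deaths from background rate
d     <- rpois(n, dstar + 0.05 * exp(0.7 * x) * y)  # background + excess
dat   <- data.frame(d = d, x = x, y = y, dstar = dstar)

rs_link <- make_excess_link(dat$dstar)

# mustart is chosen so that mustart - dstar > 0 and the link is finite
fit <- glm(d ~ x + offset(log(y)),
           family  = poisson(link = rs_link),
           data    = dat,
           mustart = d + dstar + 0.1)

coef(fit)  # intercept: log excess hazard at x = 0; x: log excess hazard ratio
```

Supplying mustart matters here: the default Poisson starting values could
fall below dstar, where this link is undefined.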

First, I do not know how to specify such a link function in R.
Second, if I can specify such a link, I could use (in place of d*j) the
smooth baseline estimated after doing a Cox regression. But I don't know
how to fit (for instance) a piecewise constant baseline hazard with a
Poisson glm, except by trying all possible models (within a given class)
with a for() loop and taking the highest log-likelihood.
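
A piecewise constant baseline hazard can be sketched with a Poisson glm by
splitting each subject's follow-up at chosen cut points and adding the
interval as a factor. The example below is only an illustration under
assumptions: the simulated Weibull data, the data frame names sim and long,
and the cut points are all hypothetical, and survSplit() from the survival
package does the splitting.

```r
library(survival)

set.seed(2)
n  <- 1000
x1 <- rnorm(n)

# Hypothetical truth: Weibull times (years) with increasing hazard and
# a proportional-hazards effect of 0.5 for x1
shape   <- 1.5
t_event <- rweibull(n, shape = shape, scale = 5 * exp(-0.5 * x1 / shape))
sim <- data.frame(time   = pmin(t_event, 10),        # censor at 10 years
                  status = as.integer(t_event <= 10),
                  x1     = x1)

# Split follow-up at the cut points; each row covers (tstart, time]
cuts <- c(1, 2, 4, 6)
long <- survSplit(Surv(time, status) ~ x1, data = sim,
                  cut = cuts, episode = "interval")

# One baseline level per interval, offset = log person-time in the interval
pw_fit <- glm(status ~ x1 + factor(interval) + offset(log(time - tstart)),
              family = poisson(link = "log"), data = long)

cox_fit <- coxph(Surv(time, status) ~ x1, data = sim)
c(piecewise = coef(pw_fit)[["x1"]], cox = coef(cox_fit)[["x1"]])
```

Different sets of cut points can then be compared with AIC(pw_fit) or
logLik(pw_fit), since every candidate model is fitted to the same events.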

Thank you a lot for any help.

Univ. Pierre Mendes France
Grenoble - France

Dickman, Sloggett, Hills, Hakulinen, "Regression models for relative
survival", Statist. Med. 2004; 23:51-64 (DOI: 10.1002/sim.1597)
available at

They say:
The underlying model is an additive hazards model where the total hazard
is written as the sum of the known baseline hazard and the excess hazard
associated with a diagnosis of cancer.
"We assume that the number of deaths, dj, for observation j can be
described by a Poisson distribution, dj follows Poisson(muj), where
muj = lambdaj.yj and yj is person-time at risk for the observation.
The observations can represent either [...] individual patients or subject
bands (as in Section 3)."

----- Original Message ----- 
From: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>
To: "Mayeul KAUFFMANN" <mayeul.kauffmann at tiscali.fr>
Cc: <r-help at stat.math.ethz.ch>
Sent: Friday, August 13, 2004 8:40 AM
Subject: Re: [R] How to use the whole dataset (including between events)
in Cox model (time-varying covariates) ?
