[R] Expected number of events, Andersen-Gill model fit via coxph in package survival
Terry Therneau
therneau at mayo.edu
Mon Oct 8 15:58:31 CEST 2012
> I am interested in producing the expected number of events, in a
> recurring events setting. I am using the Andersen-Gill model, as fit
> by the function "coxph" in the package "survival."
>
> I need to produce expected numbers of events for a cohort,
> cumulatively, at several fixed times. My ultimate goal is: To fit an
> AG model to a reference sample, then use that fitted model to generate
> expected numbers of events for a new cohort; then, comparing the
> expected vs. the observed numbers of events would give us some idea of
> whether the new cohort differs from the reference one.
>
>> From my reading of the documentation and the text by Therneau and
> Grambsch, it seems that the function "survexp" is what I need. But
> using it I am not able to obtain expected numbers of events that match
> reasonably well the observed numbers *even for the same reference
> population.* So, I think I am misunderstanding something quite badly.
>
You've hit a common confusion. Observed versus expected events computations are done on
a cumulative hazard scale H, not the surivival scale S; S = exp(-H). Relating this back
to simple Poisson models H(t) would be the expected number of events by time t and S(t)
the probability of "no events before time t". G. Berry (Biometrics 1983) has a classic
ane readable article on this (especially if you ignore the proofs).
Using your example:
> cphfit <- coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
> zz <- predict(cphfit, type='expected')
> c(sum(zz), sum(bladder2$event))
[1] 112 112
> tdata <- bladder2[1:10] #new data set (lazy way)
> predict(cphfit, type='expected', newdata=tdata)
[1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
[8] 0.8864808 0.2932915 0.5190647
You can also do this using survexp and the cohort=FALSE argument, which would return
S(t) for each subject and we would then use -log(result) to get H. This is how it was
done when I wrote the book, but the newer predict function is easier.
Terry Therneau
More information about the R-help
mailing list