[R] Expected number of events, Andersen-Gill model fit via coxph in package survival
Omar De la Cruz C.
odelacruzc at gmail.com
Sat Oct 6 05:48:41 CEST 2012
Hello,
I am interested in producing the expected number of events, in a
recurring events setting. I am using the Andersen-Gill model, as fit
by the function "coxph" in the package "survival."
I need to produce expected numbers of events for a cohort,
cumulatively, at several fixed times. My ultimate goal is: To fit an
AG model to a reference sample, then use that fitted model to generate
expected numbers of events for a new cohort; then, comparing the
expected vs. the observed numbers of events would give us some idea of
whether the new cohort differs from the reference one.
>From my reading of the documentation and the text by Therneau and
Grambsch, it seems that the function "survexp" is what I need. But
using it I am not able to obtain expected numbers of events that match
reasonably well the observed numbers *even for the same reference
population.* So, I think I am misunderstanding something quite badly.
Below is an example that illustrates the situation. At the end I
include the sessionInfo().
Thank you!
Omar.
############################
# Example of unexpected behavior in computing estimated number of events
# in using package "survival" for fitting the Andersen-Gill model
require(survival)
head(bladder2) # this is the data, in interval format
# Fit Andersen-Gill model
cphfit = coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
# Choose some arbitrary time horizons
t.horiz = seq(min(bladder2$start),max(bladder2$stop),length=6)
# Compute the cohort expected survival
s = survexp(~1,data=bladder2,ratetable=cphfit,times=t.horiz)
# This are the expected survival values:
s$surv
# We are interested in the rate of events
e.r = as.vector( 1 - s$surv )
# How does this compare to the actual number of events, cumulative at
# each time horizon?
observed = numeric(length(t.horiz))
for (i in 1:length(t.horiz)){
observed[i] = sum(bladder2$event[bladder2$stop <= t.horiz[i]])
}
print(observed)
# We would like to compute expected numbers of events that approximately
# match these observed values.
# We should multiply the expected survival rates by the number of individuals.
# Now, one would think that this is the number of at-risk individuals:
s$n.risk
# But that is actually the total number of rows in the data. In any case,
# these numbers do not match:
rbind(expected = s$n.risk*e.r,observed=observed)
# What if we multiply by the number of individuals?
rbind(expected = length(unique(bladder2$id))*e.r,observed=observed)
# This does not work either! The required factor seems to be about 133, but
# I don't see an explanation for that.
# In this example, multiplying by 133.182 gives a good match between observed
# and expected values, but in other examples even the shape of the curves
# are different.
# Multiplying by a number of individuals at risk at each time point
# (number of individuals
# for which there is a time interval containing the time horizon) does
# not work either.
#####################
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets
methods base
other attached packages:
[1] survival_2.36-14
loaded via a namespace (and not attached):
[1] tools_2.15.1
More information about the R-help
mailing list