[R] Survival Analysis with an Historical Control
Paul Miller
pjmiller_57 at yahoo.com
Thu Jul 10 18:11:40 CEST 2014
Hi Dr. Therneau,
Thanks for your response. This is very helpful.
My historical control value is 16 weeks. I've been having some trouble though determining how this value was obtained. Are you able to indicate how people normally go about determining a value for the historical control? Or do you have an view on how it ought to be done?
It does seem like selecting an appropriate value is extremely important. Otherwise the results obtained from the analysis are likely to be nonsense.
Thanks,
Paul
--------------------------------------------
On Thu, 7/10/14, Therneau, Terry M., Ph.D. <therneau at mayo.edu> wrote:
Subject: Re: Survival Analysis with an Historical Control
To: r-help at r-project.org, "Andrews, Chris" <chrisaa at med.umich.edu>, pjmill
Received: Thursday, July 10, 2014, 8:52 AM
You are asking for a one sample
test. Using your own data:
connection <- textConnection("
GD2 1 8 12 GD2 3 -12
10 GD2 6 -52 7
GD2 7 28 10 GD2 8 44
6 GD2 10 14 8
GD2 12 3 8 GD2 14 -52
9 GD2 15 35 11
GD2 18 6 13 GD2 20 12
7 GD2 23 -7 13
GD2 24 -52 9 GD2 26 -52 12 GD2 28 36
13
GD2 31 -52 8 GD2 33 9 10
GD2 34 -11 16
GD2 36 -52 6 GD2 39 15 14 GD2
40 13 13
GD2 42 21 13 GD2 44 -24 16 GD2 46 -52 13
GD2 48 28 9 GD2 2 15
9 GD2 4 -44 10
GD2 5 -2 12 GD2
9 8 7 GD2 11 12 7
GD2 13 -52 7 GD2 16 21 7 GD2
17 19 11
GD2 19 6 16 GD2 21 10 16
GD2 22 -15 6
GD2 25 4 15 GD2 27 -9
9 GD2 29 27 10
GD2 30 1 17 GD2 32 12
8 GD2 35 20 8
GD2 37 -32 8 GD2 38 15 8 GD2
41 5 14
GD2 43 35 13 GD2 45 28 9 GD2
47 6 15
")
hsv <- data.frame(scan(connection, list(vac="", pat=0,
wks=0, x=0)))
hsv <- transform(hsv, status= (wks >0), wks =
abs(wks))
fit1 <- survreg(Surv(wks, status) ~ 1, data=hsv,
dist='exponential')
temp <- predict(fit1, type='quantile', p=.5, se=TRUE)
c(median= temp$fit[1], std= temp$se[1])
median std
24.32723 4.36930
--
The predict function gives the predicted median survival and
standard deviation for each
observation in the data set. Since this was a mean
only model all n of them are the same
and I printed only the first.
For prediction they make the assumption that the std error
for my future study will be the
same as the std from this one, you want the future 95% CI to
not include the value of 16,
so the future mean will need to be at least 16 + 1.96*
4.369.
A nonparmetric version of the argument would be
> fit2 <- survfit(Surv(wks, status) ~ 1, data=hsv)
> print(fit2)
records n.max n.start events
median 0.95LCL 0.95UCL
48 48
48 31
21 15 35
Then make the argument that in our future study, the 95% CI
will stretch 6 units to the
left of the median, just like it did here. This
argument is a bit more tenuous though.
The exponential CI width depends on the total number of
events and total follow-up time,
and we can guess that the new study will be similar.
The Kaplan-Meier CI also depends on
the spacing of the deaths, which is less likely to
replicate.
Notes:
1. Use summary(fit2)$table to extract the CI
values. In R the print functions don't
allow you to "grab" what was printed, summary normally
does.
2. For the exponential we could work out the formula
in closed form -- a good homework
exercise for grad students perhaps but not an exciting way
to spend my own afternoon. An
advantage of the above approach is that we can easily use a
more realistic model like the
weibull.
3. I've never liked extracting out the "Surv(t,s)"
part of a formula as a separate
statement on another line. If I ever need to read this
code again, or even just the
printout from the run, keeping it all together gives much
better documentation.
4. Future calculations for survival data, of any
form, are always tenuous since they
depend critically on the total number of events that will be
in the future study. We can
legislate the total enrollment and follow-up time for that
future study, but the number of
events is never better than a guess. Paraphrasing a
motto found on the door of a well
respected investigator I worked with 30 years ago (because I
don't remember it exaclty):
"The incidence of the condition under
consideration and its subsequent death rate will
both drop by 1/2 at the commencement of a study, and will
not return to their former
values until the study finishes or the PI retires."
Terry T.
---------------------------------------------------------------------------
On 07/10/2014 05:00 AM, r-help-request at r-project.org
wrote:
> Hello All,
>
> I'm trying to figure out how to perform a survival
analysis with an historical control. I've spent some time
looking online and in my boooks but haven't found much
showing how to do this. Was wondering if there is a R
package that can do it, or if there are resources somewhere
that show the actual steps one takes, or if some
knowledgeable person might be willing to share some code.
>
> Here is a statement that describes the sort of analyis
I'm being asked to do.
>
> A one-sample parametric test assuming an exponential
form of survival was used to test the hypothesis that the
treatment produces a median PFS no greater than the
historical control PFS of 16 weeks. A sample median
PFS greater than 20.57 weeks would fall beyond the critical
value associated with the null hypothesis, and would be
considered statistically significant at alpha = .05, 1
tailed.
>
> My understanding is that the cutoff of 20.57 weeks was
obtained using an online calculator that can be found at:
>
> http://www.swogstat.org/stat/public/one_survival.htm
>
> Thus far, I've been unable to determine what values
were plugged into the calculator to get the cutoff.
>
> There's another calculator for a nonparamertric test
that can be found at:
>
> http://www.swogstat.org/stat/public/one_nonparametric_survival.htm
>
> It would be nice to try doing this using both a
parameteric and a non-parametric model.
>
> So my first question would be whether the approach
outlined above is valid or if the analysis should be done
some other way. If the basic idea is correct, is it
relatively easy (for a Terry Therneau type genius) to
implement the whole thing using R? The calculator is a great
tool, but, if reasonable, it would be nice to be able to
look at some code to see how the numbers actually get
produced.
More information about the R-help
mailing list