[R] Validating a Cox model on an external set

Berton Gunter gunter.berton at gene.com
Tue Sep 28 19:55:50 CEST 2004


But note that there may be deeper, non-statistical, issues of what you mean
by "validation" here: how good must the predictions be on the validation
data? How similar or dissimilar should the validation data be to the
"training" data? To what end/population is the fitted model to be applied?
For example, AFAIK in most scientific research a model is not considered
"validated" unless its results can be substantively reproduced in different
labs, sometimes with alternative methods.

Think of the 1919 measurements of star positions during a
total solar eclipse to "validate" Einstein's Theory of General Relativity.
My point is not to say that this kind of "validation" is appropriate for a
Cox model, but only that the issues are worth thinking about.
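
On the narrower statistical question in the quoted message, one common
(though not the only) way to check a Cox model on external data is to freeze
the training coefficients, compute the linear predictor for the test cases,
and then look at discrimination (concordance) and calibration (the slope of
the linear predictor refit on the test data, ideally near 1). The sketch
below is only an illustration under assumptions that are not in this thread:
it uses the lung data shipped with the survival package, an arbitrary random
split standing in for the training and external sets, and the concordance()
function from recent versions of survival.

```r
library(survival)

# ASSUMPTION: 'lung' and a random 50/50 split stand in for the poster's
# training set and external test set; the covariates are illustrative only.
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
set.seed(1)
idx   <- sample(nrow(d), size = floor(nrow(d) / 2))
train <- d[idx, ]
test  <- d[-idx, ]

# Fit the Cox model on the training data only.
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = train)

# Linear predictor for the test cases, using the frozen training coefficients.
test$lp <- predict(fit, newdata = test, type = "lp")

# Discrimination: concordance between the linear predictor and observed
# (possibly censored) survival. reverse = TRUE because a larger linear
# predictor means higher hazard, i.e. shorter expected survival.
cidx <- concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)
print(cidx$concordance)

# Calibration slope: refit with the linear predictor as the only covariate.
# A coefficient near 1 suggests the training effects carry over; a value
# well below 1 suggests the original model was overfitted.
slope_fit <- coxph(Surv(time, status) ~ lp, data = test)
print(coef(slope_fit))
```

A concordance near 0.5 on the test set indicates little discrimination, and
a calibration slope well below 1 indicates predictions that are too extreme
for the new data. What values count as "validated" is the non-statistical
question raised above.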


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Frank E Harrell Jr
> Sent: Tuesday, September 28, 2004 10:11 AM
> To: Min-Han Tan
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Validating a Cox model on an external set
> 
> Min-Han Tan wrote:
> > Good morning,
> > 
> > Sorry to trouble the list. 
> > 
> > I have a problem I hope to seek your advice on. 
> >  
> > Essentially, I am trying to 'validate' a multivariate Cox proportional
> > hazards model built in a training set, by testing it on an external
> > test set. I have performed a survfit using the Cox model to predict
> > survival for the test set, and obtained individual predictions for
> > survival time, with standard error for each test sample. Each of these
> > cases has an actual survival time, some censored.
> >  
> > How can we decide whether the Cox model has been validated or not?
> 
> This is what the Design package and its cph and validate.cph and 
> calibrate.cph functions are for.
> 
> >  
> > survdiff in the survival package was suggested to me, but survdiff works
> > between curves; I am not sure how I could use it (I have a predicted
> > curve for each case, but no 'observed curve' - the only observation
> > is death or censoring at time x)
> > 
> > Thank you all so much! 
> >  
> > Min-Han Tan
> > Van Andel Institute
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > 
> 
> 
> -- 
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                      Department of Biostatistics   Vanderbilt University
> 
> 

