[R] How to validate model?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Oct 7 22:03:27 CEST 2008


Pedro.Rodriguez at sungard.com wrote:
> Hi Frank,
> 
> Thanks for your feedback! But I think we are talking about two different
> things.
> 
> 1) Validation: The generalization performance of the classifier. See,
> for example, "Studies on the Validation of Internal Rating Systems" by
> BIS. 

I didn't think the desire was for a classifier but rather for a 
risk predictor.  If prediction is the goal, classification methods and 
accuracy indexes based on classifications do not work very well.
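
A minimal sketch of what probability-based accuracy indexes look like in R 
(simulated data and illustrative variable names, not from this thread):

    set.seed(1)                                # simulated example data
    n   <- 1000
    x   <- rnorm(n)
    y   <- rbinom(n, 1, plogis(-1 + x))        # hypothetical 0/1 default indicator
    fit <- glm(y ~ x, family = binomial)
    p   <- fitted(fit)                         # predicted probabilities

    mean((p > 0.5) != y)   # classification error at an arbitrary 0.5 cutoff
    mean((p - y)^2)        # Brier score, computed from the predicted risks themselves
    Hmisc::somers2(p, y)   # c-index (ROC area) and Dxy rank discrimination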

> 
> 2) Calibration: Correct calibration of a PD rating system means that the
> calibrated PD estimates are accurate and conform to the observed default
> rates. See, for instance, "An Overview and Framework for PD Backtesting
> and Benchmarking" by Castermans et al.

I'm unclear on what you mean here.  Correct calibration of a predictive 
system means that the UNcalibrated estimates are accurate (i.e., they 
don't need any calibration).  (What is PD?)
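
One crude way to see what calibration means in practice is to group the 
predicted probabilities and compare them with the observed default rates.  A 
rough sketch, reusing the p and y from the simulated example above; the smooth 
resampling-corrected curve recommended below is a better-behaved version of 
the same comparison:

    g <- cut(p, quantile(p, 0:10/10), include.lowest = TRUE)  # deciles of predicted risk
    cbind(predicted = tapply(p, g, mean),   # mean predicted probability in each group
          observed  = tapply(y, g, mean))   # observed default rate in each group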

> 
> Frank, you are referring the #1 and I am referring to #2. 
> 
> Nonetheless, I would never create a rating system if my model doesn't
> discriminate better than a coin toss.

For sure
Frank

> 
> Regards,
> 
> Pedro 
> 
> -----Original Message-----
> From: Frank E Harrell Jr [mailto:f.harrell at vanderbilt.edu] 
> Sent: Tuesday, October 07, 2008 11:02 AM
> To: Rodriguez, Pedro
> Cc: maithili_shiva at yahoo.com; r-help at r-project.org
> Subject: Re: [R] How to validate model?
> 
> Pedro.Rodriguez at sungard.com wrote:
>> Usually one validates scorecards with the ROC curve, Pietra Index, KS
>> test, etc. You may be interested in Working Paper 14 (WP 14) from the BIS
>> (www.bis.org).
>>
>> Regards,
>>
>> Pedro
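
For reference, the KS statistic mentioned here is just the two-sample 
Kolmogorov-Smirnov distance between the predicted-score distributions of 
defaulters and non-defaulters; a sketch using the simulated p and y above:

    ks.test(p[y == 1], p[y == 0])$statistic  # max gap between the two score CDFs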
> 
> No, the validation should be done using an absolute reliability 
> (calibration) curve.  You need to verify that at all levels of predicted
> risk there is agreement with the true probability of failure.  An ROC 
> curve does not do that, and I doubt the others do.  A 
> resampling-corrected loess calibration curve is a good approach as 
> implemented in the Design package's calibrate function.
> 
> Frank
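
A sketch of that approach with the Design package (continued today as rms; 
the calls are the same).  The data frame and variable names are made up for 
illustration:

    ## `d`, `default`, `age`, `income`, `debt_ratio` are placeholder names
    library(Design)                      # library(rms) in current versions of R
    fit <- lrm(default ~ age + income + debt_ratio, data = d,
               x = TRUE, y = TRUE)       # keep design matrix and response for resampling
    cal <- calibrate(fit, method = "boot", B = 200)  # bootstrap overfitting correction
    plot(cal)                            # overfitting-corrected smooth calibration curve
    validate(fit, method = "boot", B = 200)          # optimism-corrected Dxy, slope, Brier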
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Maithili Shiva
>> Sent: Tuesday, October 07, 2008 8:22 AM
>> To: r-help at r-project.org
>> Subject: [R] How to validate model?
>>
>> Hi!
>>
>> I am working on a scorecard model and have arrived at the regression
>> equation. I have used logistic regression in R.
>>
>> My question is: how do I validate this model? I do have a hold-out sample
>> of 5,000 customers.
>>
>> Please guide me. The problem is that I have never used logistic regression
>> before, nor am I familiar with credit scoring models.
>>
>> Thanks in advance
>>
>> Maithili
>>
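
One way to use such a hold-out sample is to score it with the fitted model and 
compare predicted probabilities with observed outcomes, e.g. with the Design 
package's val.prob function.  A sketch with made-up object names (`fit` is the 
fitted model, `holdout` the hold-out data frame, `bad` its 0/1 default indicator):

    p.new <- predict(fit, newdata = holdout, type = "response")  # glm(); use type = "fitted" with lrm()
    library(Design)                 # library(rms) in current versions of R
    val.prob(p.new, holdout$bad)    # calibration plot plus Dxy, C (ROC area), Brier score,
                                    # and calibration intercept/slope on the external sample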
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


