[R] Bias in sample - Logistic Regression
Pedro.Rodriguez at sungard.com
Thu Oct 2 23:10:29 CEST 2008
Hi Shiva,
You may be interested in the following paper:
Learning when Training Data are Costly: The Effect of Class Distribution
on Tree Induction. G. Weiss and F. Provost. Journal of Artificial
Intelligence Research 19 (2003) 315-354.
For validating models in those environments, see:
William Elazmeh, Nathalie Japkowicz, Stan Matwin. (2006). A Framework
for Comparative Evaluation of Classifiers in the Presence of Class
Imbalance. Proceedings of the third Workshop on ROC Analysis in Machine
Learning, Pittsburgh, USA.
Regards,
Pedro
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Wensui Liu
Sent: Wednesday, October 01, 2008 7:20 PM
To: maithili_shiva at yahoo.com
Cc: r-help at r-project.org
Subject: Re: [R] Bias in sample - Logistic Regression
Hi, Shiva,
The idea of reject inference is very simple. Let's assume a credit card
environment. There are 100 applicants, of whom 50 will be approved and
booked. Therefore, we can only observe adverse behavior, such as
default and delinquency, for the 50 booked accounts. Let's further assume
that, of the 50 booked cards, 5 are bad (default or delinquency). A natural
approach is to build a model that "cherry picks" the bad accounts and then
apply the same model to all applicants.
However, we can only observe the behavior of the booked applicants,
which is 50, not of all applicants, which is 100. Therefore, the model's
results look better than they should. This is the so-called 'sample
bias'. The same thing can happen in healthcare or direct marketing as
well.
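
A small simulation may help make the point concrete. The sketch below is
not from any of the cited work; it is only an illustration under stated
assumptions. The latent risk variable, the noisy underwriting score, the
logistic link with intercept -2.0 and slope 1.5, and the 50% approval rate
are all hypothetical choices. Unlike real life, the simulation knows the
outcome for rejected applicants, which is exactly what lets us see the gap.

```python
import math
import random


def simulate(n_applicants=100_000, approve_fraction=0.5, seed=42):
    """Compare bad rates among booked accounts and among all applicants.

    All parameters and distributions here are illustrative assumptions.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        # Latent riskiness of the applicant (never observed directly).
        risk = rng.gauss(0.0, 1.0)
        # Underwriting score: a noisy view of the latent risk.
        score = risk + rng.gauss(0.0, 1.0)
        # True probability of going bad, via an assumed logistic link.
        p_bad = 1.0 / (1.0 + math.exp(-(-2.0 + 1.5 * risk)))
        is_bad = rng.random() < p_bad
        applicants.append((score, is_bad))

    # Approve (book) the applicants with the lowest scores.
    applicants.sort(key=lambda a: a[0])
    n_booked = int(n_applicants * approve_fraction)
    booked = applicants[:n_booked]
    rejected = applicants[n_booked:]

    def bad_rate(group):
        return sum(is_bad for _, is_bad in group) / len(group)

    return {
        "all": bad_rate(applicants),
        "booked": bad_rate(booked),
        "rejected": bad_rate(rejected),
    }


rates = simulate()
for name in ("booked", "all", "rejected"):
    print(f"bad rate, {name:>8}: {rates[name]:.3f}")
```

Under these assumptions the bad rate seen on booked accounts understates
the bad rate of the full applicant pool, so a model fitted and validated
only on booked accounts is judged on an easier population than the one it
will be applied to.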
Luckily, many people have done excellent work on this problem.
Please read some of Heckman's work. Greene at NYU has a paper in this area
as well. I believe there is also an implementation in R. If you use
SAS (widely used in industry), take a look at proc qlim.
HTH.
--
===============================
WenSui Liu
Acquisition Risk, Chase
Email : wensui.x.liu at chase.com
Blog : statcompute.spaces.live.com
===============================