[R] Sample size for logit model
Walter R. Paczkowski
dataanalytics at earthlink.net
Mon Aug 6 03:14:25 CEST 2007
Hi,
I'm hoping someone has some insight about sample size and logit
estimation that could help me. I inherited a logit model from a
client in the direct marketing area. The previous consultant used
approximately 143,800 observations in the training data set, of which
only 50 (0.03%) had the target value (= 1) of the dependent
variable. The literature I could find gives very little guidance on
sample sizes (Hosmer & Lemeshow have some material, but they basically
say that little has been done). Does anyone know of some literature
or even rules-of-thumb about sample sizes and/or ratio of target to
non-target values of the dependent variable? Using 143,800
observations seems excessive. Does a sample this large distort the
significance of the estimates (e.g., am I always guaranteed very small
p-values)?
Is oversampling the target value the key and, if so, how do I
calculate weights for the estimation?
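To make the weighting question concrete, here is a minimal sketch (plain
Python arithmetic, no modeling) of one scheme I understand to be common:
keep every target case, draw a random subsample of the non-targets, and
weight each sampled non-target by the inverse of its sampling fraction.
The counts 143,800 and 50 are from the data described above; the ratio of
100 non-targets per target and all variable names are assumptions chosen
only for illustration, not something the previous consultant used.

```python
import math

# Counts from the training data described in this post.
n_total = 143_800
n_events = 50                        # target (= 1) cases
n_nonevents = n_total - n_events     # non-target (= 0) cases

event_rate = n_events / n_total      # share of target cases in the full data

# ASSUMPTION (illustrative only): keep all events and randomly sample
# 100 non-events per event.
controls_per_event = 100
n_sampled_nonevents = n_events * controls_per_event

# Sampling fraction for non-events; events are kept with probability 1.
f = n_sampled_nonevents / n_nonevents

# Inverse-probability sampling weights.
weight_event = 1.0
weight_nonevent = 1.0 / f

# Alternative to weighting: fit the logit unweighted on the subsample and
# add log(f) to the fitted intercept, since under-sampling non-events
# inflates the sample odds of a target by a factor of 1/f.
intercept_offset = math.log(f)

# Sanity check: the weighted subsample should represent the full data set.
weighted_total = n_events * weight_event + n_sampled_nonevents * weight_nonevent

print(f"event rate in full data:  {event_rate:.5%}")
print(f"non-event sampling frac.: {f:.5f}")
print(f"weight per non-event:     {weight_nonevent:.2f}")
print(f"intercept offset log(f):  {intercept_offset:.4f}")
print(f"weighted total:           {weighted_total:.0f}")
```

If this is the right idea, the weighted subsample adds back up to the
original 143,800 observations, and only the intercept (not the slopes)
should need correcting in the unweighted fit. I would welcome correction
if this is not how the weights should be computed.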
Any guidance or suggestions in this area are definitely welcome.
Walt Paczkowski
_________________________________
Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
(V) 609-936-8999
(F) 609-936-3733