[R] Sample size for logit model
Walter R. Paczkowski
dataanalytics at earthlink.net
Mon Aug 6 03:14:25 CEST 2007
I'm hoping someone has some insight about sample size and logit
estimation that could help me. I inherited a logit model from a
client in the direct marketing area. The previous consultant used
approximately 143,800 observations in the training data set, of which
only 50 (0.03%) had the target value (= 1) of the dependent
variable. The literature I could find gives very little guidance on
sample sizes (Hosmer & Lemeshow have some material, but they basically
say that little has been done). Does anyone know of some literature
or even rules-of-thumb about sample sizes and/or ratio of target to
non-target values of the dependent variable? The use of 143,800
observations seems excessive. Does a sample this large affect the
significance of the estimates (e.g., am I always guaranteed very small
p-values)? Is oversampling the target value the key and, if so, how do
I calculate weights for the estimation?
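
To make the weighting question concrete, here is a minimal sketch of the
arithmetic usually associated with oversampling the target (case-control
or choice-based sampling). It is not taken from the inherited model. It
assumes that the observed rate of 50 in 143,800 can stand in for the
population rate, and it uses a hypothetical design that keeps all 50
targets and draws 10 non-targets per target. The function names are
illustrative only. Python is used purely for the arithmetic.

```python
import math

# Counts taken from the post.
n_total = 143_800
n_events = 50

# ASSUMPTION: the observed rate is treated as the population event rate.
tau = n_events / n_total


def case_control_weights(tau, ybar):
    """Return (w1, w0): weights for events and non-events that restore the
    population proportions in a subsample whose event share is ybar."""
    w1 = tau / ybar
    w0 = (1 - tau) / (1 - ybar)
    return w1, w0


def intercept_correction(tau, ybar):
    """Amount to subtract from the intercept of an unweighted logit fitted
    on the subsample (the so-called prior correction)."""
    return math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


# HYPOTHETICAL design: keep every event, sample 10 non-events per event.
controls_per_event = 10
n_controls = controls_per_event * n_events
ybar = n_events / (n_events + n_controls)

w1, w0 = case_control_weights(tau, ybar)
correction = intercept_correction(tau, ybar)

print(f"event rate tau       = {tau:.6f}")
print(f"subsample event share = {ybar:.4f}")
print(f"weight for events     = {w1:.6f}")
print(f"weight for non-events = {w0:.6f}")
print(f"intercept correction  = {correction:.4f}")
```

Under these assumptions the weights sum to the subsample size, and the
weighted share of targets equals the original rate, so either the
weights or the intercept correction (not both) would be applied when
fitting on the subsample.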
Any guidance or suggestions in this area are definitely welcome.
Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536