Walter R. Paczkowski dataanalytics at earthlink.net
Mon Aug 6 03:14:25 CEST 2007

I'm hoping someone has some insight into sample size and logit
estimation that could help me. I inherited a logit model from a client
in the direct-marketing area. The previous consultant used approximately
143,800 observations in the training data set, of which only 50 (0.03%)
had the target value (= 1) for the dependent variable. The literature I
could find gives very little guidance on sample sizes (Hosmer & Lemeshow
have some material, but they basically say that little has been done).
Does anyone know of any literature, or even rules of thumb, on sample
sizes and/or the ratio of target to non-target values of the dependent
variable? The use of 143,800 observations is excessive. Does this affect
the significance of the estimates (e.g., am I always guaranteed very
small p-values)? Is oversampling the target value the key, and if so,
how do I calculate weights for the estimation?
Any guidance or suggestions in this area are definitely welcome.
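
On the p-value question, a small worked case may help show what drives
precision when the target is rare. For a single binary predictor x, the
logit slope equals the sample log odds ratio, and its large-sample
standard error is sqrt(1/a + 1/b + 1/c + 1/d) over the four cells of the
2x2 table (Woolf's formula). The sketch below assumes, purely for
illustration, that the 50 targets and the non-targets each split evenly
across the two levels of x; those splits are hypothetical, not the
client's data. Under that assumption, adding non-targets barely shrinks
the standard error because the 1/25 terms from the target cells
dominate, so a large n by itself would not guarantee small p-values.

```python
import math


def log_or_se(a: float, b: float, c: float, d: float) -> float:
    """Woolf's large-sample standard error of a log odds ratio.

    a, b: targets (y = 1) with x = 1 and x = 0
    c, d: non-targets (y = 0) with x = 1 and x = 0
    """
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Hypothetical split: the 50 targets are divided 25/25 across the two
# levels of x, and the non-targets are divided evenly as well.
TARGETS_PER_LEVEL = 25

for n_nontarget in (1_000, 5_000, 143_750):
    half = n_nontarget / 2
    se = log_or_se(TARGETS_PER_LEVEL, TARGETS_PER_LEVEL, half, half)
    print(f"non-targets = {n_nontarget:>7,}: SE(log OR) = {se:.4f}")

# Limit as the non-target count grows without bound: only the target
# cells contribute to the standard error.
print(f"non-targets -> infinity: SE(log OR) = {math.sqrt(2 / TARGETS_PER_LEVEL):.4f}")
```

Going from 5,000 to all 143,750 non-targets changes the standard error
by well under 1% in this setup; the 50 targets set the floor.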

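On the weighting question, one standard approach when every target is
kept and only the non-targets are subsampled (case-control sampling) is
either to weight each observation by the inverse of its class's sampling
fraction, or to fit the unweighted logit and correct only the intercept
afterwards (the so-called prior correction); under this sampling scheme
the slope coefficients need no correction. The sketch below assumes a
hypothetical subsample of 5,000 of the 143,750 non-targets, treats the
approximate 143,800 total as exact, and uses illustrative helper names
that are not from any library. It checks the correction on an
intercept-only model, where the fitted intercept is simply the sample
log-odds; in a real fit, b0_sample would come from the logit estimated
on the subsample.

```python
import math


def class_weights(pop_targets, pop_nontargets, sample_targets, sample_nontargets):
    """Hypothetical helper: inverse sampling-fraction weights.

    Each sampled observation stands in for pop/sample observations of its
    class. Returns (weight for y = 1, weight for y = 0).
    """
    return pop_targets / sample_targets, pop_nontargets / sample_nontargets


def prior_corrected_intercept(b0_sample, tau, ybar):
    """Hypothetical helper: correct an intercept fitted on an oversampled set.

    tau:  fraction of targets in the population (the full data)
    ybar: fraction of targets in the estimation sample
    """
    return b0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def logit(p):
    return math.log(p / (1 - p))


# Figures from the question (approximate total treated as exact).
POP_TARGETS, POP_NONTARGETS = 50, 143_750
# Hypothetical design: keep every target, subsample 5,000 non-targets.
SAMPLE_TARGETS, SAMPLE_NONTARGETS = 50, 5_000

w1, w0 = class_weights(POP_TARGETS, POP_NONTARGETS, SAMPLE_TARGETS, SAMPLE_NONTARGETS)
print(f"weights: target = {w1}, non-target = {w0}")

tau = POP_TARGETS / (POP_TARGETS + POP_NONTARGETS)
ybar = SAMPLE_TARGETS / (SAMPLE_TARGETS + SAMPLE_NONTARGETS)

# In an intercept-only logit, the MLE of the intercept is the sample log-odds.
b0_sample = logit(ybar)
b0_corrected = prior_corrected_intercept(b0_sample, tau, ybar)
print(f"sample intercept = {b0_sample:.4f}, corrected = {b0_corrected:.4f}, "
      f"population log-odds = {logit(tau):.4f}")
```

When all targets are kept, the intercept shift equals the log of the
non-target weight, so the two approaches agree on the intercept here.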