[R] Sample size for logit model

Walter R. Paczkowski dataanalytics at earthlink.net
Mon Aug 6 03:14:25 CEST 2007

   I'm hoping someone has some insight into sample size and logit
   estimation that could help me. I inherited a logit model from a
   client in the direct marketing area. The previous consultant used
   approximately 143,800 observations in the training data set, of which
   only 50 (about 0.03%) had the target value (= 1) of the dependent
   variable. The literature I could find gives very little guidance on
   sample sizes (Hosmer & Lemeshow have some material, but they basically
   say that little has been done). Does anyone know of any literature
   or even rules of thumb about sample sizes and/or the ratio of target to
   non-target values of the dependent variable? Using all 143,800
   observations seems excessive to me. Does a sample this large affect
   the significance of the estimates (e.g., am I always guaranteed very
   small p-values)? Is oversampling the target value the key, and if so,
   how do I calculate weights for the estimation?
   Any guidance or suggestions in this area are definitely welcome.
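
   To make the weighting question concrete, here is a minimal sketch in
   Python of the kind of calculation I have in mind. It assumes a
   subsample that keeps all 50 targets plus a hypothetical 5,000 randomly
   drawn non-targets (that figure is purely illustrative, not from the
   client's data), and it uses inverse-sampling-fraction weights, i.e. the
   population share of each class divided by its share in the subsample.
   The function name class_weights is made up for this example. Whether
   this is the appropriate correction for the model at hand is part of
   what I am asking.

```python
def class_weights(pop_targets, pop_total, samp_targets, samp_total):
    """Return (w1, w0): per-observation weights for targets and non-targets
    that restore the population class shares in a subsample."""
    pop_rate = pop_targets / pop_total      # target share in the full data
    samp_rate = samp_targets / samp_total   # target share in the subsample
    w1 = pop_rate / samp_rate               # weight for each target (= 1) row
    w0 = (1 - pop_rate) / (1 - samp_rate)   # weight for each non-target (= 0) row
    return w1, w0


# Counts from the post (the total is approximate).
pop_total, pop_targets = 143_800, 50

# ASSUMPTION: hypothetical subsample, all targets plus 5,000 non-targets.
kept_nontargets = 5_000
samp_targets = pop_targets
samp_total = samp_targets + kept_nontargets

w1, w0 = class_weights(pop_targets, pop_total, samp_targets, samp_total)

# Sanity check: the weighted target share should match the full-data share.
weighted_rate = (w1 * samp_targets) / (w1 * samp_targets + w0 * kept_nontargets)

print(f"target rate in full data:  {pop_targets / pop_total:.4%}")
print(f"target rate in subsample:  {samp_targets / samp_total:.4%}")
print(f"weight per target row:     {w1:.6f}")
print(f"weight per non-target row: {w0:.6f}")
print(f"weighted target rate:      {weighted_rate:.4%}")
```

   With these weights the weighted subsample has the same target share as
   the full data, and the weights sum to the subsample size, so they could
   be passed as case weights to a logit fit on the subsample.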
   Walt Paczkowski

   Walter R. Paczkowski, Ph.D.
   Data Analytics Corp.
   44 Hamilton Lane
   Plainsboro, NJ  08536
   (V) 609-936-8999
   (F) 609-936-3733
