[R] how many records for suitable regression

Greg Snow Greg.Snow at imail.org
Wed Mar 2 22:02:58 CET 2011


It really depends on what question you are trying to answer.  Things like the relative importance of type I and type II errors could matter a lot.  Correlation among the predictors can affect things.  What effect size are you looking for and what power do you want?  And much more.

There is a general rule of thumb that you need at least 10-20 observations per predictor variable to have any chance that the coefficients will be meaningful.  For this rule, count a categorical variable as its indicator variables (a factor with k levels contributes k-1 of them).  This is very much a lower bound, and you may need more depending on the answers to the questions above.
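
As a rough illustration of that arithmetic, here is a small sketch in Python (standard library only).  The function names and the example model (3 numeric predictors plus two factors with 4 and 2 levels) are made up for illustration and are not from the original question.

```python
# Rough lower bound on sample size from the 10-20 observations-per-predictor
# rule of thumb. The model described below is hypothetical.

def count_predictors(numeric, factor_levels):
    """Count predictors: each numeric variable counts once; a categorical
    variable with k levels contributes k - 1 indicator variables."""
    return numeric + sum(k - 1 for k in factor_levels)

def rule_of_thumb_range(n_predictors, low=10, high=20):
    """Return the (low, high) minimum sample sizes implied by the rule."""
    return n_predictors * low, n_predictors * high

# Hypothetical model: 3 numeric predictors plus factors with 4 and 2 levels.
p = count_predictors(numeric=3, factor_levels=[4, 2])
lo, hi = rule_of_thumb_range(p)
print(f"{p} predictors -> at least {lo} to {hi} observations")
# prints "7 predictors -> at least 70 to 140 observations"
```

Treat the result as a floor only; it says nothing about effect size, power, or correlation among the predictors.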

If you have some idea of what the structure of your data will be, then you can simulate various sample sizes, analyze them, and see which sizes start to give meaningful answers.
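
A minimal sketch of that simulation idea follows, in Python with only the standard library (the same loop would be natural in R around lm()).  Everything specific here is an assumption chosen for illustration: a single standard-normal predictor, a true slope of 0.5, a noise standard deviation of 2, and "meaningful" taken to mean that the fitted slope lands within 50% of the true slope.  Replace these with whatever you believe about your own data.

```python
import random
import statistics

def fit_slope(x, y):
    """Ordinary least-squares slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def simulate_slopes(n, true_slope, noise_sd, n_sims, rng):
    """Simulate n_sims datasets of size n and return the fitted slopes."""
    slopes = []
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed data-generating model: intercept 1, linear in x, normal noise.
        y = [1.0 + true_slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
        slopes.append(fit_slope(x, y))
    return slopes

def summarize(n, true_slope=0.5, noise_sd=2.0, n_sims=500, seed=1):
    """Return (sd of fitted slopes, share of fits within 50% of the truth)."""
    rng = random.Random(seed)
    slopes = simulate_slopes(n, true_slope, noise_sd, n_sims, rng)
    spread = statistics.stdev(slopes)
    tolerance = 0.5 * abs(true_slope)
    close = sum(abs(s - true_slope) <= tolerance for s in slopes) / n_sims
    return spread, close

for n in (10, 20, 50, 100, 200, 500):
    spread, close = summarize(n)
    print(f"n={n:4d}  sd of slope={spread:.3f}  within 50% of truth={close:.2f}")
```

Reading down the printed table shows roughly where the estimates settle down for this assumed structure; with more predictors, correlated predictors, or a smaller effect, the required sample size will differ.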

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of agent dunham
> Sent: Wednesday, March 02, 2011 6:50 AM
> To: r-help at r-project.org
> Subject: [R] how many records for suitable regression
> 
> Dear community,
> 
> I was wondering if it's possible to know if you have enough data for a
> regression study.
> 
> I remember you must have more data than parameters to obtain, but I'd like
> to know if there was something more sophisticated.
> 
> Thanks, user at host.com


