[R] Sample size calculation for non-normal population with unknown mean and SD

Bert Gunter gunter.berton at gene.com
Mon Jul 26 22:39:09 CEST 2010


The obvious:

Take a small pilot sample, say 25-50 documents, and estimate your
distribution from that. Then use the estimate to determine how many
additional samples (if any) you need for the desired precision. This
second step can probably be done easily via simulation/bootstrap if
you don't want to specify a parametric form.
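
A minimal sketch of the simulation/bootstrap step, written in Python
with only the standard library. The pilot counts, the target margin of
0.5 patents per document, the 95% level, and the function names are all
assumptions made up for illustration; none of them come from the
poster's data. The sketch resamples the pilot with replacement at each
candidate sample size and reports the smallest size whose bootstrap
percentile interval for the mean is narrow enough. It ignores the finite
population of 4,392, so it should err slightly on the conservative side.

```python
import random
import statistics


def bootstrap_half_width(pilot, n, reps=2000, level=0.95, rng=None):
    """Approximate half-width of a percentile interval for the mean of n draws.

    Draws `reps` samples of size n (with replacement) from the pilot data
    and measures the spread of their means.
    """
    rng = rng or random.Random()
    means = sorted(
        statistics.fmean(rng.choices(pilot, k=n)) for _ in range(reps)
    )
    alpha = (1.0 - level) / 2.0
    lo = means[int(alpha * (reps - 1))]
    hi = means[int((1.0 - alpha) * (reps - 1))]
    return (hi - lo) / 2.0


def smallest_n(pilot, margin, candidates, **kwargs):
    """Return the first candidate size whose half-width is within margin."""
    for n in candidates:
        if bootstrap_half_width(pilot, n, **kwargs) <= margin:
            return n
    return None  # no candidate was large enough


# Hypothetical pilot counts (patents per document), invented for illustration.
pilot = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 9, 12,
         0, 1, 1, 2, 2, 3, 3, 4, 6, 8]

rng = random.Random(1)  # fixed seed so reruns give the same answer
n_needed = smallest_n(pilot, margin=0.5, candidates=range(30, 601, 10), rng=rng)
print("total sample size suggested by the pilot:", n_needed)
```

The result is a total sample size, so the number of additional
documents is that total minus the pilot size. A pilot of 25-50 only
roughly pins down the tail of a skewed distribution, so it is worth
rerunning the calculation once more data are in hand.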

My guess is that your distribution is right-skewed but not Poisson;
it is probably more like a truncated Poisson. But of course I have no
idea what sorts of documents you've got, so how would I know?


Bert Gunter
Genentech Nonclinical Biostatistics


On Mon, Jul 26, 2010 at 1:28 PM, Majonu <mnunez at andrew.cmu.edu> wrote:
>
> Basically, we have a population of 4,392 documents and we want to find out
> the number of patents per document. We don’t want to go through all 4,392
> documents, but want a reliable sample size from which to draw inferences. I
> feel like this count data will not follow a normal distribution, but more
> like a Poisson (skewed right.) The problem is we don’t have much similar
> data to this data set, so mean and standard deviation are unknown. Is there
> any way to derive a sample size based off the confidence interval, margin of
> error, and population size for what I assume to be a non-normal population?
> Any help would be greatly appreciated.


