[R] Off-topic: (Simple?) Random Sampling when n is a random variable

Wed Jun 15 01:19:37 CEST 2011

Thanks Greg!

Andrew

On Tue, Jun 14, 2011 at 04:13:52PM -0600, Greg Snow wrote:
> This sounds like what is called "domains" in survey sampling (possibly other names, but that is what I learned it as).  The idea is that you take a random sample (or the whole population), ask a question to determine which domain each subject is in, and then ask the question of interest only of subjects in the domain of interest.  For example, you want to know how long tourists plan to stay in the area, so you go to the airport and ask N people whether they are tourists; if they answer 'yes', you ask how long they will be staying.  The sample size of tourists, n (which is <= N), is random and not known ahead of time.
> 
> This is the same idea as your coin flip, which takes the place of the first question.  And yes, the randomness of n does change the formulas needed.  Consult a survey sampling text for details (I am looking at the one by Lohr, which has a section on this).
> 
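
A minimal sketch of the domain idea described above, in Python rather than R so that it needs nothing beyond the standard library. The simulated data, the 30% tourist rate, and the function name domain_mean_se are all made up for illustration. The domain mean is treated as a ratio estimator, with a linearization standard error set beside the naive s/sqrt(n) that treats n as fixed. The finite population correction is ignored, which assumes a large population.

```python
import math
import random

def domain_mean_se(y, in_domain):
    """Domain mean and two SE estimates from a simple random sample.

    y          -- responses for all N sampled units (ignored outside the domain)
    in_domain  -- booleans, True when the unit belongs to the domain
    The finite population correction is ignored (large-population assumption).
    """
    N = len(y)
    yd = [v for v, d in zip(y, in_domain) if d]
    n = len(yd)  # realized domain sample size, random before sampling
    if n < 2:
        raise ValueError("need at least two domain units")
    mean_d = sum(yd) / n
    ss = sum((v - mean_d) ** 2 for v in yd)
    # Naive SE: pretends n was fixed in advance.
    naive_se = math.sqrt(ss / (n - 1) / n)
    # Ratio-estimator SE via linearization; residuals are d_i * (y_i - mean_d).
    p_hat = n / N
    ratio_se = math.sqrt(ss / (N - 1) / N) / p_hat
    return mean_d, n, naive_se, ratio_se

# Hypothetical airport survey: N people asked, about 30% are tourists.
rng = random.Random(1)
N = 500
in_domain = [rng.random() < 0.3 for _ in range(N)]
y = [rng.gauss(5.0, 2.0) for _ in range(N)]  # planned length of stay (days)
mean_d, n, naive_se, ratio_se = domain_mean_se(y, in_domain)
print(n, round(mean_d, 3), round(naive_se, 4), round(ratio_se, 4))
```

The two standard errors differ only by the factor sqrt((n-1)N / ((N-1)n)), so under these assumptions they nearly coincide once the domain sample is reasonably large.
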
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andrew Robinson
> > Sent: Monday, June 13, 2011 7:03 PM
> > To: R-Help Discussion
> > Subject: [R] Off-topic: (Simple?) Random Sampling when n is a random variable
> > 
> > Hi everyone,
> > 
> > I'm involved in a discussion with a colleague.  He suggested a sample
> > design for a finite-sized process that (to all intents and purposes)
> > involves tossing a coin and examining the unit if the coin shows
> > Heads.
> > 
> > I should emphasize that we're both approaching the problem from a
> > design-based sampling theory point of view.  So I have no argument
> > about the appropriateness of the design as such.
> > 
> > Can this design be called 'Simple Random Sampling'?  My intuition
> > suggests that it cannot, because the sample size is a random
> > variable, so the usual standard error equations for SRS will be
> > inaccurate.  But I can't find any citations to back me up.  So maybe
> > I'm wrong.  My questions are:
> > 
> > 1) does this design have a name?
> > 
> > 2) are the usual SRS formulas, e.g. for the standard error of the
> > mean, exactly accurate?  Or are they defensibly accurate approximations?
> > 
> > 3) can anyone suggest some citations that provide guidance either way?
> > 
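
One way to probe question 2 empirically is a small simulation. This is a sketch under stated assumptions, in Python with only the standard library: the coin tosses are independent with a fixed probability of Heads, the population values are invented, and bernoulli_sample and srs_se are hypothetical helper names. It compares the empirical variance of the sample mean across repeated draws with the average of the usual SRS variance estimate computed from the realized n.

```python
import random
import statistics

def bernoulli_sample(population, p, rng):
    """Keep each unit independently with probability p (one coin toss per unit)."""
    return [v for v in population if rng.random() < p]

def srs_se(sample, pop_size):
    """Usual SRS standard error of the mean, using the realized sample size."""
    n = len(sample)
    fpc = 1 - n / pop_size  # finite population correction
    return (fpc * statistics.variance(sample) / n) ** 0.5

rng = random.Random(2011)
population = [rng.expovariate(1.0) for _ in range(200)]  # made-up finite population

means, est_vars = [], []
for _ in range(4000):
    s = bernoulli_sample(population, 0.25, rng)
    if len(s) < 2:  # the variance estimate needs at least two units
        continue
    means.append(statistics.mean(s))
    est_vars.append(srs_se(s, len(population)) ** 2)

empirical_var = statistics.variance(means)  # spread of the mean across draws
mean_est_var = statistics.mean(est_vars)    # average SRS variance estimate
print(round(empirical_var / mean_est_var, 3))
```

A ratio near 1 would be consistent with the standard result that, given the realized n, a sample selected by equal-probability coin tosses behaves like a simple random sample of size n. It is a check on one made-up population, not a proof.
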
> > Thanks for any assistance!
> > 
> > Andrew

-- 
Andrew Robinson  
Program Manager, ACERA 
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/