[R] Error: cannot take a sample larger than the population

Sat Dec 30 20:24:19 CET 2006

Aldi Kraja wrote:
> Partial Summary and discussion:
> =====================
> Thank you to Chao Gai, Chuck Cleland, and Jim Lemon for their suggestion
> to use replace=T in R.
> There is a problem, though (see below).
> 
> In Splus7, sample is defined as
> -------------
> sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
> where replace=F is the default. In Splus7,
> 
> xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
> 
> the output of
> 
> table(xlrmN1)/400
>     0    1    2
>  0.02 0.93 0.05
> shows that "sample" is working exactly as expected based on the prob vector.
> 
> When "sample" is used in Splus7 with replacement we see the following 
> result:
>  > xlrmN1 <- sample(c(0,1,2),400 ,replace=T,prob=c(0.02 ,0.93 ,0.05 ))
>  > table(xlrmN1)/400
>       0     1      2
>  0.0125 0.925 0.0625
> which I think is working again as expected.
> 
> In R, sample is defined as
> ---------
> 
> sample(x, size, replace = FALSE, prob = NULL)
> 
> So the above statement with replace=F did not work (it reported an error),
> but with replace=T it produced
> 
>> table(xlrmN1)/400
> xlrmN1
>      0      1      2 
> 0.0200 0.9225 0.0575 
> 
> which does not exactly match the probabilities provided (0.02, 0.93, 0.05).
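For comparison, here is a minimal R sketch. The seed and the 1000 replications are arbitrary choices for illustration and do not come from the thread. With replace=TRUE, each of the 400 draws is an independent draw with probabilities prob. The proportions in any single sample therefore scatter around prob, while their average over many repetitions approaches it.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
p <- c(0.02, 0.93, 0.05)

# One sample of 400 independent draws: proportions vary around p
x <- sample(c(0, 1, 2), 400, replace = TRUE, prob = p)
one <- prop.table(table(factor(x, levels = 0:2)))
one

# Repeat 1000 times; each column holds the proportions of 0, 1, 2
props <- replicate(1000, {
  x <- sample(c(0, 1, 2), 400, replace = TRUE, prob = p)
  tabulate(x + 1, nbins = 3) / 400   # shift 0,1,2 to bins 1,2,3
})
rowMeans(props)                      # averages settle close to p
```

A proportion of exactly 0.93 in one sample of 400 would be a coincidence, not a guarantee.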
> 
> Now let's return to the concepts of replace=F and replace=T.
> When I ask "sample" to select a sample of 400 from a vector of 3 with NO replacement, I would expect the following:
> a) create a very large sample from 0, 1, and 2;
> b) from this large sample, select without replacement based on the prob vector;
> c) as a result, the proportions in the selected sample should be exactly the same as the prob vector (as in Splus7).
> 
> When I ask "sample" to select a sample of 400 from a vector of 3 with replacement, I would expect the following:
> a) create a very large sample from 0, 1, and 2;
> b) from this large sample, select with replacement based on the prob vector,
> which means some of the previously selected 0, 1, 2 can be selected again;
> c) as a result, the proportions in the selected sample should NOT be exactly the same as the prob vector (as in Splus7 and R).
> 
> So there are two possible conclusions: either "sample" in R is not working correctly, or I am missing some
> precision issue, such as a rounding error, needed to produce prob=c(0.02 ,0.93 ,0.05 ).
> Am I misunderstanding the "sample" function in R?

  Yes, I think you are misunderstanding sample() in R.  If you want
those exact proportions in your xlrmN1 but with the observations in a
random order, you could do this:

> xlrmN1 <- rep(c(0,1,2), c(.02*400,.93*400,.05*400))[sample(400)]

> prop.table(table(xlrmN1))
xlrmN1
   0    1    2
0.02 0.93 0.05
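The one-liner above generalizes to other sizes. The sketch below wraps it in a helper. The name sample_exact is hypothetical and not part of base R, and the helper assumes that n * prob gives whole numbers. It uses round() instead of passing the products straight to rep(), because rep() truncates non-integer counts. A product such as 371.9999999 would otherwise silently lose an element.

```r
# Hypothetical helper (not in base R): a random permutation of x
# repeated so that the proportions match prob exactly.
sample_exact <- function(x, n, prob) {
  counts <- round(n * prob)          # e.g. 400 * c(0.02, 0.93, 0.05) -> 8, 372, 20
  if (sum(counts) != n)
    stop("n * prob does not round to counts summing to n")
  v <- rep(x, counts)                # values in fixed order
  v[sample(length(v))]               # same composition, random order
}

set.seed(2)  # arbitrary seed
xlrmN1 <- sample_exact(c(0, 1, 2), 400, c(0.02, 0.93, 0.05))
prop.table(table(xlrmN1))
```

Indexing with sample(length(v)) rather than calling sample(v) avoids the surprise where sample() treats a single number as 1:x.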

> Any suggestions are appreciated.
> TIA,
> 
> Aldi
> 
> Aldi Kraja wrote:
> 
>> Hi,
>> In Splus7 this statement
>> xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
>> worked fine, but in R the interpreter reports that the vector to choose from,
>> c(0,1,2), is shorter than the number of values I want to select from it.
>> Any good reason?
>> See below the error.
>>
>>> xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
>> Error in sample(length(x), size, replace, prob) :
>>        cannot take a sample larger than the population
>> when 'replace = FALSE'
>> Execution halted
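This is why R raises the error. With replace = FALSE, sample() draws each of the length(x) = 3 elements at most once, so size cannot exceed 3. In that case prob only weights the order in which the elements come out. A minimal sketch follows, with an arbitrarily chosen seed:

```r
p <- c(0.02, 0.93, 0.05)

# Without replacement, size <= length(x): size = 3 returns a
# weighted random permutation of c(0, 1, 2)
set.seed(3)
perm <- sample(c(0, 1, 2), 3, prob = p)
perm

# size = 400 > length(x) = 3 reproduces the error from the post
res <- tryCatch(sample(c(0, 1, 2), 400, prob = p),
                error = function(e) conditionMessage(e))
res
```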
>>
>> TIA,
>>
>> Aldi
> 
> --
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894