[R] Sampling

Wed Feb 6 20:43:41 CET 2008

Tim Hesterberg wrote:
>> values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25)))
>> table(values)
>>     
> values
>   1   2   3 
> 834 574 592 
>
> The selection probabilities are not proportional to the specified
> probabilities.  
>
> In contrast, in S-PLUS:
>   
>> values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25)))
>> table(values)
>>     
>     1   2   3 
>  1000 501 499
>
>   
But is that the right thing? If you can use prob=c(.6, .2, .2) and get 
1200 - 400 - 400, then I'm not going to play poker with you....
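
To make the objection concrete, here is a minimal sketch, assuming the 
S-PLUS result means "inclusion probability = size * prob". The helper 
name proportional() is made up for illustration and is not part of R or 
S-PLUS.

```r
# Hypothetical rule: each item is included with probability size * prob.
size <- 2
proportional <- function(prob) size * prob / sum(prob)

proportional(c(.5, .25, .25))  # 1.0 0.5 0.5, in line with 1000/501/499
proportional(c(.6, .2, .2))    # 1.2 0.4 0.4, and 1.2 is not a probability
```

Under that assumed rule, prob=c(.6, .2, .2) would need card 1 to turn up 
more than once per two-card hand, which cannot happen without replacement.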

The interpretation in R is that you are dealing with "fat cards", i.e. 
card 1 is twice as thick as the other two, so there is a 50% chance of 
getting the _first_ card as a 1 and, additionally, a chance of 
(.25+.25)*2/3 = .3333 of getting the 2nd card as a 1 (2/3 = .5/.75 being 
the chance of drawing card 1 once a 2 or a 3 has been removed), for a 
total of .8333. And since the two cards are different, the expected 
number of occurrences of card 1 in 1000 samples is 833.  What is the 
interpretation in S-PLUS?
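
The "fat cards" arithmetic can be checked exactly for all three cards. 
This is only a sketch for size=2 and assumes prob sums to 1; 
inclusion_prob() is a name invented here, not an R function.

```r
# Exact inclusion probabilities for two sequential draws without
# replacement, where each draw is proportional to the remaining weights.
p <- c(0.5, 0.25, 0.25)

inclusion_prob <- function(p) {
  n <- length(p)
  sapply(seq_len(n), function(i) {
    first <- p[i]                      # card i is drawn first
    others <- setdiff(seq_len(n), i)   # some other card j is drawn first ...
    second <- sum(p[others] * p[i] / (1 - p[others]))  # ... then card i
    first + second
  })
}

incl <- inclusion_prob(p)
print(incl)         # probability that each card is in the two-card hand
print(1000 * incl)  # expected counts over 1000 hands
```

This gives 5/6 for card 1 and 7/12 for each of cards 2 and 3, i.e. 
expected counts of about 833, 583 and 583, close to the 834, 574 and 592 
in the R output quoted above.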

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907