[R] how to select an element from a vector based on a probability

Boris Steipe boris.steipe at utoronto.ca
Thu Apr 10 23:10:20 CEST 2014


But your original approach was more concise: just apply unique() to both the input vector and the probabilities (neither the length nor the order of the input matters here):

a <- c(1,1,1,1,2,5)
set.seed(11235813)
out <- sample(unique(a), 100, replace=TRUE, prob = unique(a))
out
    [1] 5 5 5 2 5 5 1 1 5 2 5 2 5 5 1 5 1 2 ...

table(out)/length(out)   # observed relative frequencies
out
   1    2    5 
0.13 0.25 0.62  

unique(a)/sum(unique(a)) # expected relative frequencies
[1] 0.125 0.250 0.625
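
To make the difference between the two approaches concrete, here is a minimal sketch using Dan's example vector. The helper name sample_by_value() and the variables p_position and p_value are made up for illustration and are not from the thread. It assumes all values are positive, and it guards against the case where only one distinct value is left, because sample() treats a single number n as 1:n.

```r
# Hypothetical helper (not from the thread): draw values with probability
# proportional to the value itself, regardless of how often it occurs in x.
# Assumes all values in x are positive.
sample_by_value <- function(x, size = 1) {
  u <- unique(x)
  # sample() interprets a single number n as 1:n, so handle that case directly
  if (length(u) == 1L) return(rep(u, size))
  sample(u, size, replace = TRUE, prob = u)
}

x <- c(2, 2, 6, 2, 1, 1, 1, 3)

# Sampling positions with prob = x: the weights of duplicates add up,
# so the value 1 (three copies) is as likely as the value 3 (one copy).
p_position <- tapply(x, x, sum) / sum(x)
p_position   # 1/6, 1/3, 1/6, 1/3 for the values 1, 2, 3, 6

# Sampling unique values: the probability depends on the value only.
u <- sort(unique(x))
p_value <- setNames(u / sum(u), u)
p_value      # 1/12, 2/12, 3/12, 6/12 for the values 1, 2, 3, 6

set.seed(11235813)
draws <- sample_by_value(x, 100)
```

Whether duplicates should count is a modelling decision: if every occurrence is meant to be a separate candidate, sample(x, 1, prob = x) is the right call and unique() should not be used.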

Cheers,
B.


On 2014-04-10, at 4:34 PM, Rui Barradas wrote:

> Hello,
> 
> Inline.
> 
> Em 10-04-2014 21:04, Nordlund, Dan (DSHS/RDA) escreveu:
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Simone Gabbriellini
>>> Sent: Thursday, April 10, 2014 11:59 AM
>>> To: Rui Barradas
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] how to select an element from a vector based on a probability
>>> 
>>> Hello, Rui,
>>> 
>>> it does, indeed!
>>> 
>>> thanks,
>>> Simone
>>> 
>>> 2014-04-10 20:55 GMT+02:00 Rui Barradas <ruipbarradas at sapo.pt>:
>>>> Hello,
>>>> 
>>>> Use ?sample.
>>>> 
>>>> sample(x, 1, prob = x)
>>>> 
>> 
>> Just be aware that, in using this method, the probability of selection of a particular value will also be a function of how frequent the value is.  For example,
>> 
>> set.seed(7632)
>> x <- c(2,2,6,2,1,1,1,3)
>> table(sample(x, 10000, prob=x, replace=TRUE))
>> 
>>    1    2    3    6
>> 1664 3340 1696 3300
>> 
>> 
>> The probability that any one vector position holding a 1 will be selected is 1/18 (in this particular example). However, the probability that the value 1 will be selected is 3/18 = 1/6, since there are three 1's. The probability of selecting the position holding the 3 is also 3/18. Since there is only one such position, the probability of getting the value 1 on any given draw equals the probability of getting the value 3.
> 
> You're right, I didn't notice that. One way of avoiding that problem is the following.
> 
> prob <- merge(x, data.frame(x=unique(x), prob=unique(x)/sum(unique(x))))$prob
> sample(x, 1, prob = prob)
> 
> Rui Barradas
> 
>> 
>> 
>> 
>> 
>>>> Hope this helps,
>>>> 
>>>> Rui Barradas
>>>> 
>>>> Em 10-04-2014 19:49, Simone Gabbriellini escreveu:
>>>> 
>>>>> Hello List,
>>>>> 
>>>>> I have an array like:
>>>>> 
>>>>> c(4, 3, 5, 4, 2, 2, 2, 4, 2, 6, 6, 7, 5, 5, 5, 10, 10, 11, 10,
>>>>> 12, 10, 11, 9, 12, 10, 36, 35, 36, 36, 36, 35, 35, 36, 37, 35,
>>>>> 35, 38, 35, 38, 36, 37, 36, 36, 37, 36, 35, 35, 36, 36, 35, 35,
>>>>> 36, 35, 38, 35, 35, 35, 36, 35, 35, 35, 6, 5, 8, 6, 6, 7, 1,
>>>>> 7, 7, 8, 9, 7, 8, 7, 7, 13, 13, 13, 14, 13, 13, 13, 14, 14, 15,
>>>>> 15, 14, 13, 14, 39, 39, 39, 39, 39, 39, 41, 40, 39, 39, 39, 39,
>>>>> 40, 39, 39, 41, 41, 40, 39, 40, 41, 40, 41, 40, 40, 40, 39, 41,
>>>>> 39, 39, 39, 39, 40, 39, 39, 40, 40, 39, 39, 39, 1, 4, 3, 4)
>>>>> 
>>>>> I would like to pick up an element with a probability proportional to
>>>>> the element value, thus higher values should be picked up more often
>>>>> than small values (i.e., picking up 38 should be more probable than
>>>>> picking up 3)
>>>>> 
>>>>> Do you have any idea on how to code such a rich-get-richer mechanism?
>>>>> 
>>>>> Best regards,
>>>>> Simone
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> Dan
>> 
>> Daniel J. Nordlund, PhD
>> Research and Data Analysis Division
>> Services & Enterprise Support Administration
>> Washington State Department of Social and Health Services
>> 
>> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



