[R] Help with prob in sample()
Greg Snow
538280 at gmail.com
Fri Mar  7 20:20:58 CET 2014

Oops, my answer was for replace=TRUE. When replace=FALSE, sample() uses a
different method, which is described on the help page for sample.
Essentially, it chooses the first value, then removes that value from x
and prob, then chooses the next one (rescaling prob again), and so on.
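A minimal R sketch of that sequential scheme follows. The helper name seq_sample is hypothetical, and the code only mirrors the description on the ?sample help page (the next item is chosen with probability proportional to the weights of the remaining items); it is not R's internal C implementation.

```r
# Hypothetical helper (not part of base R): sequential weighted sampling
# without replacement, following the description on ?sample.
seq_sample <- function(x, size, prob) {
  stopifnot(length(x) == length(prob), sum(prob > 0) >= size)
  out <- x[0]                                   # empty result, same type as x
  for (k in seq_len(size)) {
    p  <- prob / sum(prob)                      # rescale remaining weights to sum to 1
    cp <- c(0, cumsum(p))                       # cumulative breakpoints on [0, 1]
    # roll the biased die; all.inside = TRUE guards against rounding at the top end
    j  <- findInterval(runif(1), cp, all.inside = TRUE)
    out  <- c(out, x[j])                        # keep the chosen value
    x    <- x[-j]                               # remove it from x ...
    prob <- prob[-j]                            # ... and from prob before the next draw
  }
  out
}

myProbs <- c(0.2, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4)
set.seed(42)
seq_sample(1:10, 5, myProbs)                    # five distinct values from 1:10
```

Each pass through the loop rolls a new, smaller biased die over the remaining elements, which corresponds to the second of the two possibilities in Thomas's question.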
On Fri, Mar 7, 2014 at 10:52 AM, Greg Snow <538280 at gmail.com> wrote:
> Essentially, what the sample function does (though I expect it does it
> much more efficiently) is equivalent to this code:
>
> i <- 1:10
> myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
>
> myProbs <- myProbs/sum(myProbs)   # rescale so the probabilities sum to 1
> cp <- c(0, cumsum(myProbs))        # cumulative breakpoints on [0, 1]
>
> i[findInterval(runif(5), cp)]      # 5 draws, with replacement
>
>
> Internally, the prob vector is rescaled to sum to 1 (so there is no
> difference between your last two examples); then a cumulative sum is
> computed, and random uniforms are generated and compared against the
> cumulative sum of the probs. This gives each value the desired probability.
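As a rough check of the point that only the relative sizes of prob matter, here is a short R sketch using the weights from Thomas's last two examples. It compares the rescaled vectors and their cumulative sums (the quantities described above) and does an informal frequency check; it does not inspect sample()'s internals.

```r
# Weights from Thomas's last two examples; the second is the first divided by 5.
p1 <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
p2 <- p1 / 5                                   # these sum to 1

# Rescaling each vector to sum to 1 gives the same probabilities ...
norm1 <- p1 / sum(p1)
norm2 <- p2 / sum(p2)
all.equal(norm1, norm2)                        # TRUE

# ... and therefore the same cumulative breakpoints the runif() draws are compared with.
all.equal(c(0, cumsum(norm1)), c(0, cumsum(norm2)))   # TRUE

# Informal empirical check with replace = TRUE: observed frequencies should be
# close to norm1 (0.02 for each of 1:5, 0.18 for each of 6:10).
set.seed(1)
draws <- sample(1:10, 1e5, replace = TRUE, prob = p2)
round(tabulate(draws, nbins = 10) / 1e5, 3)
```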
>
>
> On Fri, Mar 7, 2014 at 3:24 AM, Thomas <thomas.chesney at nottingham.ac.uk> wrote:
>> I'm trying to figure out exactly what the prob parameter in the sample
>> function does.
>>
>> With the following code, does sample look randomly for the first possible
>> sample (let's say it chooses the second element) and then assess whether it
>> can be chosen according to its probability, which is 0.8? It seems unlikely
>> it would work like this.
>>
>> Or does it create a 'biased die', which in this case would have ten sides
>> that each come up according to the probabilities in myProbs, roll it to
>> see which is the first element chosen, then remove that element, create a
>> new biased die with 9 sides and roll it again?
>>
>> i <- c(1:10)
>> myProbs <- c(0.2, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4)
>> f <- sample(i,5, replace=FALSE, prob=myProbs)
>>
>> Then what's the difference in terms of sampling between the following two
>> examples, the second of which has been created so that the probabilities add
>> to 1?
>>
>> i <- c(1:10)
>> myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
>> f <- sample(i,5, replace=FALSE, prob=myProbs)
>>
>> i <- c(1:10)
>> myProbs <- c(0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.9/5, 0.9/5, 0.9/5, 0.9/5,
>> 0.9/5)
>> f <- sample(i,5, replace=FALSE, prob=myProbs)
>>
>> Thank you,
>>
>> Thomas Chesney
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com
-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com