[R] Help with prob in sample()

Greg Snow 538280 at gmail.com
Fri Mar 7 18:52:56 CET 2014


Essentially, what the sample function does for draws with replacement
(though I expect it does so much more efficiently) is equivalent to this code:

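i <- 1:10
myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)

myProbs <- myProbs/sum(myProbs)   # scale the weights to sum to 1
cp <- c(0, cumsum(myProbs))       # interval boundaries on [0, 1]

i[findInterval(runif(5), cp)]     # value whose interval holds each uniform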
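
As a rough check (a sketch only, not part of sample itself), the same
steps can be repeated many times and the observed frequencies compared
with the scaled weights. The seed and the number of draws are arbitrary
choices made for illustration:

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

i <- 1:10
myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
scaled <- myProbs / sum(myProbs)
cp <- c(0, cumsum(scaled))

# 100000 draws (arbitrary); each uniform is mapped to the interval it falls in
draws <- i[findInterval(runif(100000), cp)]

# observed proportion of each value, in the order 1..10
observed <- tabulate(draws, nbins = 10) / length(draws)

round(cbind(value = i, target = scaled, observed = observed), 3)
```

The observed column should sit close to the target column, with the
last five values turning up about nine times as often as the first five.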


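Internally the prob vector is scaled to sum to 1 (so there is no
difference between your last 2 examples), then a cumulative sum is
computed, then random uniforms are generated and compared to the
cumulative sums of the probabilities.  This gives the desired probability
for each value on a single draw.  With replace=FALSE, as in your examples,
?sample notes that the probabilities are applied sequentially: each chosen
value is removed and the next draw is proportional to the weights of the
values that remain (your "new biased die with 9 sides").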
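
The "new biased die with 9 sides" idea from the question can be sketched
in the same style. This is only an illustration of drawing without
replacement one value at a time: drawWithoutReplacement is a made-up
name, the seed is arbitrary, and this is not the code that sample
actually runs.

```r
# Hypothetical helper, for illustration only (not the code inside sample()).
drawWithoutReplacement <- function(x, size, prob) {
  stopifnot(length(x) == length(prob), size <= length(x))
  chosen <- vector(mode(x), size)
  for (k in seq_len(size)) {
    p <- prob / sum(prob)             # rescale the remaining weights
    cp <- c(0, cumsum(p))             # boundaries for the current "die"
    j <- findInterval(runif(1), cp)   # which side came up
    j <- min(j, length(x))            # guard against rounding in cumsum
    chosen[k] <- x[j]
    x <- x[-j]                        # remove the chosen value ...
    prob <- prob[-j]                  # ... and its weight
  }
  chosen
}

set.seed(1)  # arbitrary seed, only so the run is repeatable
i <- 1:10
w1 <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
w2 <- w1 / 5  # the version from the question that already sums to 1

f <- drawWithoutReplacement(i, 5, w1)

# both weight vectors scale to the same probabilities (up to rounding)
all.equal(w1 / sum(w1), w2 / sum(w2))
```

Because the weights are rescaled before every draw, only their ratios
matter, which is why w1 and w2 describe the same sampling scheme.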


On Fri, Mar 7, 2014 at 3:24 AM, Thomas <thomas.chesney at nottingham.ac.uk> wrote:
> I'm trying to figure out exactly what the prob parameter in the sample
> function does.
>
> With the following code, does sample look randomly for the first possible
> sample--let's say it chooses the second element--and then assess whether it
> can be chosen according to its probability, which is 0.8? It seems unlikely
> it would work like this.
>
> Or does it create a `biased die' which in this case would have ten sides
> that each come up according to the probabilities in myProb, and roll it to
> see which is the first element chosen, then remove that element, create a
> new biased die with 9 sides and roll it again?
>
> i <- c(1:10)
> myProbs <- c(0.2, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4)
> f <- sample(i,5, replace=FALSE, prob=myProbs)
>
> Then what's the difference in terms of sampling between the following two
> examples, the second of which has been created so that the probabilities add
> to 1?
>
> i <- c(1:10)
> myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
> f <- sample(i,5, replace=FALSE, prob=myProbs)
>
> i <- c(1:10)
> myProbs <- c(0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.9/5, 0.9/5, 0.9/5, 0.9/5,
> 0.9/5)
> f <- sample(i,5, replace=FALSE, prob=myProbs)
>
> Thank you,
>
> Thomas Chesney



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com



