[R] sample(c(0, 1)...) vs. rbinom
Albyn Jones
jones at reed.edu
Thu May 23 19:49:20 CEST 2013
the "something similar" is return a different result in two
situations where one might expect the same result, ie when
a probability vector with equal probabilities is supplied versus
the default of equal probabilities.
And, assuming that by "concerns me" you mean "worries me",
I have no clue why you think it does! It is a curiosity.
albyn
On Thu, May 23, 2013 at 04:38:18PM +0000, Nordlund, Dan (DSHS/RDA) wrote:
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of Albyn Jones
> > Sent: Thursday, May 23, 2013 8:30 AM
> > To: r-help at r-project.org
> > Subject: Re: [R] sample(c(0, 1)...) vs. rbinom
> >
> > After a bit of playing around, I discovered that
> > sample() does something similar in other situations:
> >
> > > set.seed(105021)
> > > sample(1:5,1,prob=c(1,1,1,1,1))
> > [1] 3
> > > set.seed(105021)
> > > sample(1:5,1)
> > [1] 2
> >
> >
> > > set.seed(105021)
> > > sample(1:5,5,prob=c(1,1,1,1,1))
> > [1] 3 4 2 1 5
> > > set.seed(105021)
> > > sample(1:5,5)
> > [1] 2 5 1 4 3
> >
> > albyn
>
>
> What is the "something similar" you are referring to? And I guess I still don't understand what it is that concerns you about the sample function.
>
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
>
>
>
> >
> >
> > On 2013-05-22 22:24, peter dalgaard wrote:
> > > On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
> > >
> > >> You seem to be building an elaborate structure for testing the
> > >> reproducibility of the random number generator. I suspect that
> > rbinom
> > >> is calling the random number generator a different number of times
> > >> when you pass prob=0.5 than otherwise.
> > >
> > > Nope. It's switching 0 and 1:
> > >
> > >> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
> > >> set.seed(1); rbinom(10,1,pp)
> > > [1] 1 1 0 0 1 0 0 0 0 1
> > > [1] 0 0 1 1 0 1 1 1 1 0
> > >
> > > which is curious, but of course has no implication for the
> > > distributional properties. Curiouser, if you drop the prob= in
> > > sample.
> > >
> > >> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1);
> > >> rbinom(10,1,pp)
> > > [1] 0 0 1 1 0 1 1 1 1 0
> > > [1] 0 0 1 1 0 1 1 1 1 0
> > >
> > > However, it was never a design goal that two different random
> > > functions (or even two code paths within the same function) should
> > > give exactly the same values, even if they simulate the same
> > > distribution, so this is nothing more than a curiosity.
> > >
> > >
> > >>>
> > >>> Appendix A: some R code that exhibits the problem
> > >>> =================================================
> > >>>
> > >>> ppp <- seq(0, 1, by = 0.01)
> > >>>
> > >>> result <- do.call(rbind, lapply(ppp, function(p) {
> > >>> set.seed(1)
> > >>> sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
> > >>> prob=c(1-p, p))
> > >>>
> > >>> set.seed(1)
> > >>> rbinomRes <- rbinom(1, size = 1, prob = p)
> > >>>
> > >>> data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
> > >>>
> > >>> }))
> > >>>
> > >>> result
> > >>>
> > >>>
> > >>> Appendix B: the output from the R code
> > >>> ======================================
> > >>>
> > >>> prob equivalent
> > >>> 1 0.00 TRUE
> > >>> 2 0.01 TRUE
> > >>> 3 0.02 TRUE
> > >>> 4 0.03 TRUE
> > >>> 5 0.04 TRUE
> > >>> 6 0.05 TRUE
> > >>> 7 0.06 TRUE
> > >>> 8 0.07 TRUE
> > >>> 9 0.08 TRUE
> > >>> 10 0.09 TRUE
> > >>> 11 0.10 TRUE
> > >>> 12 0.11 TRUE
> > >>> 13 0.12 TRUE
> > >>> 14 0.13 TRUE
> > >>> 15 0.14 TRUE
> > >>> 16 0.15 TRUE
> > >>> 17 0.16 TRUE
> > >>> 18 0.17 TRUE
> > >>> 19 0.18 TRUE
> > >>> 20 0.19 TRUE
> > >>> 21 0.20 TRUE
> > >>> 22 0.21 TRUE
> > >>> 23 0.22 TRUE
> > >>> 24 0.23 TRUE
> > >>> 25 0.24 TRUE
> > >>> 26 0.25 TRUE
> > >>> 27 0.26 TRUE
> > >>> 28 0.27 TRUE
> > >>> 29 0.28 TRUE
> > >>> 30 0.29 TRUE
> > >>> 31 0.30 TRUE
> > >>> 32 0.31 TRUE
> > >>> 33 0.32 TRUE
> > >>> 34 0.33 TRUE
> > >>> 35 0.34 TRUE
> > >>> 36 0.35 TRUE
> > >>> 37 0.36 TRUE
> > >>> 38 0.37 TRUE
> > >>> 39 0.38 TRUE
> > >>> 40 0.39 TRUE
> > >>> 41 0.40 TRUE
> > >>> 42 0.41 TRUE
> > >>> 43 0.42 TRUE
> > >>> 44 0.43 TRUE
> > >>> 45 0.44 TRUE
> > >>> 46 0.45 TRUE
> > >>> 47 0.46 TRUE
> > >>> 48 0.47 TRUE
> > >>> 49 0.48 TRUE
> > >>> 50 0.49 TRUE
> > >>> 51 0.50 FALSE
> > >>> 52 0.51 TRUE
> > >>> 53 0.52 TRUE
> > >>> 54 0.53 TRUE
> > >>> 55 0.54 TRUE
> > >>> 56 0.55 TRUE
> > >>> 57 0.56 TRUE
> > >>> 58 0.57 TRUE
> > >>> 59 0.58 TRUE
> > >>> 60 0.59 TRUE
> > >>> 61 0.60 TRUE
> > >>> 62 0.61 TRUE
> > >>> 63 0.62 TRUE
> > >>> 64 0.63 TRUE
> > >>> 65 0.64 TRUE
> > >>> 66 0.65 TRUE
> > >>> 67 0.66 TRUE
> > >>> 68 0.67 TRUE
> > >>> 69 0.68 TRUE
> > >>> 70 0.69 TRUE
> > >>> 71 0.70 TRUE
> > >>> 72 0.71 TRUE
> > >>> 73 0.72 TRUE
> > >>> 74 0.73 TRUE
> > >>> 75 0.74 TRUE
> > >>> 76 0.75 TRUE
> > >>> 77 0.76 TRUE
> > >>> 78 0.77 TRUE
> > >>> 79 0.78 TRUE
> > >>> 80 0.79 TRUE
> > >>> 81 0.80 TRUE
> > >>> 82 0.81 TRUE
> > >>> 83 0.82 TRUE
> > >>> 84 0.83 TRUE
> > >>> 85 0.84 TRUE
> > >>> 86 0.85 TRUE
> > >>> 87 0.86 TRUE
> > >>> 88 0.87 TRUE
> > >>> 89 0.88 TRUE
> > >>> 90 0.89 TRUE
> > >>> 91 0.90 TRUE
> > >>> 92 0.91 TRUE
> > >>> 93 0.92 TRUE
> > >>> 94 0.93 TRUE
> > >>> 95 0.94 TRUE
> > >>> 96 0.95 TRUE
> > >>> 97 0.96 TRUE
> > >>> 98 0.97 TRUE
> > >>> 99 0.98 TRUE
> > >>> 100 0.99 TRUE
> > >>> 101 1.00 TRUE
> > >>>
> > >>> Appendix C: Session information
> > >>> ===============================
> > >>>
> > >>>> sessionInfo()
> > >>> R version 3.0.0 (2013-04-03)
> > >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> > >>>
> > >>> locale:
> > >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> > >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> > >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> > >>> [7] LC_PAPER=C LC_NAME=C
> > >>> [9] LC_ADDRESS=C LC_TELEPHONE=C
> > >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > >>>
> > >>> attached base packages:
> > >>> [1] stats graphics grDevices utils datasets methods
> > >>> base
> > >>>
> > >>>>
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Albyn Jones
Reed College
jones at reed.edu
More information about the R-help
mailing list