[R] sample(c(0, 1)...) vs. rbinom

Albyn Jones jones at reed.edu
Thu May 23 19:49:20 CEST 2013


the "something similar" is return a different result in two
situations where one might expect the same result, ie when
a probability vector with equal probabilities is supplied versus
the default of equal probabilities.

And, assuming that by "concerns me" you mean "worries me",
I have no clue why you think it does!  It is a curiosity.

albyn

On Thu, May 23, 2013 at 04:38:18PM +0000, Nordlund, Dan (DSHS/RDA) wrote:
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of Albyn Jones
> > Sent: Thursday, May 23, 2013 8:30 AM
> > To: r-help at r-project.org
> > Subject: Re: [R] sample(c(0, 1)...) vs. rbinom
> > 
> > After a bit of playing around, I discovered that
> > sample() does something similar in other situations:
> > 
> > > set.seed(105021)
> > > sample(1:5,1,prob=c(1,1,1,1,1))
> > [1] 3
> > > set.seed(105021)
> > > sample(1:5,1)
> > [1] 2
> > 
> > 
> > > set.seed(105021)
> > > sample(1:5,5,prob=c(1,1,1,1,1))
> > [1] 3 4 2 1 5
> > > set.seed(105021)
> > > sample(1:5,5)
> > [1] 2 5 1 4 3
> > 
> > albyn
> 
> 
> What is the "something similar" you are referring to?  And I guess I still don't understand what it is that concerns you about the sample function.
> 
> 
> Dan
> 
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
> 
> 
> 
> > 
> > 
> > On 2013-05-22 22:24, peter dalgaard wrote:
> > > On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
> > >
> > >> You seem to be building an elaborate structure for testing the
> > >> reproducibility of the random number generator. I suspect that
> > rbinom
> > >> is calling the random number generator a different number of times
> > >> when you pass prob=0.5 than otherwise.
> > >
> > > Nope. It's switching 0 and 1:
> > >
> > >> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
> > >> set.seed(1); rbinom(10,1,pp)
> > >  [1] 1 1 0 0 1 0 0 0 0 1
> > >  [1] 0 0 1 1 0 1 1 1 1 0
> > >
> > > which is curious, but of course has no implication for the
> > > distributional properties. Curiouser, if you drop the prob= in
> > > sample.
> > >
> > >> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1);
> > >> rbinom(10,1,pp)
> > >  [1] 0 0 1 1 0 1 1 1 1 0
> > >  [1] 0 0 1 1 0 1 1 1 1 0
> > >
> > > However, it was never a design goal that two different random
> > > functions (or even two code paths within the same function) should
> > > give exactly the same values, even if they simulate the same
> > > distribution, so this is nothing more than a curiosity.
> > >
> > >
> > >>>
> > >>> Appendix A: some R code that exhibits the problem
> > >>> =================================================
> > >>>
> > >>> ppp <- seq(0, 1, by = 0.01)
> > >>>
> > >>> result <- do.call(rbind, lapply(ppp, function(p) {
> > >>>     set.seed(1)
> > >>>     sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
> > >>>                         prob=c(1-p, p))
> > >>>
> > >>>     set.seed(1)
> > >>>     rbinomRes <- rbinom(1, size = 1, prob = p)
> > >>>
> > >>>     data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
> > >>>
> > >>> }))
> > >>>
> > >>> result
> > >>>
> > >>>
> > >>> Appendix B: the output from the R code
> > >>> ======================================
> > >>>
> > >>>     prob equivalent
> > >>> 1   0.00       TRUE
> > >>> 2   0.01       TRUE
> > >>> 3   0.02       TRUE
> > >>> 4   0.03       TRUE
> > >>> 5   0.04       TRUE
> > >>> 6   0.05       TRUE
> > >>> 7   0.06       TRUE
> > >>> 8   0.07       TRUE
> > >>> 9   0.08       TRUE
> > >>> 10  0.09       TRUE
> > >>> 11  0.10       TRUE
> > >>> 12  0.11       TRUE
> > >>> 13  0.12       TRUE
> > >>> 14  0.13       TRUE
> > >>> 15  0.14       TRUE
> > >>> 16  0.15       TRUE
> > >>> 17  0.16       TRUE
> > >>> 18  0.17       TRUE
> > >>> 19  0.18       TRUE
> > >>> 20  0.19       TRUE
> > >>> 21  0.20       TRUE
> > >>> 22  0.21       TRUE
> > >>> 23  0.22       TRUE
> > >>> 24  0.23       TRUE
> > >>> 25  0.24       TRUE
> > >>> 26  0.25       TRUE
> > >>> 27  0.26       TRUE
> > >>> 28  0.27       TRUE
> > >>> 29  0.28       TRUE
> > >>> 30  0.29       TRUE
> > >>> 31  0.30       TRUE
> > >>> 32  0.31       TRUE
> > >>> 33  0.32       TRUE
> > >>> 34  0.33       TRUE
> > >>> 35  0.34       TRUE
> > >>> 36  0.35       TRUE
> > >>> 37  0.36       TRUE
> > >>> 38  0.37       TRUE
> > >>> 39  0.38       TRUE
> > >>> 40  0.39       TRUE
> > >>> 41  0.40       TRUE
> > >>> 42  0.41       TRUE
> > >>> 43  0.42       TRUE
> > >>> 44  0.43       TRUE
> > >>> 45  0.44       TRUE
> > >>> 46  0.45       TRUE
> > >>> 47  0.46       TRUE
> > >>> 48  0.47       TRUE
> > >>> 49  0.48       TRUE
> > >>> 50  0.49       TRUE
> > >>> 51  0.50      FALSE
> > >>> 52  0.51       TRUE
> > >>> 53  0.52       TRUE
> > >>> 54  0.53       TRUE
> > >>> 55  0.54       TRUE
> > >>> 56  0.55       TRUE
> > >>> 57  0.56       TRUE
> > >>> 58  0.57       TRUE
> > >>> 59  0.58       TRUE
> > >>> 60  0.59       TRUE
> > >>> 61  0.60       TRUE
> > >>> 62  0.61       TRUE
> > >>> 63  0.62       TRUE
> > >>> 64  0.63       TRUE
> > >>> 65  0.64       TRUE
> > >>> 66  0.65       TRUE
> > >>> 67  0.66       TRUE
> > >>> 68  0.67       TRUE
> > >>> 69  0.68       TRUE
> > >>> 70  0.69       TRUE
> > >>> 71  0.70       TRUE
> > >>> 72  0.71       TRUE
> > >>> 73  0.72       TRUE
> > >>> 74  0.73       TRUE
> > >>> 75  0.74       TRUE
> > >>> 76  0.75       TRUE
> > >>> 77  0.76       TRUE
> > >>> 78  0.77       TRUE
> > >>> 79  0.78       TRUE
> > >>> 80  0.79       TRUE
> > >>> 81  0.80       TRUE
> > >>> 82  0.81       TRUE
> > >>> 83  0.82       TRUE
> > >>> 84  0.83       TRUE
> > >>> 85  0.84       TRUE
> > >>> 86  0.85       TRUE
> > >>> 87  0.86       TRUE
> > >>> 88  0.87       TRUE
> > >>> 89  0.88       TRUE
> > >>> 90  0.89       TRUE
> > >>> 91  0.90       TRUE
> > >>> 92  0.91       TRUE
> > >>> 93  0.92       TRUE
> > >>> 94  0.93       TRUE
> > >>> 95  0.94       TRUE
> > >>> 96  0.95       TRUE
> > >>> 97  0.96       TRUE
> > >>> 98  0.97       TRUE
> > >>> 99  0.98       TRUE
> > >>> 100 0.99       TRUE
> > >>> 101 1.00       TRUE
> > >>>
> > >>> Appendix C: Session information
> > >>> ===============================
> > >>>
> > >>>> sessionInfo()
> > >>> R version 3.0.0 (2013-04-03)
> > >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> > >>>
> > >>> locale:
> > >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > >>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> > >>>  [7] LC_PAPER=C                 LC_NAME=C
> > >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > >>>
> > >>> attached base packages:
> > >>> [1] stats     graphics  grDevices utils     datasets  methods
> > >>> base
> > >>>
> > >>>>
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Albyn Jones
Reed College
jones at reed.edu



More information about the R-help mailing list