[Rd] bias issue in sample() (PR 17494)
Tierney, Luke
luke-tierney at uiowa.edu
Tue Feb 19 20:52:30 CET 2019
Before the next release we really should sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed as a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
Here are two examples of bad behavior through current R-devel:
set.seed(123)
m <- (2/5) * 2^32
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2, x > m / 2)
##
##      FALSE   TRUE
##   0 300620 198792
##   1 200196 300392
table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
##
##      0      1
## 429054 570946
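A scaled-down sketch of where tables like these can come from. It assumes the index is computed as floor(n * U), with U restricted to a grid of 2^bits equally spaced values; that assumption and all variable names below are illustrative and not taken from the R sources. A 10-bit grid stands in for the 32-bit case so that every possible generator output can be enumerated:

```r
# Assumption: index = floor(n * U), where U = k / 2^bits for k = 0, 1, ...
bits <- 10
N <- 2^bits
n <- floor((2/5) * N)                    # plays the role of m above
k <- 0:(N - 1)                           # every value the generator can emit
idx <- floor(n * (k / N))                # index produced by each grid point
counts <- tabulate(idx + 1, nbins = n)   # grid points landing on each index

# Every index is reached by either 2 or 3 grid points, so the indices
# cannot all be equally likely.
table(counts)

# Cross-tabulating index parity against the number of pre-images shows
# how the imbalance can line up with even/odd values.
table(parity = (0:(n - 1)) %% 2, preimages = counts)
```

Because N / n is close to 2.5, the pre-image counts nearly alternate between 3 and 2, which is the kind of pattern that would surface as a parity imbalance.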
I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.
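For anyone reviewing the idea, here is a minimal R-level sketch of rejection sampling from random bits. It is not the committed C code: the function names rbits_sketch and unif_index_sketch are hypothetical, the 16-bit chunks are taken from runif() purely for illustration, and the size limits are there only to keep everything exactly representable as doubles.

```r
# Hypothetical helper: an integer in [0, 2^bits), assembled 16 bits at a time.
rbits_sketch <- function(bits) {
  stopifnot(bits >= 0, bits <= 48)   # stay within exact double arithmetic
  v <- 0
  got <- 0
  while (got < bits) {
    v <- 65536 * v + floor(runif(1) * 65536)   # append 16 random bits
    got <- got + 16
  }
  v %% 2^bits                        # keep only the low `bits` bits
}

# Hypothetical sketch: a 0-based index in [0, n), drawn by rejection from
# the integers below the next power of two.
unif_index_sketch <- function(n) {
  stopifnot(n >= 1, n <= 2^32)
  bits <- ceiling(log2(n))
  repeat {
    v <- rbits_sketch(bits)
    if (v < n) return(v)             # accept; otherwise draw again
  }
}

set.seed(123)
x <- replicate(5000, unif_index_sketch(6))
table(x)
```

Since 2^bits is less than 2n, each draw is accepted with probability above 1/2, so the expected number of attempts per index stays below two.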
Some things still needed:
- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???
Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.
There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out by O & S) and in
src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.
Best,
luke
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney using uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu