[Rd] Using sample() to sample one value from a single value?
Henrik Bengtsson
hb at biostat.ucsf.edu
Wed Nov 3 18:54:18 CET 2010
Hi, consider this one as an FYI, or a seed for further discussion.
I am aware that many traps in sample() have been reported over the
years. I know that these are also documented in help("sample"). Still,
I got bitten by this while writing
sample(units, size=length(units));
where 'units' is an index (positive integer) vector. It works as
expected (= as I expect) in all cases except length(units) == 1. I know,
it is well known. However, it made me wonder whether it is possible
to use sample() to draw a single value from a set containing only one
value. I don't think so, unless you draw from a value that is <= 1.
For instance, you can sample from c(10,10) by doing:
> sample(rep(10, times=2), size=2);
[1] 10 10
but you cannot sample from c(10) by doing:
> sample(rep(10, times=1), size=1);
[1] 9
unless you sample from a value <= 1, e.g.
> sample(rep(0.31, times=1), size=1);
[1] 0.31
> sample(rep(-10, times=1), size=1);
[1] -10
Note also the related issue of sampling from a double vector of length 1, e.g.
> sample(rep(1.2, times=2), size=2);
[1] 1.2 1.2
> sample(rep(1.2, times=1), size=1);
[1] 1
In the latter case 1.2 is effectively coerced to an integer: a single
numeric value x >= 1 is sampled from 1:x, and 1:1.2 is just 1.
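To make the rule behind all of the above explicit, here is a small sketch. The helper treatsAsSequence() is hypothetical and only approximates the condition documented in help("sample"): if x has length one, is numeric and x >= 1, sampling takes place from 1:x rather than from x itself.

```r
# Hypothetical helper approximating the rule documented in help("sample"):
# a length-one numeric x with x >= 1 is sampled from 1:x, not from x itself.
treatsAsSequence <- function(x) {
  isTRUE(length(x) == 1L && is.numeric(x) && x >= 1)
}

treatsAsSequence(10)        # TRUE: sample(10, size=1) draws from 1:10
treatsAsSequence(1.2)       # TRUE: 1:1.2 is just 1
treatsAsSequence(0.31)      # FALSE: 0.31 is returned as is
treatsAsSequence(-10)       # FALSE: -10 is returned as is
treatsAsSequence(c(10, 10)) # FALSE: length > 1, so x itself is sampled

sample(1.2, size = 1)       # always 1
sample(0.31, size = 1)      # always 0.31
```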
All of the above makes sense when one studies the code of sample(), but
sample() is indeed dangerous; e.g. imagine how many bootstrap
estimates out there are quietly incorrect.
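As an illustration of how a bootstrap can go silently wrong, consider a stratum that happens to contain a single observation. The data below are made up for the sketch; the only point is that sample() resamples from 1:10 instead of from the observed value, whereas drawing indices with sample.int() and subsetting does not.

```r
# Hypothetical data: a bootstrap stratum that happens to hold one observation
y <- 10
set.seed(42)

# Naive resampling: sample() treats the lone value 10 as 1:10
naive <- replicate(1000, mean(sample(y, size = length(y), replace = TRUE)))

# Index-based resampling: always draws from the observed values
safe <- replicate(1000, mean(y[sample.int(length(y), size = length(y), replace = TRUE)]))

summary(naive)  # replicates drawn from 1:10, not the observed value
summary(safe)   # every replicate equals 10
```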
In order to cover all cases of length(units), including one, a solution is:
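sampleFrom <- function(x, size=length(x), ...) {
  n <- length(x);
  if (n == 1L) {
    # A single value: return it as is, instead of letting sample()
    # reinterpret a numeric x >= 1 as 1:x. Note that 'size' and any
    # arguments in '...' are ignored in this branch.
    res <- x;
  } else {
    res <- sample(x, size=size, ...);
  }
  res;
} # sampleFrom()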
> sampleFrom(rep(10, times=2), size=2);
[1] 10 10
> sampleFrom(rep(10, times=1), size=1);
[1] 10
> sampleFrom(rep(0.31, times=1), size=1);
[1] 0.31
> sampleFrom(rep(-10, times=1), size=1);
[1] -10
> sampleFrom(rep(1.2, times=2), size=2);
[1] 1.2 1.2
> sampleFrom(rep(1.2, times=1), size=1);
[1] 1.2
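A variation on the same idea is to draw positions with sample.int() and index into x, so that x is always treated as the set of values to draw from; the examples in help("sample") define a similar resample() helper. The name sampleByIndex() below is hypothetical. Unlike a branch that returns x unchanged, this version also honours 'size' and 'replace' when x has length one.

```r
# Hypothetical helper: draw positions with sample.int(), then index into x,
# so x is always treated as the set of values to sample from.
sampleByIndex <- function(x, size = length(x), ...) {
  # '...' is passed on to sample.int(), e.g. replace = TRUE or prob = ...
  x[sample.int(length(x), size = size, ...)]
}

sampleByIndex(rep(10, times = 1), size = 1)   # 10
sampleByIndex(rep(1.2, times = 1), size = 1)  # 1.2
sampleByIndex(10, size = 3, replace = TRUE)   # 10 10 10
```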
I would like to add sampleFrom() to the wishlist of functions
available in base R. Alternatively, one could add an argument
'sampleFrom=FALSE' to the existing sample() function. Eventually, such
an argument could be made TRUE by default.
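A rough sketch of what the proposed argument could look like, written as a standalone wrapper. Both sample2() and its 'sampleFrom' argument are hypothetical; base R's sample() has no such argument. With sampleFrom=FALSE it defers to the existing sample(); with sampleFrom=TRUE it always samples from the elements of x.

```r
# Hypothetical wrapper sketching the proposed 'sampleFrom' argument;
# base R's sample() has no such argument.
sample2 <- function(x, size, replace = FALSE, prob = NULL, sampleFrom = FALSE) {
  if (sampleFrom) {
    # Always treat x as the set of values to draw from
    if (missing(size)) size <- length(x)
    return(x[sample.int(length(x), size = size, replace = replace, prob = prob)])
  }
  # Otherwise keep the current behaviour of sample()
  if (missing(size)) {
    sample(x, replace = replace, prob = prob)
  } else {
    sample(x, size = size, replace = replace, prob = prob)
  }
}

sample2(1.2, size = 1)                     # 1 (current behaviour)
sample2(1.2, size = 1, sampleFrom = TRUE)  # 1.2
```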
/Henrik