[Rd] Add-on argument in sample()

Wed Jun 17 18:19:48 CEST 2015

I don't like the idea of having a length-1 dim attribute trigger some
special behavior in sample().  (Should a length-2 dim then cause it to
sample rows of a matrix, as unique() and duplicated() do?)
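
To make the distinction concrete, here is a small R illustration (the
variable names 'v' and 'm' are made up for this example).  It only shows
what the dim attribute looks like in each case and how duplicated() and
unique() already work row-wise on a matrix; it does not change sample().

```r
# A one-dimensional array carries a dim attribute of length 1.
v <- array(10)

# A matrix carries a dim attribute of length 2.
# Rows are (1,2), (1,2), (3,4).
m <- matrix(c(1, 2, 1, 2, 3, 4), ncol = 2, byrow = TRUE)

length(dim(v))    # 1
length(dim(m))    # 2

# duplicated() and unique() treat each row of a matrix as one element.
duplicated(m)     # FALSE TRUE FALSE
nrow(unique(m))   # 2
```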

S+'s sample() had another argument, 'n', that could be used to
specify the size of the population to sample from.  It had to be a single
nonnegative integral number and only one of the 'x' and 'n'
arguments could be supplied.  This was not optimal, but the help
file discouraged the use of the 'x' argument and encouraged the use
of subscripting with sample()'s output instead of having sample()
do the subscripting.
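
The subscripting idiom can be sketched in R roughly as follows.  This is
only an illustration: 'pick' is a hypothetical helper, not part of base R
or S+, and it uses R's sample.int() in place of the S+ 'n' argument.
Because only the length of 'x' is passed to the sampler, a length-1 'x'
is never expanded to 1:x.

```r
# Hypothetical helper: sample positions first, then subscript.
pick <- function(x, size, replace = FALSE) {
  idx <- sample.int(length(x), size = size, replace = replace)
  x[idx]
}

pick(10, 1)        # always 10: the population is the single element 10
pick(c(3, 7), 2)   # 3 and 7 in random order
```

Compare with sample(10, 1), which draws one value from 1:10.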

S+'s rsample() (called by sample()) had only the 'n' argument;
you could not pass in the population to sample from.  It also separated
sampling from shuffling, which is handy when taking large samples
from huge populations - shuffling the output often took most of
the time.

The S+ argument lists are:
sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
rsample(n, size = n, replace = F, prob = NULL,
        bigdata = F, minimal = NULL, ..., order = T)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 17, 2015 at 7:18 AM, Radford Neal <radford at cs.toronto.edu>
wrote:

> > Then the question would be if this test could be replaced with a new
> > argument to sample, e.g. expandSingle, which has TRUE as default for
> > backward compatibility, but FALSE if you don't want population to be
> > expanded to 1:population. It could certainly be useful in some cases,
> > but you still need to know about the expansion to use it. I think most
> > of these bugs occur because users did not think about the expansion in
> > the first place or did not realize that their population could be of
> > length 1 in some situations. These users would therefore not think about
> > changing the argument either.
>
> I think the solution might be to make sample always treat the first
> argument as the vector to sample from if it has a "dim" attribute that
> explicitly specifies that it is a one-dimensional array.  The effect
> of this would be that sample(10,1) would sample from 1:10, as at
> present, but sample(array(10),1) would sample from the length-one
> vector with element 10 (and hence always return 10).
>
> With this change, you can easily ensure that sample(v,1) always samples
> from v even when it has length one by rewriting it to sample(array(v),1).
>
> It's of course possible that some existing code relies on the current
> behaviour, but probably not much existing code, since one-dimensional
> arrays are (I think) not very common at present.
>
> A bigger gain would come if one also introduced a new sequence operator
> that creates a sequence that is marked as a one-dimensional array, which
> would be part of a solution to several other problems as well, as I
> propose at http://www.cs.utoronto.ca/~radford/ftp/R-lang-ext.pdf
>
>    Radford Neal