[Rd] [patch] add is.set parameter to sample()

Andrew Clausen clausen at econ.upenn.edu
Thu Mar 25 14:39:33 CET 2010


Hi Martin,

I re-attached the patch with a filename that will hopefully get
through the filters this time.

I agree that the case where you want to specify an integer is already
well handled by sample.int().  I disagree that the resample() code
given in the example for the set case is trivial.  The user has to
copy the code into their program, which is annoying for such basic
functionality.  Moreover, the example code doesn't work for sampling
with replacement, and it is poorly documented.  Finally, it isn't
obvious to new users of R what to do with resample().  (They would
probably try calling resample() without first cutting & pasting it
into their program.  And why is it called resample()?  It's a
mysterious name that suggests some technical concept, like resampling
digital audio from one sampling rate to another.)

So, the upside of my patch is that sample() becomes more convenient,
and the documentation becomes simpler.  What's the downside?  I don't
see one: the patch is backwards compatible.
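
For concreteness, the is.set semantics from my original proposal can
be sketched as a wrapper around the existing functions.  This is only
an illustration: the name sample2() is made up here, and the attached
patch modifies sample() itself and may differ in its details.

```r
# Illustrative sketch only; sample2() is a hypothetical name, not the patch.
sample2 <- function(x, size, replace = FALSE, prob = NULL, is.set = NULL) {
  if (is.null(is.set)) {
    # is.set unspecified: keep sample()'s current behaviour unchanged
    if (missing(size)) sample(x, replace = replace, prob = prob)
    else sample(x, size, replace = replace, prob = prob)
  } else if (is.set) {
    # x is a set, even when it has length 1: sample positions, then index
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size, replace = replace, prob = prob)]
  } else {
    # x is a set size: sample from 1:x
    if (missing(size)) size <- x
    sample.int(x, size, replace = replace, prob = prob)
  }
}

sample2(6:6, 1, is.set = TRUE)   # always 6
sample2(6:6, 1)                  # old behaviour: any of 1, ..., 6
```

Because the default leaves is.set unspecified, existing calls behave
exactly as before.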

sample() is one of the most important functions in R.  I teach it to
my undergraduate economics students in the first 20 minutes of their
first R lesson.  It is the first probability/statistics function they
learn, so it is important that it is easy and convenient to use.

The first R problem set I assigned my students was a Monte Carlo
simulation of the Monty Hall problem.  sample()'s surprise really
bites here because Monty has either one or two doors to choose from.
When only one door d is left, sample(d, 1) draws from 1:d instead of
returning d.  It's bad enough that there is a surprise, but even worse
that there is no workaround that my students can easily understand.
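
To make the failure concrete, here is a minimal sketch of the
exercise.  The function names monty_opens() and play() are just
illustrative, and indexing with sample.int() is one possible
workaround, not something taken from the patch.

```r
# Door Monty opens: neither the car nor the contestant's pick.
# The obvious version, sample(setdiff(1:3, c(car, pick)), 1), goes wrong
# when only one door d is left, because sample(d, 1) draws from 1:d.
monty_opens <- function(car, pick) {
  candidates <- setdiff(1:3, c(car, pick))
  # Sample a position and index, so a single candidate is returned as is.
  candidates[sample.int(length(candidates), 1)]
}

# One round of the game; returns TRUE if the contestant wins the car.
play <- function(switch_door) {
  car  <- sample.int(3, 1)
  pick <- sample.int(3, 1)
  opened <- monty_opens(car, pick)
  # Switching leaves exactly one door: not the pick, not the opened one.
  if (switch_door) pick <- setdiff(1:3, c(pick, opened))
  pick == car
}

# Estimated probability of winning when always switching.
mean(replicate(10000, play(TRUE)))
```

The estimate should come out close to 2/3, but explaining the indexing
trick to first-time R users is exactly the hurdle I would like to
avoid.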

Cheers,
Andrew

On 25 March 2010 06:53, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>>>>> "AndrewC" == Andrew Clausen <clausen at econ.upenn.edu>
>>>>>>     on Tue, 23 Mar 2010 08:04:12 -0400 writes:
>
>    AndrewC> Hi all,
>    AndrewC> I forgot to test my patch!  I fixed a few bugs.
>
> and this time, you even forgot to attach it (in a way to pass
> through the list filters).
>
> Note however, that all this seems unnecessary,
> as we have  sample.int()
> and a trivial definition of resample()
> at least in R-devel, which will be released as R 2.11.0 on
> April 22.
>
> Thank you anyway, for your efforts!
> Martin
>
> Martin Maechler, ETH Zurich
>
>    AndrewC> On 22 March 2010 22:53, Andrew Clausen <clausen at econ.upenn.edu> wrote:
>    >> Hi all,
>    >>
>    >> sample() has some well-documented undesirable behaviour.
>    >>
>    >> sample(1:6, 1)
>    >> sample(2:6, 1)
>    >> ...
>    >> sample(5:6, 1)
>    >>
>    >> do what you expect, but
>    >>
>    >> sample(6:6, 1)
>    >> sample(1:6, 1)
>    >>
>    >> do the same thing.
>    >>
>    >> This behaviour is documented:
>    >>
>    >>     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>    >>     'x >= 1', sampling _via_ 'sample' takes place from '1:x'.  _Note_
>    >>     that this convenience feature may lead to undesired behaviour when
>    >>     'x' is of varying length 'sample(x)'.  See the 'resample()'
>    >>     example below.
>    >>
>    >> My proposal is to add an extra parameter is.set to sample() to control
>    >> this behaviour.  If the parameter is unspecified, then we keep the old
>    >> behaviour for compatibility.  If it is TRUE, then we treat the first
>    >> parameter x as a set.  If it is FALSE, then we treat it as a set size.
>    >>  This means that
>    >>
>    >> sample(6:6, 1, is.set=TRUE)
>    >>
>    >> would return 6 with probability 1.
>    >>
>    >> I have attached a patch to implement this new option.
>    >>
>    >> Cheers,
>    >> Andrew
>    >>
>    AndrewC> ______________________________________________
>    AndrewC> R-devel at r-project.org mailing list
>    AndrewC> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.diff
Type: text/x-patch
Size: 3638 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20100325/fe619967/attachment.bin>

