[Rd] Add-on argument in sample()

Hervé Pagès hpages at fredhutch.org
Thu Jun 18 00:25:25 CEST 2015


Hi,

The special behavior of sample(x, ...) when length(x) is 1 is of course
a bad feature. I think it pre-dates sample.int(), which is what people
should use these days if they want the behavior sample(x, ...) has when
length(x) is 1. And because we now have sample.int(), this feature
could in theory be removed from sample(). Unfortunately that would
break a lot of existing code, so a warning or some other kind of
notification would need to be implemented.
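
For reference, the index-based idiom that avoids the expansion can be
sketched as follows. The helper name resample() follows the example in
?sample; it is an illustration, not a function exported by base R:

```r
# Sample from the elements of x itself, whatever its length.
# sample.int() draws indices from 1..length(x), so a length-1 numeric x
# is never expanded to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(10, size = 1)                    # always 10
resample(10, size = 3, replace = TRUE)    # 10 10 10
# resample(10, size = 3) signals an error: the sample is larger than
# the population and replace = FALSE
```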

Even if the cost is high, that still sounds better/cleaner to me than
adding an extra argument to sample() to control this. Such an argument
would only be used by people aware of the problem, and people aware of
the problem already know how to work around it.
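
To make the warning idea concrete, here is a rough sketch of a wrapper
that warns whenever the length-1 expansion is triggered. The name
sample_warn() and the warning text are hypothetical, and the condition
only approximately mirrors the check done inside sample(); this is not
a proposed patch:

```r
# Hypothetical wrapper, not part of base R: behaves like sample() but
# warns whenever a length-1 numeric x is expanded to 1:x.
sample_warn <- function(x, size, replace = FALSE, prob = NULL) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
    warning("length(x) is 1: sampling from 1:x; ",
            "use sample.int() to make this explicit")
    if (missing(size)) size <- x
    return(sample.int(x, size, replace, prob))
  }
  # Otherwise sample from the elements of x via their indices.
  if (missing(size)) size <- length(x)
  x[sample.int(length(x), size, replace, prob)]
}
```

Calling sample_warn(10, 3) would return three values from 1:10 and
emit the warning, while sample_warn(8:10, 3) stays silent.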

Cheers,
H.

On 06/17/2015 01:27 AM, Jon Skoien wrote:
>
>
> On 6/16/2015 1:32 PM, Peter Meissner wrote:
>> On .06.2015 at 14:55, Millot Gael <Gael.Millot at curie.fr> wrote:
>>
>>> Hi.
>>>
>>> I have a problem with the default behavior of sample(), which performs
>>> sample(1:x) when x is a single value.
>>> This behavior is well explained in ?sample.
>>> However, it is annoying when the number of values is not
>>> predictable. Would it be possible to add an argument
>>> that deactivates this and performs the sampling on the single value itself?
>>> Examples (desired behavior):
>>>> sample(10, size = 1, replace = FALSE)
>>> 10
>>>
>>>> sample(10, size = 3, replace = TRUE)
>>> 10 10 10
>>>
>>>> sample(10, size = 3, replace = FALSE)
>>> Error
>>
>> I think the problem here is that the function actually does what you
>> would expect it to do from a statistical perspective. A sample of size
>> three drawn without replacement from a population of one is simply
>> not defined. What should the function give back?
>
>
> If I understand correctly, this error is exactly what the poster would like
> to see, but it is not what you get currently. If length(population) == 1,
> you currently sample from 1:population, not from the population itself. So:
>
>  > sample(8:10, 3, replace = FALSE)
> [1] 10  8  9
>  > sample(9:10, 3, replace = FALSE)
> Error in sample.int(length(x), size, replace, prob) :
>    cannot take a sample larger than the population when 'replace = FALSE'
>  > sample(10:10, 3, replace = FALSE)
> [1]  8 10  2
>
> I have to admit that I also find this behaviour inconsistent, even if it
> is well described already on the first line of the details in the
> documentation. It is definitely a feature which can cause some trouble,
> and where the tests might end up more complicated than you would first
> think.
>
>
>>
>> ... You can always wrap your code in a try() like this to prevent errors
>> from breaking loops or functions:
>>
>> try(sample(...))
>
> No error is given when length(population) == 1, and the result might look
> perfectly valid if population varies. So this can easily stay in
> the script as an undetected bug.
>
>>
>> ... or you might check your arguments before execution:
>>
>>
>> if (!replace && length(population) >= size) {
>>    sample(population, size = size, replace = replace)
>> } else {
>>    ...
>> }
>
> This test is not sufficient if length(population) == size == 1, so you
> will also need to check for this special case:
>
> if (length(population) == 1 && size == 1) {
>    population
> } else if (!replace && length(population) >= size) {
>    sample(population, size = size, replace = replace)
> } else {
>    ...
> }
>
> Then the question would be whether this test could be replaced with a new
> argument to sample(), e.g. expandSingle, with TRUE as the default for
> backward compatibility and FALSE if you don't want population to be
> expanded to 1:population. It could certainly be useful in some cases,
> but you still need to know about the expansion to use it. I think most
> of these bugs occur because users did not think about the expansion in
> the first place or did not realize that their population could be of
> length 1 in some situations. These users would therefore not think about
> changing the argument either.
>
> Cheers,
> Jon
>
>>
>>
>>>
>>> Many thanks for your help.
>>>
>>> Best wishes,
>>>
>>> Gael Millot.
>>>
>>>
>>> Gael Millot
>>> UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
>>> Equipe Recombinaison et instabilite genetique
>>> Pav Trouillet Rossignol 5eme etage
>>> Institut Curie
>>> 26 rue d'Ulm
>>> 75248 Paris Cedex 05
>>> FRANCE
>>> tel : 33 1 56 24 66 34
>>> fax : 33 1 56 24 66 44
>>> Email : gael.millot at curie.fr
>>> http://perso.curie.fr/Gael.Millot/index.html
>>>
>>>
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>> Best, Peter
>>
>> --
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


