[Rd] Add-on argument in sample()

Jon Skoien jon.skoien at jrc.ec.europa.eu
Wed Jun 17 10:27:10 CEST 2015



On 6/16/2015 1:32 PM, Peter Meissner wrote:
> On .06.2015, 14:55, Millot Gael <Gael.Millot at curie.fr> wrote:
>
>> Hi.
>>
>> I have a problem with the default behavior of sample(), which performs
>> sample(1:x) when x is a single value.
>> This behavior is well explained in ?sample.
>> However, this behavior is annoying when the number of values is not
>> predictable. Would it be possible to add an argument
>> that deactivates this and performs the sampling on the single value itself?
>> Examples:
>>> sample(10, size = 1, replace = FALSE)
>> 10
>>
>>> sample(10, size = 3, replace = TRUE)
>> 10 10 10
>>
>>> sample(10, size = 3, replace = FALSE)
>> Error
>
> I think the problem here is that the function actually does what you
> would expect it to do from a statistical perspective. A sample of size
> three from a population of one, drawn without replacement, is simply
> not defined. What should the function return?


If I understand correctly, this error is exactly what the poster would like 
to see, but you don't currently get it. If length(population) == 1, 
sample() draws from 1:population, not from the population itself. So:

 > sample(8:10, 3, replace = FALSE)
[1] 10  8  9
 > sample(9:10, 3, replace = FALSE)
Error in sample.int(length(x), size, replace, prob) :
   cannot take a sample larger than the population when 'replace = FALSE'
 > sample(10:10, 3, replace = FALSE)
[1]  8 10  2

I have to admit that I also find this behaviour inconsistent, even though 
it is already described in the first line of the Details section of the 
documentation. It is definitely a feature that can cause trouble, and one 
where the tests needed to guard against it can end up more complicated 
than you would first think.
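One way to sidestep the expansion is to draw positions with sample.int() and use them to index the values, so that a population of length 1 is treated like any other population. The sketch below is modelled on the resample() helper shown in the Examples section of ?sample; the name is only illustrative and it does not change sample() itself.

```r
# Draw from the values in x themselves, never from 1:x.
# sample.int(length(x), ...) returns positions, which are then used to index x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(8:10, 3)                   # a permutation of 8, 9, 10
resample(10:10, 1)                  # always 10
resample(10:10, 3, replace = TRUE)  # always 10 10 10
try(resample(10:10, 3))             # error: sample larger than the population
```

Because x is only ever indexed, sample.int() sees the true population size, so the size/replace error is raised exactly in the case the poster expected.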


>
> ... You can always wrap your code in try() like this to prevent errors
> from breaking loops or functions:
>
> try(sample(...))

No error is signalled when length(population) == 1, and the result can 
look perfectly valid if the length of the population varies. So this can 
easily stay in the script as an undetected bug.
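A small illustration of why try() does not help here, assuming a population that happens to shrink to a single value after filtering (the variable names are only for illustration):

```r
# A population that contains only one value after filtering.
x <- 1:10
population <- x[x > 9]   # just 10L

# No error is signalled, so try() has nothing to catch.
res <- try(sample(population, size = 3, replace = FALSE), silent = TRUE)
inherits(res, "try-error")   # FALSE
# res holds 3 distinct values drawn from 1:10, so they cannot all equal 10.
all(res %in% population)     # FALSE
```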

>
> ... or you might check your arguments before execution:
>
>
> if (!replace && length(population) >= size) {
>    sample(population, size = size, replace = replace)
> } else {
>    ...
> }

This test is not sufficient if length(population) == size == 1, so you 
will also need to check for this special case:

if (length(population) == 1 && size == 1) {
   population
} else if (!replace && length(population) >= size) {
   sample(population, size = size, replace = replace)
} else {
   ...
}
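Wrapped up as a function, checks along these lines might look like the sketch below. The name sample_values() is hypothetical, and two choices in it are assumptions rather than anything proposed in this thread: a length-1 population is repeated when replace = TRUE, and the remaining case raises an error mirroring the one sample() gives for longer populations.

```r
# Hypothetical helper, not part of base R: sample from the values in
# 'population' even when it has length 1.
sample_values <- function(population, size, replace = FALSE) {
  if (length(population) == 1 && (size == 1 || replace)) {
    # Assumed behaviour: a single value is simply repeated 'size' times,
    # instead of letting sample() expand it to 1:population.
    rep(population, size)
  } else if (replace || length(population) >= size) {
    sample(population, size = size, replace = replace)
  } else {
    # Assumed behaviour: mirror the error sample() gives for longer populations.
    stop("cannot take a sample larger than the population when 'replace = FALSE'")
  }
}

sample_values(10, 1)                  # 10
sample_values(10, 3, replace = TRUE)  # 10 10 10
sample_values(8:10, 2)                # two distinct values from 8:10
```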

Then the question would be whether this test could be replaced with a new 
argument to sample(), e.g. expandSingle, which would default to TRUE for 
backward compatibility but could be set to FALSE if you don't want 
population to be expanded to 1:population. It could certainly be useful in some cases, 
but you still need to know about the expansion to use it. I think most 
of these bugs occur because users did not think about the expansion in 
the first place or did not realize that their population could be of 
length 1 in some situations. These users would therefore not think about 
changing the argument either.
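To make the proposal concrete, here is a sketch of how such an argument could behave, written as a wrapper. Both sample2() and expandSingle are hypothetical: expandSingle only exists as a suggestion in this thread, and base R's sample() has no such argument.

```r
# Hypothetical wrapper illustrating the proposed 'expandSingle' argument.
# expandSingle = TRUE  : current sample() behaviour (a single number n means 1:n).
# expandSingle = FALSE : always sample from the values in x, even if length(x) == 1.
sample2 <- function(x, size, replace = FALSE, prob = NULL, expandSingle = TRUE) {
  if (expandSingle) {
    # A missing 'size' is passed on as missing, so sample() applies its own default.
    return(sample(x, size, replace, prob))
  }
  if (missing(size)) size <- length(x)
  x[sample.int(length(x), size, replace, prob)]
}

sample2(10, 3)                                        # three values from 1:10
sample2(10, 3, replace = TRUE, expandSingle = FALSE)  # 10 10 10
try(sample2(10, 3, expandSingle = FALSE))             # error, as the poster wanted
```

The default keeps existing code working, but users would still have to know about the expansion before they thought of setting expandSingle = FALSE.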

Cheers,
Jon

>
>
>>
>> Many thanks for your help.
>>
>> Best wishes,
>>
>> Gael Millot.
>>
>>
>> Gael Millot
>> UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
>> Equipe Recombinaison et instabilite genetique
>> Pav Trouillet Rossignol 5eme etage
>> Institut Curie
>> 26 rue d'Ulm
>> 75248 Paris Cedex 05
>> FRANCE
>> tel : 33 1 56 24 66 34
>> fax : 33 1 56 24 66 44
>> Email : gael.millot at curie.fr
>> http://perso.curie.fr/Gael.Millot/index.html
>>
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> Best, Peter
>
> --
>

-- 
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit

Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY

jon.skoien at jrc.ec.europa.eu
Tel:  +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual 
and do not necessarily represent official views of the European Commission.


