[Rd] Add-on argument in sample()
Jon Skoien
jon.skoien at jrc.ec.europa.eu
Wed Jun 17 10:27:10 CEST 2015
On 6/16/2015 1:32 PM, Peter Meissner wrote:
> On .06.2015, at 14:55, Millot Gael <Gael.Millot at curie.fr> wrote:
>
>> Hi.
>>
>> I have a problem with the default behavior of sample(), which performs
>> sample(1:x) when x is a single value.
>> This behavior is well explained in ?sample.
>> However, this behavior is annoying when the number of values is not
>> predictable. Would it be possible to add an argument
>> that deactivates this and performs the sampling on a single value?
>> Examples:
>>> sample(10, size = 1, replace = FALSE)
>> 10
>>
>>> sample(10, size = 3, replace = TRUE)
>> 10 10 10
>>
>>> sample(10, size = 3, replace = FALSE)
>> Error
>
> I think the problem here is that the function actually does what you
> would expect it to do from a statistical perspective. A sample of size
> three from a population of one, drawn without replacement, is simply
> not defined. What should the function
> give back?
If I understand correctly, this error is exactly what the poster would
like to see, but it is not what you currently get. If length(population) == 1,
you will now sample from 1:population, not the population itself. So:
> sample(8:10, 3, replace = FALSE)
[1] 10 8 9
> sample(9:10, 3, replace = FALSE)
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
> sample(10:10, 3, replace = FALSE)
[1] 8 10 2
I have to admit that I also find this behaviour inconsistent, even if it
is well described already on the first line of the details in the
documentation. It is definitely a feature which can cause some trouble,
and where the tests might end up more complicated than you would first
think.
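A minimal sketch of the difference, using only base R. The seed is there just to make the run repeatable, and the names x, draws and res are only for illustration:

```r
# A length-one numeric x >= 1 is treated as the population 1:x.
set.seed(42)
x <- 10:10                               # the same as the single value 10
draws <- sample(x, 3, replace = FALSE)
length(draws)                            # 3: no error, although length(x) is 1
all(draws %in% 1:10)                     # TRUE: values come from 1:10, not from x

# With two or more elements, x itself is the population, so an
# oversized sample without replacement fails. Capture the message
# instead of stopping the script.
res <- tryCatch(sample(9:10, 3, replace = FALSE),
                error = function(e) conditionMessage(e))
res
```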
>
> ... You can always wrap your code in a try() like this to prevent errors
> from breaking loops or functions:
>
> try(sample(...))
No error is given when length(population) == 1, and the result can look
perfectly valid when population is a variable whose contents change from
run to run. So this can easily stay in the script as an undetected bug.
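To illustrate why try() does not help here, a small sketch. The helper name draw_fails is hypothetical; it only reports whether the call raised an error:

```r
# Hypothetical helper: TRUE if sampling without replacement raised an error.
draw_fails <- function(population, size) {
  out <- try(sample(population, size, replace = FALSE), silent = TRUE)
  inherits(out, "try-error")
}

draw_fails(9:10, 3)   # TRUE: population of two, sample of three
draw_fails(10, 3)     # FALSE: 10 is silently expanded to 1:10
```

In the second call there is nothing for try() to catch, so the unintended expansion passes through unnoticed.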
>
> ... or you might check your arguments before execution:
>
>
> if (!replace && length(population) >= size) {
>   sample(population, size = size, replace = replace)
> } else {
>   ...
> }
This test is not sufficient if length(population) == size == 1, so you
will also need to check for this special case:
if (length(population) == 1 && size == 1) {
  population
} else if (!replace && length(population) >= size) {
  sample(population, size = size, replace = replace)
} else {
  ...
}
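The test above could be wrapped in a function roughly as follows. This is only a sketch: the name sample_guarded is made up, and the final branch, left open above, is assumed here to raise an error, which may not be what every script wants.

```r
# Hypothetical wrapper around the test above.
sample_guarded <- function(population, size, replace = FALSE) {
  if (length(population) == 1 && size == 1) {
    # Special case: return the single value instead of sampling 1:population.
    population
  } else if (!replace && length(population) >= size) {
    sample(population, size = size, replace = replace)
  } else {
    # Assumption: treat every remaining case as an error.
    stop("case not handled: replace = TRUE or size > length(population)")
  }
}

sample_guarded(10, 1)      # 10, not a draw from 1:10
sample_guarded(8:10, 3)    # a permutation of 8, 9, 10
```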
Then the question would be whether this test could be replaced with a new
argument to sample, e.g. expandSingle, which has TRUE as default for
backward compatibility, but can be set to FALSE if you don't want population
to be expanded to 1:population. It could certainly be useful in some cases,
but you still need to know about the expansion to use it. I think most
of these bugs occur because users did not think about the expansion in
the first place or did not realize that their population could be of
length 1 in some situations. These users would therefore not think about
changing the argument either.
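For illustration only, such an argument could be emulated today with a wrapper. Neither the function name sample_single nor the argument expandSingle exists in base R; both are assumptions of this sketch. With expandSingle = FALSE it samples positions with sample.int() and indexes into x, so a length-one x is never expanded.

```r
# Hypothetical wrapper; expandSingle is not an argument of base::sample().
sample_single <- function(x, size, replace = FALSE, prob = NULL,
                          expandSingle = TRUE) {
  if (expandSingle) {
    # Current behaviour of sample(), including the 1:x expansion.
    if (missing(size)) {
      sample(x, replace = replace, prob = prob)
    } else {
      sample(x, size = size, replace = replace, prob = prob)
    }
  } else {
    # Sample positions, then index: x is always the population itself.
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size = size, replace = replace, prob = prob)]
  }
}

sample_single(10, size = 1, expandSingle = FALSE)                  # 10
sample_single(10, size = 3, replace = TRUE, expandSingle = FALSE)  # 10 10 10
# sample_single(10, size = 3, expandSingle = FALSE) stops with the
# "cannot take a sample larger than the population" error.
```

These three calls correspond to the three examples in the original request.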
Cheers,
Jon
>
>
>>
>> Many thanks for your help.
>>
>> Best wishes,
>>
>> Gael Millot.
>>
>>
>> Gael Millot
>> UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
>> Equipe Recombinaison et instabilite genetique
>> Pav Trouillet Rossignol 5eme etage
>> Institut Curie
>> 26 rue d'Ulm
>> 75248 Paris Cedex 05
>> FRANCE
>> tel : 33 1 56 24 66 34
>> fax : 33 1 56 24 66 44
>> Email : gael.millot at curie.fr
>> http://perso.curie.fr/Gael.Millot/index.html
>>
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> Best, Peter
>
> --
>
--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit
Via Fermi 2749, TP 100-01, I-21027 Ispra (VA), ITALY
jon.skoien at jrc.ec.europa.eu
Tel: +39 0332 789205
Disclaimer: Views expressed in this email are those of the individual
and do not necessarily represent official views of the European Commission.