[R] Sample function and prob argument

Wed Jun 5 19:20:55 CEST 2019

On 05/06/2019 4:34 a.m., le Gleut, Ronan wrote:
> Dear R-help mailing list,
>
> First of all, many many thanks for your great work on the R project!
>
> I have a very small issue regarding the sample function. Depending on
> whether we specify values for the prob argument, we don't get the same
> result for random sampling with replacement and equal probabilities. See
> the attached R code for a minimal example with R version 3.6.0.
>
> With a previous R version (3.5.x), the result was just a permutation
> between the possible realizations. They are now totally different with the
> latest R version.
>
> I understand that, depending on whether the prob argument is specified,
> two different internal functions are used: .Internal(sample()) or
> .Internal(sample2()). Indeed, the algorithm used to draw a sample may not
> be the same if we assume equal probabilities by default (without the prob
> argument) or if the user supplies the probabilities explicitly (even if
> they are equal).
>
> I found this post on Stack Overflow which explains the reasons for this
> difference (answer by Matthew Lundberg):
>
> https://stackoverflow.com/questions/23316729/r-sample-probabilities-default-is-equal-weight-why-does-specifying-equal-weigh
>
> I was wondering whether the solution proposed by PatrickT could solve this
> issue? He proposed to have something like if (all.equal(prob, prob,
> tolerance = .Machine$double.eps)) prob = NULL inside the sample.int routine
> in order to replicate prob=NULL with prob=rep(1, length(x)).
> 

R has never promised that these will be the same, so I doubt if R will 
change the sample() function.  However, it's very easy for you to adopt 
something like PatrickT's solution for yourself.  Just use this function:

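PatrickTsample <- function(x, size, replace = FALSE, prob = NULL) {
   # If all supplied probabilities are exactly equal, drop them so that
   # sample() takes the same code path as prob = NULL.
   if (!is.null(prob) && max(prob) == min(prob))
     prob <- NULL
   sample(x = x, size = size, replace = replace, prob = prob)
}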

You might want a looser tolerance on the vector of probabilities 
depending on your context.
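
A looser tolerance could be sketched as follows. This is only an illustration: the function name sample_uniform_aware and the tol argument (with its default) are hypothetical choices, not part of base R, and the relative comparison is just one reasonable way to decide that the weights are "equal enough".

```r
# Hypothetical helper (not part of base R): treat 'prob' as uniform when all
# weights agree within a relative tolerance, then defer to sample().
sample_uniform_aware <- function(x, size, replace = FALSE, prob = NULL,
                                 tol = sqrt(.Machine$double.eps)) {
  if (!is.null(prob)) {
    spread <- max(prob) - min(prob)
    # Relative comparison, so rep(1, n) and rep(1/n, n) are treated alike.
    if (spread <= tol * max(abs(prob))) {
      prob <- NULL
    }
  }
  sample(x = x, size = size, replace = replace, prob = prob)
}

# Usage: with the same seed, nearly equal weights follow the prob = NULL path.
set.seed(42)
a <- sample(1:5, 10, replace = TRUE)
set.seed(42)
nearly_equal <- rep(1 / 5, 5) + c(0, 1e-12, 0, 0, 0)
b <- sample_uniform_aware(1:5, 10, replace = TRUE, prob = nearly_equal)
identical(a, b)
```

Weights that differ by more than the tolerance are passed to sample() unchanged, so genuinely weighted sampling is not affected.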

Duncan Murdoch