[R] Random # generator accuracy

Fri Jul 24 12:55:21 CEST 2009

On 23/07/2009 2:48 PM, Jim Bouldin wrote:
> Thanks Greg, that most definitely was it.  So apparently the default is
> sampling without replacement.  Fine, but this brings up a question I've had
> for a bit now, which is, how do you know what the default settings are for
> the arguments of any given function?  The HTML help files don't seem to
> indicate in many (most) cases.  Thanks. 

I think you are looking in the wrong place.  Most often (as for sample!) 
the help page just lists the header of the function in its Usage section:

sample(x, size, replace = FALSE, prob = NULL)

and the default is explicit:  "replace = FALSE".  Sometimes this is 
repeated in the text, and sometimes it is only in the text, but there 
are very few cases where a default is defined but not documented, and I 
think those qualify as documentation errors that should be fixed.
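
The same defaults can also be read from the console without opening the 
help page.  A minimal sketch using base R's args() and formals(); the 
variable name "defaults" is only illustrative:

```r
# Print the function header, including any default values
print(args(sample))

# formals() returns the argument list, so individual defaults
# can be inspected programmatically
defaults <- formals(sample)
print(names(defaults))          # the argument names, in order
print(defaults$replace)         # default for 'replace'
print(is.null(defaults$prob))   # TRUE when the default for 'prob' is NULL
```

Arguments without a default (here x and size) show up in formals() as 
empty symbols, which is one way to tell "no default" apart from a default 
of NULL.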

Duncan Murdoch

> 
>> Try adding replace=TRUE to your call to sample, then you will get numbers
>> closer to what you are expecting.
>>
>> -- 
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow at imail.org
>> 801.408.8111
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Bouldin
>>> Sent: Thursday, July 23, 2009 12:00 PM
>>> To: r-help at r-project.org
>>> Subject: [R] Random # generator accuracy
>>>
>>>
>>> Dan Nordlund wrote:
>>>
>>> "It would be necessary to see the code for your 'brief test' before
>>> anyone could meaningfully comment on your results.  But your results
>>> for a single test could have been a valid "random" result."
>>>
>>> I've re-created what I did below.  The problem appears to be with the
>>> weighting process: the unweighted sample came out much closer to the
>>> actual than the weighted sample (>1% error) did.  Comments?
>>> Jim
>>>
>>>> x
>>>  [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>> weights
>>>  [1] 1 1 1 1 1 1 2 2 2 2 2 2
>>>
>>>> a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a  # (1 million samples from x, of size 3, weighted by "weights"; the mean should be 7.50)
>>> [1] 7.406977
>>>> 7.406977/7.5
>>> [1] 0.987597
>>>
>>>> b = mean(replicate(1000000,(sample(x, 3))));b  # (1 million samples from x, of size 3, not weighted this time; the mean should be 6.50)
>>> [1] 6.501477
>>>> 6.501477/6.5
>>> [1] 1.000227
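
A small sketch of what the replace=TRUE suggestion changes, assuming the 
same x and weights as in the transcript above.  The seed, the replicate 
count n, and the variable names are illustrative choices, not part of the 
original test.  With replace = TRUE all three draws use the same 
probabilities, so the mean should sit near 
sum(x * weights) / sum(weights) = 135/18 = 7.5.  With the default 
replace = FALSE each later draw is taken with probability proportional to 
the weights of the items still remaining, which would account for the 
lower value reported above.

```r
x <- 1:12
weights <- rep(c(1, 2), each = 6)   # 1 1 1 1 1 1 2 2 2 2 2 2

# Exact expectation of a single weighted draw: sum(x * w) / sum(w)
expected <- weighted.mean(x, weights)

set.seed(1)   # arbitrary seed, only to make the run repeatable
n <- 1e5      # fewer replicates than the original test, to keep it quick

# With replacement: every draw uses the same probabilities
with_repl <- mean(replicate(n, sample(x, 3, replace = TRUE, prob = weights)))

# Default (replace = FALSE): drawn items are removed before the next draw
without_repl <- mean(replicate(n, sample(x, 3, prob = weights)))

print(c(expected = expected,
        with_repl = with_repl,
        without_repl = without_repl))
```
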
>>>
>>>
>>> Jim Bouldin, PhD
>>> Research Ecologist
>>> Department of Plant Sciences, UC Davis
>>> Davis CA, 95616
>>> 530-554-1740
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.