[R] Random # generator accuracy
(Ted Harding)
Ted.Harding at manchester.ac.uk
Thu Jul 23 22:15:55 CEST 2009
OOPS! The result of a calculation below somehow got omitted!
(325820+326140+325289+325098+325475+325916)/
(174873+175398+174196+174445+173240+174110)
# [1] 1.867351
to be compared (as at the end) with the ratio 1.867471 of the
expected number of "weight=2" to expected number of "weight=1".
Sorry.
Ted.
On 23-Jul-09 20:05:09, Ted Harding wrote:
> On 23-Jul-09 17:59:56, Jim Bouldin wrote:
>> Dan Nordlund wrote:
>> "It would be necessary to see the code for your 'brief test'
>> before anyone could meaningfully comment on your results.
>> But your results for a single test could have been a valid
>> "random" result."
>>
>> I've re-created what I did below. The problem appears to be
>> with the weighting process: the unweighted sample came out much
>> closer to the actual than the weighted sample (>1% error) did.
>> Comments?
>> Jim
>>
>>> x
>> [1] 1 2 3 4 5 6 7 8 9 10 11 12
>>> weights
>> [1] 1 1 1 1 1 1 2 2 2 2 2 2
>>
>>> a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a # (1
>> million samples from x, of size 3, weighted by "weights"; the mean
>> should
>> be 7.50)
>> [1] 7.406977
>>> 7.406977/7.5
>> [1] 0.987597
>>
>>> b = mean(replicate(1000000,(sample(x, 3))));b # (1 million samples
>>> from
>> x, of size 3, not weighted this time; the mean should be 6.50)
>> [1] 6.501477
>>> 6.501477/6.5
>> [1] 1.000227
>>
>> Jim Bouldin, PhD
>
> To obtain the result you expected, you would need to explicitly
> specify "replace=TRUE", since the default for "replace" is FALSE.
> (though probably what you really intended was sampling without
> replacement).
>
> Read carefully what is said about "prob" in '?sample' -- when
> replace=FALSE, the probability of inclusion of an element is not
> proportional to its weight in 'prob'.
>
> The reason is that elements with higher weights are more likely
> to be chosen early on. This then knocks that higher weight out
> of the contest, making it more likely that elements with smaller
> weights will be sampled subsequently. Hence the mean of the sample
> will be biased slightly downwards, relative to the weighted mean
> of the values in x.
>
> table(replicate(1000000,(sample(x, 3))))
> # 1 2 3 4 5 6
> # 250235 250743 249603 250561 249828 249777
> # 7 8 9 10 11 12
> # 249780 250478 249591 249182 249625 250597
>
> (so all nice equal frequencies)
>
> table(replicate(1000000,(sample(x, 3,prob=weights))))
> # 1 2 3 4 5 6
> # 174873 175398 174196 174445 173240 174110
> # 7 8 9 10 11 12
> # 325820 326140 325289 325098 325475 325916
>
> Note that the frequencies of the values with weight=2 are a bit
> less than twice the frequencies of the values with weight=1:
>
> (325820+326140+325289+325098+325475+325916)/
> (174873+175398+174196+174445+173240+174110)
> # [1]
>
>
> In fact this is fairly easily caluclated. The possible combinations
> (in order of sampling) of the two weights, with their probabilities,
> are:
> 1s 2s
> -------------------------------------------
> 1 1 1 P = 6/18 * 5/17 * 4/16 3 0
> 1 1 2 P = 6/18 * 5/17 * 12/16 2 1
> 1 2 1 P = 6/18 * 12/17 * 5/15 2 1
> 1 2 2 P = 6/18 * 12/17 * 10/15 1 2
> 2 1 1 P = 12/18 * 6/16 * 5/15 2 1
> 2 1 2 P = 12/18 * 6/16 * 10/15 1 2
> 2 2 1 P = 12/18 * 10/16 * 6/14 1 2
> 2 2 2 P = 12/18 * 10/16 * 8/14 0 3
>
> So the expected number of "weight=1" in the sample is
>
> 3*(6/18 * 5/17 * 4/16) + 2*(6/18 * 5/17 * 12/16) +
> 2*(6/18 * 12/17 * 5/15) + 1*(6/18 * 12/17 * 10/15) +
> 2*(12/18 * 6/16 * 5/15) + 1*(12/18 * 6/16 * 10/15) +
> 1*(12/18 * 10/16 * 6/14) + 0
> = 1.046218
>
> Hence the expected number of "weight=2" in the sample is
>
> 3 - 1.046218 = 1.953782
>
> and their ratio 1.953782/1.046218 = 1.867471
>
> Compare this with the value 1.867351 (above) obtained by simulation!
>
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 23-Jul-09 Time: 21:05:07
> ------------------------------ XFMail ------------------------------
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09 Time: 21:15:52
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list