[R] How to generate integers from uniform distribution with
(Ted Harding)
Ted.Harding at manchester.ac.uk
Sat Sep 4 22:54:04 CEST 2010
On 04-Sep-10 19:27:54, Yi wrote:
> Enh, I see.
> It totally makes sense.
> Thank you for your perfect explanation.
> Enjoy the long weekend~
> Yi
You're welcome! Earlier I tried an experiment with rejection
sampling, which seems to work well for the case where you want
mean of the sampled values to exactly be the mean of the range
being sampled from. The number of tries, even for a large sample,
was lower than I had anticipated. Example (sample size = 20000,
sampled range is {-3,-2,-1,0,1,2,3}, mean = 0 therefore require
that sum of sample = 0):
S <- (-10) ; n <- 0
P <- ((-3):3)
while((S != 0)){
x <- sample(P,20000,replace=TRUE,prob=c(1,1,1,1,1,1,1))
S <- sum(x) ; n <- (n+1)
}
n
hist(x)
To get your case of sampling the integers from (17:23) with
sample mean always exactly 20, simply add 20 to the result x
of the above loop.
I found that I got values of n like:
126, 43, 403, 811, 385, 568, 590, 1758, 317, 456, 643, ...
with every run being completed well within 2 seconds.
Conditioning the value of the sum to be equal to the central
value (0) of the range is also conditioning the value to be
equal to the most probable value of the sum, so the runs will
on average be shortest. Conditioning on a different mean
(say mean = +1 for a sample of size 20000, so sum = 20000)
would take much much longer (see below).
One can use the Normal approximation to the distribution of
the sum to estimate how long it might take. One value sampled
from ((-3):3) has mean 0 and variance 4.66.. , so the sum of
20000 has mean 0 and variance 93333.33, hence the probability
that the sum will be 0 is approximated by
pnorm(0.5,0,sqrt(93333.33)) - pnorm(-0.5,0,sqrt(93333.33))
= 0.001305845
so the probability of success with one sample of 20000 is
about 1/766 (which is consistent with the above results for n).
On the other hand, conditioning on the mean of x being 1,
i.e. on the sum being 20000, the chance of success is
pnorm(20000.5,0,sqrt(93333.33)) - pnorm(19999.5,0,sqrt(93333.33))
which R computes as zero! Hence you have practically no chance
of achieving this within any reasonable time. However, of course,
the SE of the mean is sqrt((sum(P^2)/6)/20000) = 0.01527525,
so you are aiming at a point which is about 60 SEs from the mean.
The numbers are more reasonable if, instead of conditioning on
the mean, you condition on the sum (not too far from 0), e.g.
with sample size 20000 as before:
1 Sum must be 50, Prob(success) =
pnorm(50.5,0,sqrt(93333.33)) - pnorm(49.5,0,sqrt(93333.33))
= 0.001288472 ~= 1/776
2 Sum must be 100, Prob(success) =
pnorm(100.5,0,sqrt(93333.33)) - pnorm(99.5,0,sqrt(93333.33))
= 0.001237729 ~= 1/808
3 Sum must be 200, Prob(success) =
pnorm(200.5,0,sqrt(93333.33)) - pnorm(199.5,0,sqrt(93333.33))
= 0.001053971 ~= 1/949
4 Sum must be 500, Prob(success) =
pnorm(500.5,0,sqrt(93333.33)) - pnorm(499.5,0,sqrt(93333.33))
= 0.0003421745 ~= 1/2922
and so on. So even aiming at 500 it would on average only take
about 3000 tries to hit it. After that it rapidly becomes less likely.
Ted.
On 04-Sep-10 19:27:54, Yi wrote:
> Enh, I see.
> It totally makes sense.
> Thank you for your perfect explanation.
> Enjoy the long weekend~
> Yi
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-10 Time: 21:53:58
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list