[R] Zipf random number generation
Weiguang Shi
wgshi2001 at yahoo.ca
Tue Jan 25 23:46:38 CET 2005
Sorry.
c = sum(1.0 / pow((double) j, alpha)), j=1,2,...,N
--- Weiguang Shi <wgshi2001 at yahoo.ca> wrote:
> Thanks very much Ted for the detailed explanation.
>
> It might be flawed but some "common practices" in my
> area use the following approach to generate a random
> rank for some Zipf-like distribution with the
> parameter alpha.
>
> The key is to introduce the limit of ranks, N. With
> N
> and alpha, therefore, the probability of rank i is
> determined by
> c/pow((double) i, alpha)
> where
> c = sum(1.0 / pow((double) i, alpha))
>
> Any comment on that?
>
> Thanks,
> Weiguang
>
> --- Ted.Harding at nessie.mcc.ac.uk wrote:
> > On 25-Jan-05 Weiguang Shi wrote:
> > > Hi,
> > >
> > > Is there a Zipf-like distribution RNG in R?
> > >
> > > Thanks,
> > > Weiguang
> >
> > "Zipf's Law" (as originally formulated in studies
> of
> > the
> > frequencies of words in texts) is to the effect
> that
> > the
> > relative frequencies with which words occur once,
> > twice,
> > 3 times, ... are in proportion to 1/1, 1/2, 1/3,
> ...
> > ,
> > 1/n, ... with no limit on n (i.e. the number of
> > different
> > words each represented n times is proportional to
> > 1/n).
> >
> > This is "improper" since sum(1/n) = infinity, so
> > does not
> > define a probability distribution. A respectable
> > analogue
> > is Fisher's logarithmic distribution p(x) where
> >
> > p(x) = ((t^x)/x)/(-log(1-t)), x = 1, 2, 3, ...
> >
> > where t in (0,1) is the parameter of the
> > distribution.
> >
> > As t -> 1, p(x+1)/p(x) -> x/(x+1) as in Zipf's
> Law.
> >
> > However, I've searched the R site and have found
> > only
> > one instance of a function directly related to
> this
> > distribution, namely logff() in package VGAM,
> which
> > is for estimating the parameter t.
> >
> > So it looks as though there is no direct
> > implementation
> > of something like "rlogdist".
> >
> > However, the logarithmic distribution is a
> limiting
> > form of the negative binomial distribution
> > (conditional
> > on a positive value), and there are functions in R
> > for
> > random sampling from this distribution.
> >
> > From my reading of "?rnbinom" in the base package,
> > this
> > function does not have the flexibitility you would
> > need
> > for this. But in the MASS package there is the
> > function
> > rnegbin() which does.
> >
> > You would need to invoke
> >
> > rnegbin(N, mu=..., theta=... )
> >
> > where the value ... of mu is large and the value
> ...
> > of
> > theta is small, and reject cases with value zero.
> > This
> > of course makes it awkward to generate a sample of
> > given
> > size.
> >
> > Alternatively, you can try along the lines of
> >
> > > x0<-1:10000; t<-0.999;
> p<-((t^x0)/x0)/(-log(1-t))
> > > Y<-sample(x0,5000,replace=TRUE,prob=p)
> > > hist(Y,breaks=100)
> >
> > While this gives the logarithmic distribution over
> > the
> > range of x in x0, it is inexact in that it does
> not
> > permit values greater than max(x0) to be sampled.
> >
> > No doubt others can suggest something better than
> > this!
> >
> > Best wishes,
> > Ted.
> >
> >
> >
>
--------------------------------------------------------------------
> > E-Mail: (Ted Harding)
> <Ted.Harding at nessie.mcc.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861 [NB: New
> number!]
> > Date: 25-Jan-05
>
> > Time: 10:20:57
> > ------------------------------ XFMail
> > ------------------------------
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
More information about the R-help
mailing list