[R] Zipf random number generation

Tue Jan 25 23:46:38 CET 2005

Sorry. 

c = sum(1.0 / pow((double) j, alpha)), j=1,2,...,N

 --- Weiguang Shi <wgshi2001 at yahoo.ca> wrote: 
> Thanks very much Ted for the detailed explanation.
> 
> It might be flawed but some "common practices" in my
> area use the following approach to generate a random
> rank for some Zipf-like distribution with the
> parameter alpha.
> 
> The key is to introduce the limit of ranks, N. With
> N
> and alpha, therefore, the probability of rank i is
> determined by 
>     c/pow((double) i, alpha)
> where
>     c = sum(1.0 / pow((double) i, alpha))
> 
> Any comment on that?
> 
> Thanks,
> Weiguang
> 
>  --- Ted.Harding at nessie.mcc.ac.uk wrote: 
> > On 25-Jan-05 Weiguang Shi wrote:
> > > Hi,
> > > 
> > > Is there a Zipf-like distribution RNG in R?
> > > 
> > > Thanks,
> > > Weiguang
> > 
> > "Zipf's Law" (as originally formulated in studies
> of
> > the
> > frequencies of words in texts) is to the effect
> that
> > the
> > relative frequencies with which words occur once,
> > twice,
> > 3 times, ... are in proportion to 1/1, 1/2, 1/3,
> ...
> > ,
> > 1/n, ... with no limit on n (i.e. the number of
> > different
> > words each represented n times is proportional to
> > 1/n).
> > 
> > This is "improper" since sum(1/n) = infinity, so
> > does not
> > define a probability distribution. A respectable
> > analogue
> > is Fisher's logarithmic distribution p(x) where
> > 
> >   p(x) = ((t^x)/x)/(-log(1-t)), x = 1, 2, 3, ...
> > 
> > where t in (0,1) is the parameter of the
> > distribution.
> > 
> > As t -> 1, p(x+1)/p(x) -> x/(x+1) as in Zipf's
> Law.
> > 
> > However, I've searched the R site and have found
> > only
> > one instance of a function directly related to
> this
> > distribution, namely logff() in package VGAM,
> which
> > is for estimating the parameter t.
> > 
> > So it looks as though there is no direct
> > implementation
> > of something like "rlogdist".
> > 
> > However, the logarithmic distribution is a
> limiting
> > form of the negative binomial distribution
> > (conditional
> > on a positive value), and there are functions in R
> > for
> > random sampling from this distribution.
> > 
> > From my reading of "?rnbinom" in the base package,
> > this
> > function does not have the flexibitility you would
> > need
> > for this. But in the MASS package there is the
> > function
> > rnegbin() which does.
> > 
> > You would need to invoke
> > 
> >   rnegbin(N, mu=..., theta=... )
> > 
> > where the value ... of mu is large and the value
> ...
> > of
> > theta is small, and reject cases with value zero.
> > This
> > of course makes it awkward to generate a sample of
> > given
> > size.
> > 
> > Alternatively, you can try along the lines of
> > 
> > > x0<-1:10000; t<-0.999;
> p<-((t^x0)/x0)/(-log(1-t))
> > > Y<-sample(x0,5000,replace=TRUE,prob=p)
> > > hist(Y,breaks=100)
> > 
> > While this gives the logarithmic distribution over
> > the
> > range of x in x0, it is inexact in that it does
> not
> > permit values greater than max(x0) to be sampled.
> > 
> > No doubt others can suggest something better than
> > this!
> > 
> > Best wishes,
> > Ted.
> > 
> > 
> >
>
--------------------------------------------------------------------
> > E-Mail: (Ted Harding)
> <Ted.Harding at nessie.mcc.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861  [NB: New
> number!]
> > Date: 25-Jan-05                                   
>  
> >  Time: 10:20:57
> > ------------------------------ XFMail
> > ------------------------------
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>