[R] plotting Zipf and Zipf-Mandelbrot curves in R

Stefan Evert stefanML at COLLOCATIONS.DE
Mon Oct 18 14:51:25 CEST 2010


> Using R, I plotted a log-log plot of the frequencies in the Brown Corpus
> using
> plot(sort(file.tfl$f, decreasing=TRUE), xlab="rank", ylab="frequency",
> log="x,y")
> However, I would also like to add lines showing the curves for a Zipfian
> distribution and for Zipf-Mandelbrot.

It's fairly straightforward to add such curves to the plot above with lines(), e.g. for Zipf-Mandelbrot:

  k <- 1:length(file.tfl$f)
  f <- C / (k + b)^a  # Zipf-Mandelbrot law; the parameters a >= 1, b >= 0 and C must be set beforehand
  lines(k, f, lwd=2, col="red")
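To make this concrete, here is a minimal self-contained sketch that draws both curves on synthetic data. Everything in it is an assumption for illustration only: the simulated frequencies stand in for sort(file.tfl$f, decreasing=TRUE), the values of a and b are made up, and C is simply chosen so that each curve passes through the most frequent type. Zipf's original law is treated here as the special case a = 1, b = 0.

```r
# Synthetic rank-frequency data (hypothetical stand-in for the real corpus counts)
set.seed(1)
V <- 5000                                   # number of types in the toy population
p <- 1 / (1:V + 2.7)^1.1                    # made-up Zipf-Mandelbrot-shaped weights
p <- p / sum(p)                             # normalize to probabilities
tokens <- sample(V, 100000, replace = TRUE, prob = p)
freq <- sort(as.vector(table(tokens)), decreasing = TRUE)
k <- seq_along(freq)                        # ranks 1, 2, ..., number of observed types

# Rank-frequency plot with both axes on a logarithmic scale
plot(k, freq, log = "xy", xlab = "rank", ylab = "frequency")

# Zipf-Mandelbrot curve with hand-picked (not estimated) parameters
a <- 1.1
b <- 2.7
C <- freq[1] * (1 + b)^a                    # anchor the curve at rank 1
f.zm <- C / (k + b)^a
lines(k, f.zm, lwd = 2, col = "red")

# Zipf's law as the special case a = 1, b = 0, also anchored at rank 1
f.zipf <- freq[1] / k
lines(k, f.zipf, lwd = 2, col = "blue", lty = 2)

legend("topright", legend = c("Zipf-Mandelbrot", "Zipf"),
       col = c("red", "blue"), lwd = 2, lty = c(1, 2))
```

Anchoring C at rank 1 is only a convenience to get the curves onto the plot; it says nothing about how well they fit the rest of the data.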

The tricky part is to determine suitable values for the parameters a, b and C.  If you happen to be using the "zipfR" package (just guessing because of the .tfl terminology in your code example), you can easily get an approximation to the Zipf-Mandelbrot law from a trained ZM model (the package does not offer a valid LNRE model for Zipf's original law).  In essence, this is what you have to do:

  file.zm <- lnre("zm", tfl2spc(file.tfl))  # assuming that file.tfl is a "tfl" object created by zipfR
  k <- 1:length(file.tfl$f)
  f <- tqlnre(file.zm, k) * N(file.tfl)
  lines(k, f, lwd=2, col="red")
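
If you are not using zipfR, a rough alternative is to fit the curves directly to the rank-frequency data by least squares on the log-log scale. The sketch below uses only base R; it is a different technique from the LNRE model above (a plain curve fit, not zipfR's estimation procedure), and it is meant for visual purposes only, since the many low-frequency types in the tail dominate such a fit. The synthetic data, the starting values for optim() and the names zm.loss, zipf.fit and zm.fit are all assumptions made for this illustration.

```r
# Synthetic rank-frequency data (hypothetical stand-in for the real corpus counts)
set.seed(1)
V <- 5000
p <- 1 / (1:V + 2.7)^1.1
p <- p / sum(p)
tokens <- sample(V, 100000, replace = TRUE, prob = p)
freq <- sort(as.vector(table(tokens)), decreasing = TRUE)
k <- seq_along(freq)

# Zipf's law: log f = log C - a * log k, i.e. a straight line on the log-log plot
zipf.fit <- lm(log(freq) ~ log(k))
f.zipf <- exp(fitted(zipf.fit))

# Zipf-Mandelbrot: minimize the squared error on the log scale over (log C, a, log b)
zm.loss <- function(par) {
  logC <- par[1]
  a <- par[2]
  b <- exp(par[3])                          # optimizing log b keeps b > 0
  sum((log(freq) - (logC - a * log(k + b)))^2)
}
start <- c(log(freq[1]), 1, 0)              # ad hoc starting values: C = f_1, a = 1, b = 1
zm.fit <- optim(start, zm.loss)             # Nelder-Mead by default
logC <- zm.fit$par[1]
a <- zm.fit$par[2]
b <- exp(zm.fit$par[3])
f.zm <- exp(logC) / (k + b)^a

plot(k, freq, log = "xy", xlab = "rank", ylab = "frequency")
lines(k, f.zipf, lwd = 2, col = "blue", lty = 2)
lines(k, f.zm, lwd = 2, col = "red")
```

With real data, replace freq by sort(file.tfl$f, decreasing=TRUE) and check zm.fit$convergence before trusting the estimated parameters.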

Hope this helps,
