[R] non-linear binning? power-law in R
sdavis2 at mail.nih.gov
Wed Jun 16 12:32:38 CEST 2004
Is ?cut what you need?
On 6/16/04 6:52 AM, "Dan Bolser" <dmb at mrc-dunn.cam.ac.uk> wrote:
> First, thanks to everyone who helped me get to grips with R in (x)emacs
> (I get confused easily). Special thanks to Stephen Eglen for continued
> My question is about non-linear binning, or density functions over
> distributions governed by a power law ...
> y ~ mu*x**lambda # In one of its forms
> # (can't find Pareto in the online help)
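As far as I know base R has no Pareto functions, but inverse-transform sampling is short enough to write by hand. A minimal sketch: the helper name rpareto_sketch and its arguments xm (minimum value) and alpha (shape) are made up for illustration and are not an existing R function. Incidentally, runif(n)^3 does follow a power law as well: its density is proportional to x^(-2/3) on (0,1).

```r
# Hypothetical helper (not part of base R): inverse-transform sampling
# from a Pareto distribution with minimum xm and shape alpha.
rpareto_sketch <- function(n, xm = 1, alpha = 2) {
  stopifnot(xm > 0, alpha > 0)
  # If U ~ Uniform(0,1), then xm * U^(-1/alpha) has P(X > x) = (xm/x)^alpha
  xm * runif(n)^(-1 / alpha)
}

set.seed(1)                                   # reproducible example
xp <- rpareto_sketch(10000, xm = 1, alpha = 2)
summary(xp)
```

With alpha = 2 the tail probability P(X > 2) should come out near 0.25, which is a quick sanity check on the sample.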
> Looking at the following should show my problem....
> x3 <- runif(10000)**3 # Probably a better (correct) way to do this
> plot( density(x3,cut=0,bw=0.1))
> plot( density(x3,cut=0,bw=0.01))
> plot( density(x3,cut=0,bw=0.001))
> plot(density(x3,cut=0,bw=0.1), log='xy')
> plot(density(x3,cut=0,bw=0.01), log='xy')
> The upper three plots show that the bw has a big effect on the appearance
> of the graph by rescaling based on the initial density at low values of x,
> which is very high.
> The lower plots show (I think) an error in the use of linear bins to view
> a non-linear trend. I would expect this curve to be linear on log-log
> scales (from experience), and you can see the expected behavior in the
> tails of these plots.
> If you play with drawing these curves on top of each other they look OK
> apart from at the beginning. However, changing the bandwidth to 0.0001 has
> a radical effect on these plots, and they begin to show a different trend
> (look like they are being governed by a different power).
> x3log <- -log(x3)
> plot( density(x3log,cut=0,bw=0.5), log='y',col=1)
> lines(density(x3log,cut=0,bw=0.2), col=2)
> lines(density(x3log,cut=0,bw=0.1), col=3)
> lines(density(x3log,cut=0,bw=0.01), col=4)
> 'Real' data of this form is usually discrete, with the value of 1 being
> the most frequent (minimum) event, and higher values occurring less
> frequently according to a power (power-law). This data can be easily
> grouped into discrete bins, and frequency plotted on log scales. The
> continuous data generated above requires some form of density estimation
> or rescaling into discrete values (make the smallest value equal to 1 and
> round everything else into an integer).
> I see the aggregate function, but which function lets me simply count the
> number of values in a class (integer bin)?
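For simple counting, table() is probably what you want. A minimal sketch, assuming integer-valued data with 1 as the smallest value; the toy vector x below is only a stand-in for the real data, and the names x, counts, values and freq are illustrative:

```r
set.seed(1)                       # reproducible example
# Toy discrete data (an assumption, not the real data): integers >= 1
# with a heavy, power-law-like tail.
x <- floor(1 / runif(10000))

counts <- table(x)                # number of observations in each integer class
values <- as.numeric(names(counts))   # the integer classes that actually occur
freq   <- as.vector(counts)           # their frequencies
head(counts)
```

Classes that never occur are simply absent from the table, so plot(values, freq, log = "xy") can be drawn without running into zero counts.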
> The analysis of even the discretized data is made more accurate by the use
> of exponentially growing bins. This way you don't need to plot the data on
> log scales, and the increasing variance associated with lower probability
> events is handled by the increasing bin size (giving good accuracy of
> power fitting). How can I easily (ignorantly) implement exponentially
> increasing bin sizes?
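Exponentially growing bins can be had by passing your own break points to cut(). A minimal sketch, assuming integer data >= 1 and bin edges that double each time (1, 2, 4, 8, ...); the toy data and the names breaks, bins, width, dens and mids are only illustrative:

```r
set.seed(1)                                  # reproducible example
x <- floor(1 / runif(10000))                 # toy data (assumption): integers >= 1

# Bin edges 1, 2, 4, 8, ... with the last edge strictly above max(x)
nbins  <- ceiling(log2(max(x) + 1))
breaks <- 2^(0:nbins)

# Half-open intervals [1,2), [2,4), [4,8), ...
bins   <- cut(x, breaks = breaks, right = FALSE)
counts <- as.vector(table(bins))

width <- diff(breaks)                        # bin widths grow exponentially
dens  <- counts / (sum(counts) * width)      # density estimate: count / (n * width)
mids  <- sqrt(breaks[-1] * breaks[-length(breaks)])   # geometric bin midpoints
```

Dividing by the bin width matters here: without it the wide bins in the tail look artificially heavy. Plotting dens against mids on log-log axes (dropping any empty bins first) should then give something close to a straight line for power-law data.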
> Thanks for any feedback,