[R] non-linear binning? power-law in R

Wed Jun 16 14:51:03 CEST 2004

You can use hist(x, br, plot = FALSE)$counts.

-roger

Dan Bolser wrote:
> First, thanks to everyone who helped me get to grips with R in (x)emacs
> (I get confused easily). Special thanks to Stephen Eglen for continued
> support.
> 
> My question is about non-linear binning, or density functions over
> distributions governed by a power law ...
> 
> y ~ mu*x**lambda	# In one of its forms 
>                         # (can't find Pareto in the online help)
> 
> Looking at the following should show my problem....
> 
> x3 <- runif(10000)**3	# Probably a better (correct) way to do this
> 
> plot( density(x3,cut=0,bw=0.1))
> plot( density(x3,cut=0,bw=0.01))
> plot( density(x3,cut=0,bw=0.001))
> 
> plot(density(x3,cut=0,bw=0.1),  log='xy')
> plot(density(x3,cut=0,bw=0.01), log='xy')
> plot(density(x3,cut=0,bw=0.001),log='xy')
> 
> The upper three plots show that the bw has a big effect on the appearance
> of the graph by rescaling based on the initial density at low values of x,
> which is very high.
> 
> The lower plots show (I think) an error in the use of linear bins to view
> a non linear trend. I would expect this curve to be linear on log-log
> scales (from experience), and you can see the expected behavior in the
> tails of these plots.
> 
> If you play with drawing these curves on top of each other they look OK
> apart from at the beginning. However, changing the band width to 0.0001 has
> a radical effect on these plots, and they begin to show a different trend
> (look like they are being governed by a different power).
> 
> Hmmm....
> 
> x3log <- -log(x3)
> 
> plot( density(x3log,cut=0,bw=0.5),  log='y',col=1)
> 
> lines(density(x3log,cut=0,bw=0.2),  log='y',col=2)
> lines(density(x3log,cut=0,bw=0.1),  log='y',col=3)
> lines(density(x3log,cut=0,bw=0.01), log='y',col=4)
> 
> Sorry...
> 
> 
> 'Real' data of this form is usually discrete, with the value of 1 being
> the most frequent (minimum) event, and higher values occurring less
> frequently according to a power (power-law). This data can be easily
> grouped into discrete bins, and frequency plotted on log scales. The
> continuous data generated above requires some form of density estimation
> or rescaling into discreet values (make the smallest value equal to 1 and
> round everything else into an integer).
> 
> I see the aggregate function, but which function lets me simply count the
> number of values in a class (integer bin)?
> 
> The analysis of even the discretized data is made more accurate by the use
> of exponentially growing bins. This way you don't need to plot the data on
> log scales, and the increasing variance associated with lower probability
> events is handled by the increasing bin size (giving good accuracy of
> power fitting). How can I easily (ignorantly) implement exponentially
> increasing bin sizes?
> 
> Thanks for any feedback,
> 
> Dan.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng