[R] non-linear binning? power-law in R

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Wed Jun 16 17:07:05 CEST 2004


On Wed, 16 Jun 2004, Roger D. Peng wrote:

>You can use hist(x, br, plot = FALSE)$counts.
>
>-roger

OK, I made this...


y4 <- runif(100000)**4
y4log <- -log(y4)
y4bin20 <- hist(y4log,20,plot=FALSE)$counts
y4bin20log <- log(y4bin20)

plot(y4bin20log)

for(i in 2:20){
 y4 <- runif(100000)**4
 y4log <- -log(y4)
 y4bin20 <- hist(y4log,20,plot=FALSE)$counts
 y4bin20log <- log(y4bin20)
 points(y4bin20log,col=i)
}


I am seeing something very strange about 1 loop in every five or ten of
the above...

Can someone try this and let me know if every fifth run or so they see a
totally different trend? Basically looks like only 10 bins are being used
at random during the run!?

Hopefully you will also see what I am trying to do, which is to highlight
the increased variance in the tail...

How would I estimate the slope of y4bin20log? (sorry for the names).

Thanks for the help,
Dan.

>
>Dan Bolser wrote:
>> First, thanks to everyone who helped me get to grips with R in (x)emacs
>> (I get confused easily). Special thanks to Stephen Eglen for continued
>> support.
>> 
>> My question is about non-linear binning, or density functions over
>> distributions governed by a power law ...
>> 
>> y ~ mu*x**lambda	# In one of its forms 
>>                         # (can't find Pareto in the online help)
>> 
>> Looking at the following should show my problem....
>> 
>> x3 <- runif(10000)**3	# Probably a better (correct) way to do this
>> 
>> plot( density(x3,cut=0,bw=0.1))
>> plot( density(x3,cut=0,bw=0.01))
>> plot( density(x3,cut=0,bw=0.001))
>> 
>> plot(density(x3,cut=0,bw=0.1),  log='xy')
>> plot(density(x3,cut=0,bw=0.01), log='xy')
>> plot(density(x3,cut=0,bw=0.001),log='xy')
>> 
>> The upper three plots show that the bw has a big effect on the appearance
>> of the graph by rescaling based on the initial density at low values of x,
>> which is very high.
>> 
>> The lower plots show (I think) an error in the use of linear bins to view
>> a non linear trend. I would expect this curve to be linear on log-log
>> scales (from experience), and you can see the expected behavior in the
>> tails of these plots.
>> 
>> If you play with drawing these curves on top of each other they look OK
>> apart from at the beginning. However, changing the band width to 0.0001 has
>> a radical effect on these plots, and they begin to show a different trend
>> (look like they are being governed by a different power).
>> 
>> Hmmm....
>> 
>> x3log <- -log(x3)
>> 
>> plot( density(x3log,cut=0,bw=0.5),  log='y',col=1)
>> 
>> lines(density(x3log,cut=0,bw=0.2),  log='y',col=2)
>> lines(density(x3log,cut=0,bw=0.1),  log='y',col=3)
>> lines(density(x3log,cut=0,bw=0.01), log='y',col=4)
>> 
>> Sorry...
>> 
>> 
>> 'Real' data of this form is usually discrete, with the value of 1 being
>> the most frequent (minimum) event, and higher values occurring less
>> frequently according to a power (power-law). This data can be easily
>> grouped into discrete bins, and frequency plotted on log scales. The
>> continuous data generated above requires some form of density estimation
>> or rescaling into discreet values (make the smallest value equal to 1 and
>> round everything else into an integer).
>> 
>> I see the aggregate function, but which function lets me simply count the
>> number of values in a class (integer bin)?
>> 
>> The analysis of even the discretized data is made more accurate by the use
>> of exponentially growing bins. This way you don't need to plot the data on
>> log scales, and the increasing variance associated with lower probability
>> events is handled by the increasing bin size (giving good accuracy of
>> power fitting). How can I easily (ignorantly) implement exponentially
>> increasing bin sizes?
>> 
>> Thanks for any feedback,
>> 
>> Dan.
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>> 
>
>




More information about the R-help mailing list