[R] non-linear binning? power-law in R
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Wed Jun 16 17:07:05 CEST 2004
On Wed, 16 Jun 2004, Roger D. Peng wrote:
>You can use hist(x, br, plot = FALSE)$counts.
>
>-roger
OK, I made this...
y4 <- runif(100000)**4
y4log <- -log(y4)
y4bin20 <- hist(y4log,20,plot=FALSE)$counts
y4bin20log <- log(y4bin20)
plot(y4bin20log)
for(i in 2:20){
y4 <- runif(100000)**4
y4log <- -log(y4)
y4bin20 <- hist(y4log,20,plot=FALSE)$counts
y4bin20log <- log(y4bin20)
points(y4bin20log,col=i)
}
I am seeing something very strange about 1 loop in every five or ten of
the above...
Can someone try this and let me know if every fifth run or so they see a
totally different trend? Basically looks like only 10 bins are being used
at random during the run!?
Hopefully you will also see what I am trying to do, which is to highlight
the increased variance in the tail...
How would I estimate the slope of y4bin20log? (sorry for the names).
Thanks for the help,
Dan.
>
>Dan Bolser wrote:
>> First, thanks to everyone who helped me get to grips with R in (x)emacs
>> (I get confused easily). Special thanks to Stephen Eglen for continued
>> support.
>>
>> My question is about non-linear binning, or density functions over
>> distributions governed by a power law ...
>>
>> y ~ mu*x**lambda # In one of its forms
>> # (can't find Pareto in the online help)
>>
>> Looking at the following should show my problem....
>>
>> x3 <- runif(10000)**3 # Probably a better (correct) way to do this
>>
>> plot( density(x3,cut=0,bw=0.1))
>> plot( density(x3,cut=0,bw=0.01))
>> plot( density(x3,cut=0,bw=0.001))
>>
>> plot(density(x3,cut=0,bw=0.1), log='xy')
>> plot(density(x3,cut=0,bw=0.01), log='xy')
>> plot(density(x3,cut=0,bw=0.001),log='xy')
>>
>> The upper three plots show that the bw has a big effect on the appearance
>> of the graph by rescaling based on the initial density at low values of x,
>> which is very high.
>>
>> The lower plots show (I think) an error in the use of linear bins to view
>> a non linear trend. I would expect this curve to be linear on log-log
>> scales (from experience), and you can see the expected behavior in the
>> tails of these plots.
>>
>> If you play with drawing these curves on top of each other they look OK
>> apart from at the beginning. However, changing the band width to 0.0001 has
>> a radical effect on these plots, and they begin to show a different trend
>> (look like they are being governed by a different power).
>>
>> Hmmm....
>>
>> x3log <- -log(x3)
>>
>> plot( density(x3log,cut=0,bw=0.5), log='y',col=1)
>>
>> lines(density(x3log,cut=0,bw=0.2), log='y',col=2)
>> lines(density(x3log,cut=0,bw=0.1), log='y',col=3)
>> lines(density(x3log,cut=0,bw=0.01), log='y',col=4)
>>
>> Sorry...
>>
>>
>> 'Real' data of this form is usually discrete, with the value of 1 being
>> the most frequent (minimum) event, and higher values occurring less
>> frequently according to a power (power-law). This data can be easily
>> grouped into discrete bins, and frequency plotted on log scales. The
>> continuous data generated above requires some form of density estimation
>> or rescaling into discreet values (make the smallest value equal to 1 and
>> round everything else into an integer).
>>
>> I see the aggregate function, but which function lets me simply count the
>> number of values in a class (integer bin)?
>>
>> The analysis of even the discretized data is made more accurate by the use
>> of exponentially growing bins. This way you don't need to plot the data on
>> log scales, and the increasing variance associated with lower probability
>> events is handled by the increasing bin size (giving good accuracy of
>> power fitting). How can I easily (ignorantly) implement exponentially
>> increasing bin sizes?
>>
>> Thanks for any feedback,
>>
>> Dan.
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
>
>
More information about the R-help
mailing list