[R] non-linear binning? power-law in R

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Fri Jun 18 11:57:09 CEST 2004

On Thu, 17 Jun 2004, Dr. Herwig Meschke wrote:

>Why not try to avoid binning (and density plot) at all? An alternative 
>could be a qqplot (as a log-log-plot), e.g.
>plot(ppoints(length(x4)), x4[order(x4)], log="xy")
>abline(lm(log(x4[order(x4)])~log(ppoints(length(x4)))), col="red")
>If the assumptions of uniform distribution and power transformation 
>y=a*x**b are true, the coefficient of lm estimates the exponent b.

Thanks, this looks very cool (although I am going to have to learn what it
all means ;)

However, playing with the above, for example with the following...

x4 <- runif(100)**4

plot(ppoints(length(x4)), sort(x4), log="xy")
# log10 rather than log: abline() on log axes draws in log10 coordinates
abline(lm(log10(sort(x4))~log10(ppoints(length(x4)))), col="red")

Shows (perhaps after a few repeats) that the fitted line is dominated by
the rare events, and the rare events have the highest variance, leading to
potentially large errors.
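
To put a rough number on that variability, here is a minimal sketch that
refits the same log-log regression on many fresh samples and looks at the
spread of the estimated exponent. The helper name fit_slope, the seed and
the 200 repeats are my own assumptions for illustration, not part of the
suggestion quoted above.

```r
# Hypothetical helper: draw a fresh sample and return the fitted exponent
fit_slope <- function(n = 100) {
  x <- sort(runif(n)^4)          # same toy data as above (true exponent 4)
  p <- ppoints(n)                # plotting positions for the sorted values
  unname(coef(lm(log10(x) ~ log10(p)))[2])
}

set.seed(1)                      # assumption: fixed seed so runs are repeatable
slopes <- replicate(200, fit_slope())

summary(slopes)                  # spread of the estimates around 4
sd(slopes)                       # run-to-run variability of the fit
```

The wider the spread reported by summary() and sd(), the more a single fit
is at the mercy of the few extreme points.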

By uniformly binning the log-transformed data you group the rarest values
in the biggest bin, and can therefore get better estimates of the true
slope of the curve.
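
A minimal sketch of one way to do that binning, assuming hist() on
log10(x4) is an acceptable way to get equal-width bins on the log scale.
The sample size, the seed and the requested number of breaks are arbitrary
choices of mine. The counts are converted back to a density on the original
scale by dividing by the (unequal) bin widths.

```r
set.seed(1)                      # assumption: fixed seed so runs are repeatable
x4 <- runif(1000)^4              # same toy data, larger sample

# Equal-width bins on log10(x4); hist() picks "pretty" breaks covering the range
h <- hist(log10(x4), breaks = 10, plot = FALSE)

# Back on the original scale the bins get wider as x grows
width   <- diff(10^h$breaks)
density <- h$counts / (length(x4) * width)
centre  <- 10^h$mids             # geometric centre of each bin

ok  <- h$counts > 0              # empty bins cannot be shown on a log axis
fit <- lm(log10(density[ok]) ~ log10(centre[ok]))

plot(centre[ok], density[ok], log = "xy")
abline(fit, col = "red")         # log10 fit matches the log10 axis coordinates
coef(fit)[2]                     # estimated slope of the density
```

For this toy data the density of x4 is proportional to x^(-3/4), so the
slope should come out somewhere near -0.75. The fit is unweighted, so bins
holding only a handful of points still count as much as well-filled ones.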

My problem is now a technical one of working out how to do this, so it
isn't too fundamental.

I can post up the differences in the values (and error) of the estimated
curves when I get round to doing this.

Thanks again for the help,


