[Rd] a fault in the "hist" - function (PR#6931)

ripley at stats.ox.ac.uk ripley at stats.ox.ac.uk
Wed Jun 9 09:56:03 CEST 2004


On 2 Jun 2004, Peter Dalgaard wrote:

> ligges at statistik.uni-dortmund.de writes:
> 
> > The problem is in hist.default():
> > 
> >      diddle <- 1e-7 * max(abs(range(breaks)))
> > 
> > and wherever we are diddling, there are some disadvantages.
> > 
> > Do we want a flag that turns off diddling and the following "fuzz" 
> > stuff? Or do we want something to adjust the hardcoded heuristic value 
> > "1e-7" (to zero, for example)?
> 
> Neither, I think, since the diddle is there for a reason, and the only
> real problem is the use of breaks that are wildly off-scale. We might
> key diddle to xlim instead, or possibly let "diddle" be an argument

We can't do that, as hist might not be used to plot.

> with a suitable default. 
> 
> You probably can't get all cases completely right though. A tiny range
> of numbers (compared to the mean) is likely to cause problems whatever
> you do.

I think the fuzz really needs to be relative to the adjacent bin size (and 
the one to the left or right as appropriate).  So I am going to replace

    diddle <- 1e-7 * max(abs(range(breaks)))

by

    diddle <- 1e-7 * median(diff(breaks))

that is, to use a typical bin size to set the fuzz factor.  (Note: I know
the new value is typically a bit smaller, but 1e-7 was a rather large 
tolerance.)
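
To make the difference concrete, here is a minimal sketch comparing the two
definitions of diddle on breaks with one wildly off-scale endpoint.  The
breaks, the data and the helper count_bins() are illustrative assumptions,
and the fuzz applied (first break lowered, the others raised, for
right-closed bins) is a simplified stand-in for the adjustment, not a copy
of the code in hist.default().

```r
# Illustrative breaks: three unit-width bins plus one huge final bin.
breaks <- c(0, 1, 2, 3, 1e8)
x <- c(0.5, 1.5, 2.5, 8)          # one observation per intended bin

# Old rule: scaled by the largest break in absolute value.
old_diddle <- 1e-7 * max(abs(range(breaks)))   # about 10, wider than a unit bin
# New rule: scaled by a typical bin width.
new_diddle <- 1e-7 * median(diff(breaks))      # 1e-7

# Hypothetical helper: count x in right-closed bins (a, b] after fuzzing
# the breaks.  Assumed fuzz shape: first break down, all others up.
count_bins <- function(x, breaks, diddle) {
  fuzzy <- breaks + c(-diddle, rep(diddle, length(breaks) - 1))
  # left.open = TRUE gives (a, b] intervals; needs R >= 3.3.0
  idx <- findInterval(x, fuzzy, left.open = TRUE)
  tabulate(idx, nbins = length(breaks) - 1)
}

counts_old <- count_bins(x, breaks, old_diddle)  # 4 0 0 0: all land in bin 1
counts_new <- count_bins(x, breaks, new_diddle)  # 1 1 1 1: as intended
print(rbind(old = counts_old, new = counts_new))
```

With the old rule the shift exceeds the width of the small bins, so the
first bin absorbs everything; with the median-based rule the shift stays
far below any bin width.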

[I hadn't realized we used the largest limit and not the range (normal 
sense) of the data.  There is also something of a design error in that we 
shift the breaks and not the data.]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


