[Rd] a fault in the "hist" - function (PR#6931)
ripley at stats.ox.ac.uk
Wed Jun 9 09:56:03 CEST 2004
On 2 Jun 2004, Peter Dalgaard wrote:
> ligges at statistik.uni-dortmund.de writes:
>
> > The problem is in hist.default():
> >
> > diddle <- 1e-7 * max(abs(range(breaks)))
> >
> > and wherever we are diddling - there are some disadvantages.
> >
> > Do we want a flag that turns off diddling and the following "fuzz"
> > stuff? Or do we want something to adjust the hardcoded heuristic value
> > "1e-7" (to zero, for example)?
>
> Neither, I think, since the diddle is there for a reason, and the only
> real problem is the use of breaks that are wildly off-scale. We might
> key diddle to xlim instead, or possibly let "diddle" be an argument
We can't do that, as hist might not be used to plot.
> with a suitable default.
>
> You probably can't get all cases completely right though. A tiny range
> of numbers (compared to the mean) is likely to cause problems whatever
> you do.
I think the fuzz really needs to be relative to the adjacent bin size (and
the one to the left or right as appropriate). So I am going to replace

    diddle <- 1e-7 * max(abs(range(breaks)))

by

    diddle <- 1e-7 * median(diff(breaks))

that is, to use a typical bin size to set the fuzz factor. (Note: I know
this is typically a bit smaller, but 1e-7 was a rather large tolerance.)
[I hadn't realized we used the largest limit and not the range (normal
sense) of the data. There is also something of a design error in that we
shift the breaks and not the data.]
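
To illustrate the difference between the two definitions of diddle, here is
a minimal sketch. The breaks are made up for illustration (they are not the
ones from the bug report): three bins of width 0.1 plus one break that is
wildly off-scale.

```r
# Hypothetical breaks, chosen only to illustrate the effect:
# the bins of interest are 0.1 wide, the last break is far off-scale.
breaks <- c(0, 0.1, 0.2, 0.3, 1e6)

# Old definition: scaled by the largest absolute break value.
old_diddle <- 1e-7 * max(abs(range(breaks)))

# New definition: scaled by a typical (median) bin width.
new_diddle <- 1e-7 * median(diff(breaks))

# Compare each fuzz value with the narrowest bin.
narrowest <- min(diff(breaks))
cat("old diddle / narrowest bin:", old_diddle / narrowest, "\n")
cat("new diddle / narrowest bin:", new_diddle / narrowest, "\n")
```

With these breaks the old fuzz is about as wide as a whole 0.1 bin, whereas
the new one is about 1e-7 of a bin width, so the off-scale break no longer
dominates the tolerance.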
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595