[Rd] hist(..., log="y")

Martin Maechler
Mon Aug 7 10:54:22 CEST 2023


>>>>> Ott Toomet 
>>>>>     on Sat, 5 Aug 2023 23:49:38 -0700 writes:

    > Sorry if this topic has been discussed earlier.

    > Currently, hist(..., log="y") fails with

    >> hist(rexp(1000, 1), log="y")
    > Warning messages:
    > 1: In plot.window(xlim, ylim, "", ...) :
    >   nonfinite axis=2 limits [GScale(-inf,2.59218,..); log=TRUE] -- corrected now
    > 2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
    >   "log" is not a graphical parameter
    > 3: In axis(1, ...) : "log" is not a graphical parameter
    > 4: In axis(2, at = yt, ...) : "log" is not a graphical parameter

    > The same applies for log="x"

[...........]

    > This applies for the current svn version of R, and also a
    > few recent published versions.  This is unfortunate for
    > two reasons:

    > * the error message is not quite correct--"log" is a
    > graphical parameter, but "hist" does not support it.

No, not if you use R's (or S's before that) definition:

   graphical parameters := {the possible arguments of par()}

log is *not* among these.
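
A quick way to check this definition, as a minimal sketch: the null
PDF device is only an assumption made here so that par() can be
queried without opening a window, and the variable name `pars` is
purely illustrative.

```r
pdf(NULL)                     # par() needs an open device; use a null one
pars <- names(par())          # all graphical parameters in the par() sense
invisible(dev.off())

"log" %in% pars               # FALSE: "log" is not an argument of par()
c("xlog", "ylog") %in% pars   # TRUE TRUE: log axes are tracked under these names
```

So the log state of the axes is visible through par(), but under the
names "xlog" and "ylog"; "log" itself is an argument of functions such
as plot.default() and plot.window().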


    > * for various kinds of data it is worthwhile to make
    > histograms in log scale.  "hist" is a very nice and
    > convenient function and support for log scale would be
    > handy here.

Yes, possibly (see below).
Note that the above are not errors, but warnings,
and there *is* some support, e.g.,

    > set.seed(1); range(x <- rlnorm(1111))
    [1]  0.04938796 45.16293285
    > hx <- hist(x, log="x", xlim=c(0.049, 47))
    Warning messages:
    1: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
      "log" is not a graphical parameter
    2: In axis(1, ...) : "log" is not a graphical parameter
    3: In axis(2, at = yt, ...) : "log" is not a graphical parameter

    > str(hx)
    List of 6
     $ breaks  : num [1:11] 0 5 10 15 20 25 30 35 40 45 ...
     $ counts  : int [1:10] 1041 58 10 0 1 0 0 0 0 1
     $ density : num [1:10] 0.1874 0.01044 0.0018 0 0.00018 ...
     $ mids    : num [1:10] 2.5 7.5 12.5 17.5 22.5 27.5 32.5 37.5 42.5 47.5
     $ xname   : chr "x"
     $ equidist: logi TRUE
     - attr(*, "class")= chr "histogram"

where we see that it *does* plot  ... but crucially not the very first bin,
which holds over 90% of the counts (viz. 1041 of 1111), because that bin
starts at 0 and log(0) == -Inf.
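
One possible workaround with current R, sketched under assumptions and
not the patch discussed in this thread: compute the histogram with
strictly positive, log-spaced breaks and draw the bars by hand on a
log-x window. The number of breaks (21), the null PDF device and the
names `brk` and `hx` are illustrative choices.

```r
set.seed(1)
x <- rlnorm(1111)

# Log-spaced breaks: every break is > 0, so no bin starts at log(0) = -Inf.
brk <- 10^seq(floor(log10(min(x))), ceiling(log10(max(x))), length.out = 21)
hx  <- hist(x, breaks = brk, plot = FALSE)

# Draw the bars by hand on a log-x window instead of passing log= to hist().
pdf(NULL)  # null device, so the sketch also runs non-interactively
plot(range(brk), c(0, max(hx$counts)), type = "n", log = "x",
     xlab = "x", ylab = "Frequency", main = "Histogram of x (log-x axis)")
rect(head(hx$breaks, -1), 0, hx$breaks[-1], hx$counts, col = "lightgray")
invisible(dev.off())
```

The bins have equal width in log10 units, so on the log-x axis the
bars appear equally wide and bar height is comparable across bins.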

    > I also played a little with the code, and it seems to be
    > very easy to implement.  I am happy to make a patch if the
    > team thinks it is worth pursuing.

    > Cheers, Ott

Yeah... and that is the important question.

Most statisticians know that a histogram is a pretty bad
density estimator (notably if the natural density has
infinite support) compared to simple kernel density estimates,
e.g. those by density().
Hence, I'd argue that if you expect enough sophistication from
your "viewers" to understand a log-scale histogram, you
should use a density with log="x" and/or log="y", and I have
successfully done so several times: It *does* work
{particularly nicely if you use my sfsmisc::eaxis() for the log axis/axes}.
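
A minimal sketch of the density alternative, using base graphics only
(sfsmisc::eaxis() is not used here). It assumes the same simulated
data as above; starting the evaluation grid at min(x) is a choice made
for this sketch so that every grid point is positive and can be drawn
on a log-x axis.

```r
set.seed(1)
x <- rlnorm(1111)

# By default the grid extends below min(x) and may include values <= 0,
# which cannot be shown on a log axis; start it at min(x) > 0 instead.
d <- density(x, from = min(x))

pdf(NULL)  # null device, so the sketch also runs non-interactively
plot(d, log = "x", main = "Kernel density of x (log-x axis)")
rug(x)     # mark the individual observations along the x axis
invisible(dev.off())
```

plot() on a "density" object passes log= on to plot.default(), so no
"not a graphical parameter" warnings are triggered here.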

But you (and others) may have more good arguments why hist()
should work with log="x" and/or log="y"...

Also, if your patch is relatively small, its usefulness may
outweigh the added complexity (and its long-term maintenance!).

Martin


