[R] Bug in hist() when working with Dates ?

Martin Maechler maechler at stat.math.ethz.ch
Tue Jun 2 10:05:21 CEST 2009


>>>>> "SN" == S Nunes <snunes at gmail.com>
>>>>>     on Mon, 1 Jun 2009 16:45:54 +0100 writes:

    SN> Hi, It seems that hist() has a buggy behavior when
    SN> breaking over "days".  The bug can be reproduced in a
    SN> few steps:

    >> d=data.frame(date=c("2009-01-01", "2009-01-02", "2009-01-02"))
    >> d$date=as.Date(d$date)
    >> d$date
    SN> [1] "2009-01-01" "2009-01-02" "2009-01-02"

    >> h=hist(d$date, "days")
    >> h$count
    SN> [1] 3

It is much simpler and less confusing not to go via a data frame
(and 'plot=FALSE' suppresses the plot, which is not the issue here):

d. <- as.Date(c("2009-01-01", "2009-01-02", "2009-01-02"))
str(h <- hist(d., "days", plot=FALSE))

This does give what you observe.
It is not a bug as it is consistent with the default histogram
behavior:

> str(hist(c(1,2,2), breaks=1:2, plot=FALSE))
List of 7
 $ breaks     : int [1:2] 1 2
 $ counts     : int 3
 $ intensities: num 1
 $ density    : num 1
 $ mids       : num 1.5
 $ xname      : chr "c(1, 2, 2)"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"

    SN> Despite the fact that the original data contains 2
    SN> distinct days. The call hist() only returns one "break",
    SN> adding the occurrences of both days. I would expect the
    SN> last output to be: [1] 1 2.

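As you see, your expectation is wrong: with the defaults
(right = TRUE, include.lowest = TRUE) the two breaks define a single
closed cell, so all three values are counted in it.
It may help if you also use cut() (and read its help page)
and study the behavior of the
'include.lowest' and 'right' arguments to both cut() and hist().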
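
To illustrate the effect of 'include.lowest' and 'right', here is a
minimal sketch on the plain numbers c(1, 2, 2) rather than on Dates.
The variable names are made up for illustration only:

```r
x <- c(1, 2, 2)

# hist() defaults (right = TRUE, include.lowest = TRUE):
# breaks 1:2 define the single cell [1,2], so all three values land in it
n_default <- hist(x, breaks = 1:2, plot = FALSE)$counts
print(n_default)      # 3

# cut() defaults to include.lowest = FALSE, so the value 1 lies
# outside the cell (1,2] and becomes NA
f_open   <- cut(x, breaks = 1:2)
f_closed <- cut(x, breaks = 1:2, include.lowest = TRUE)
print(table(f_open, useNA = "ifany"))
print(table(f_closed))

# One extra break on the left gives the two cells [0,1] and (1,2]
n_left <- hist(x, breaks = 0:2, plot = FALSE)$counts
print(n_left)         # 1 2

# Left-closed cells [1,2) and [2,3] via right = FALSE
n_right_false <- hist(x, breaks = 1:3, right = FALSE, plot = FALSE)$counts
print(n_right_false)  # 1 2
```

Either adding a break or switching to left-closed cells separates the
value 1 from the two 2s; which one is preferable depends on how the
cell boundaries should be read.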

    SN> I am using R version 2.9.0.

    SN> I would like to know if this behavior is correct or a
    SN> bug?
"correct", as said above.

    SN> Thanks in advance for your comments on this issue,

I agree that it would be useful if hist.Date(), the method used here,
made it easy to produce what you expected
when you use

  str(h <- hist(d., "days", include.lowest=FALSE))

{which gives an error now},
namely to effectively use what you now can get via

  str(hist(d., breaks= seq(min(d.)-1, max(d.), "days")))

{hint: the above is the solution for your problem}
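
Spelled out as a small sketch, the explicit-breaks approach looks as
follows. 'right = TRUE' is passed explicitly here only to make the
assumption about right-closed cells visible, and the cut() variant is
an additional illustration, not something hist.Date() does for you:

```r
d. <- as.Date(c("2009-01-01", "2009-01-02", "2009-01-02"))

# Start the breaks one day before the earliest date, so that each
# observed day ends up in its own right-closed cell
brks <- seq(min(d.) - 1, max(d.), by = "days")
h <- hist(d., breaks = brks, right = TRUE, plot = FALSE)
print(h$counts)   # 1 2

# Alternative without hist(): cut the Dates by day and tabulate
per_day <- table(cut(d., "days"))
print(per_day)
```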

(make check-devel) tested patches to hist.Date()
[in src/library/graphics/R/datetime.R] are welcome.

Martin Maechler, ETH Zurich

    SN> -- Sergio Nunes



