[Rd] cut.Date and cut.POSIXt problem
Roger D. Peng
rpeng at jhsph.edu
Wed Mar 26 14:18:19 CET 2008
I have applied these patches to R-devel, and in my limited testing they appear to
work as desired. I should say that I never ran into the problem these patches
were meant to solve, so I may not be the best person to do the testing.
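
A check of this kind might look like the following sketch. The fixed start date
and the names d, q and counts are illustrative assumptions; they are not taken
from the patches or from the testing described above.

```r
# Illustrative data: twelve month starts from a fixed date (not from the patches)
d <- seq(as.Date("2008-03-01"), length.out = 12, by = "month")

# With the multiplier honored, "3 months" should yield quarterly bins
# rather than one bin per month
q <- cut(d, "3 months")
counts <- as.vector(table(q))

print(levels(q))
print(counts)
```

If the integer in the breaks string is respected, each level should cover three
of the twelve dates.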
-roger
Marc Schwartz wrote:
> Hi all,
>
> Apologies for the delay in my engaging in this thread. I was traveling
> this week.
>
> The problem that Gabor raised was caused by the patch that I submitted
> to fix a problem with the referenced functions when using 'months' and
> 'years' as the interval. The prior versions were problematic:
>
> https://stat.ethz.ch/pipermail/r-devel/2008-January/048004.html
>
> The patch fixed the error, but since I used hist.Date() as the reference
> model and did not note the subtle difference in cut.Date() relative to
> specifying the breaks increment value, this functionality was lost when
> the same modification was made to the code in cut.Date().
>
> Roger's patch helps, but does not totally remedy the situation. One also
> needs to modify the method used for specifying the max value 'end' for
> the breaks in order to include the max 'x' Date value in the result.
>
> Hence, I am attaching proposed patches against R-devel for
> base:::dates.R and base:::datetime.R.
>
> I am also attaching a patch for tests:::reg-tests-1.R to add a check for
> this situation to the regression tests that were also added subsequent
> to that prior set of patches that I had submitted.
>
> If perhaps Roger and Gabor could do some testing on these patches before
> they are considered for inclusion into the R-devel tree, it would be
> helpful to check to see if I have missed something else here.
>
> Thanks for raising this issue.
>
> Regards,
>
> Marc Schwartz
>
> Roger D. Peng wrote:
>> Seems changes in r44116 force the interval to be single months (or
>> years) instead of whatever the user specified. I think the attached
>> patches correct this.
>>
>> Interestingly, 'cut' and 'seq' allow for the 'breaks' specification to
>> be something like "3 months" but the documentation for 'hist' does not
>> allow for this type of specification.
>>
>> -roger
>>
>> Gabor Grothendieck wrote:
>>> cut.Date and cut.POSIXt indicate that the breaks argument
>>> can be an integer followed by a space followed by "year", etc.
>>> but it seems the integer is ignored.
>>>
>>> For example, I assume that breaks = "3 months" is supposed
>>> to cut it into quarters but, in fact, it cuts it into months as if
>>> 3 had not been there.
>>>
>>>> d <- seq(Sys.Date(), length = 12, by = "month")
>>>> cut(d, "3 months")
>>> [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
>>> 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
>>> Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
>>> 2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
>>> 2009-02-01
>>>> cut(as.POSIXct(d), "3 months")
>>> [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
>>> 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
>>> Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
>>> 2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
>>> 2009-02-01
>>>> cut(as.POSIXlt(d), "3 months")
>>> [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
>>> 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
>>> Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
>>> 2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
>>> 2009-02-01
>>>
>
> ------------------------------------------------------------------------
>
> --- datesORIG.R 2008-03-20 14:25:13.000000000 -0500
> +++ dates.R 2008-03-20 14:38:21.000000000 -0500
> @@ -322,17 +322,19 @@
> if(valid == 3) {
> start$mday <- 1
> end <- as.POSIXlt(max(x, na.rm = TRUE))
> - end <- as.POSIXlt(end + (31 * 86400))
> + step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> + end <- as.POSIXlt(end + (31 * step * 86400))
> end$mday <- 1
> - breaks <- as.Date(seq(start, end, "months"))
> + breaks <- as.Date(seq(start, end, breaks))
> } else if(valid == 4) {
> start$mon <- 0
> start$mday <- 1
> end <- as.POSIXlt(max(x, na.rm = TRUE))
> - end <- as.POSIXlt(end + (366 * 86400))
> + step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> + end <- as.POSIXlt(end + (366 * step * 86400))
> end$mon <- 0
> end$mday <- 1
> - breaks <- as.Date(seq(start, end, "years"))
> + breaks <- as.Date(seq(start, end, breaks))
> } else {
> start <- .Internal(POSIXlt2Date(start))
> if (length(by2) == 2) incr <- incr * as.integer(by2[1])
>
>
> ------------------------------------------------------------------------
>
> --- datetimeORIG.R 2008-03-20 14:25:20.000000000 -0500
> +++ datetime.R 2008-03-20 15:25:49.000000000 -0500
> @@ -727,17 +727,19 @@
> if(valid == 6) {
> start$mday <- 1
> end <- as.POSIXlt(max(x, na.rm = TRUE))
> - end <- as.POSIXlt(end + (31 * 86400))
> + step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> + end <- as.POSIXlt(end + (31 * step * 86400))
> end$mday <- 1
> - breaks <- seq(start, end, "months")
> + breaks <- seq(start, end, breaks)
> } else if(valid == 7) {
> start$mon <- 0
> start$mday <- 1
> end <- as.POSIXlt(max(x, na.rm = TRUE))
> - end <- as.POSIXlt(end + (366 * 86400))
> + step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> + end <- as.POSIXlt(end + (366 * step * 86400))
> end$mon <- 0
> end$mday <- 1
> - breaks <- seq(start, end, "years")
> + breaks <- seq(start, end, breaks)
> } else {
> if (length(by2) == 2) incr <- incr * as.integer(by2[1])
> maxx <- max(x, na.rm = TRUE)
>
>
> ------------------------------------------------------------------------
>
> --- reg-tests-1ORIG.R 2008-03-20 09:18:19.000000000 -0500
> +++ reg-tests-1.R 2008-03-20 15:15:56.000000000 -0500
> @@ -5025,7 +5025,7 @@
> ## was about 0.0005 in 2.6.1 patched
>
>
> -## tests of problems fixed by Marc Schwarz's patch for
> +## tests of problems fixed by Marc Schwartz's patch for
> ## cut/hist for Dates and POSIXt
> Dates <- seq(as.Date("2005/01/01"), as.Date("2009/01/01"), "day")
> months <- format(Dates, format = "%m")
> @@ -5036,20 +5036,32 @@
> stopifnot(identical(hist(Dates, "month", plot = FALSE)$counts, mn))
> # Test cut.Date() for months
> stopifnot(identical(as.vector(table(cut(Dates, "month"))), mn))
> +# Test cut.Date() for 3 months
> +stopifnot(identical(as.vector(table(cut(Dates, "3 months"))),
> + as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
> # Test hist.Date() for years
> stopifnot(identical(hist(Dates, "year", plot = FALSE)$counts, ty))
> # Test cut.Date() for years
> stopifnot(identical(as.vector(table(cut(Dates, "years"))),ty))
> +# Test cut.Date() for 3 years
> +stopifnot(identical(as.vector(table(cut(Dates, "3 years"))),
> + as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
>
> Dtimes <- as.POSIXlt(Dates)
> # Test hist.POSIXt() for months
> stopifnot(identical(hist(Dtimes, "month", plot = FALSE)$counts, mn))
> # Test cut.POSIXt() for months
> stopifnot(identical(as.vector(table(cut(Dtimes, "month"))), mn))
> +# Test cut.POSIXt() for 3 months
> +stopifnot(identical(as.vector(table(cut(Dtimes, "3 months"))),
> + as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
> # Test hist.POSIXt() for years
> stopifnot(identical(hist(Dtimes, "year", plot = FALSE)$counts, ty))
> # Test cut.POSIXt() for years
> stopifnot(identical(as.vector(table(cut(Dtimes, "years"))), ty))
> +# Test cut.POSIXt() for 3 years
> +stopifnot(identical(as.vector(table(cut(Dtimes, "3 years"))),
> + as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
> ## changed in 2.6.2
>
>
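
The adjustment to 'end' in the quoted dates.R patch can be traced by hand. The
sketch below mirrors those lines for a 3-month step. The vector x and its dates
are illustrative, and step is hard-coded here rather than parsed from the
breaks string as the patch does.

```r
# Illustrative input (an assumption, not from the patch)
x <- seq(as.Date("2008-03-01"), length.out = 12, by = "month")
step <- 3L  # hard-coded; the patch derives this from by2[1]

# Snap the start to the first day of its month
start <- as.POSIXlt(min(x, na.rm = TRUE))
start$mday <- 1L

# Pad the end by 31 days per month in the step, then snap to day 1,
# so that the final break falls after the largest value in x
end <- as.POSIXlt(max(x, na.rm = TRUE))
end <- as.POSIXlt(end + (31 * step * 86400))
end$mday <- 1L

breaks <- as.Date(seq(start, end, "3 months"))
print(breaks)
```

The point of the padding is that the last break lands beyond max(x), so the
largest date is still assigned to a bin.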
--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/