[Rd] range() for Date and POSIXct could respect `finite = TRUE`
Martin Maechler
maechler using stat.math.ethz.ch
Mon May 22 11:20:36 CEST 2023
>>>>> Gabriel Becker
>>>>> on Fri, 19 May 2023 09:23:50 -0700 writes:
> Hi All,
> I think there may be some possible confusion about what allowsInf would be
> reporting (or maybe it's just me :) ) if we did this.
> Consider a class "myclass", S3, for starters,
> with
> setMethod("allowsInf", "myclass", function(obj) FALSE)
> Then, what would
> myclassthing <- structure(1.5, class = "myclass")
> myclassthing[1] <- Inf
> do?
Hmm.. You can always define classes and methods which jointly are
complete nonsense; e.g., people have also defined classes with a
length() method that returned a result of length 2, or classes
with a length but non-conforming `[` methods such that e.g.
x[length(x)] would fail.
The idea of allowsInf() / allows.infinite() would be that you
typically would *NOT* define a method for your class at all,
and would only define {if you want S4}
setMethod("allowsInf", <myclass>, function(obj) TRUE)
for those cases where is.finite() and is.infinite() do work
sensibly for <myclass>, i.e., are vectorizing (and typically can
be both TRUE and FALSE)
*and* where the default, i.e. allowsInf.default(),
is not already giving the correct result for <myclass> objects.
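
A minimal sketch of that opt-in pattern, written with S3 for brevity. The generic allowsInf(), its default, and the two methods below are hypothetical (nothing like this exists in base R), and the default shown is only one possible choice, following the suggestion elsewhere in this thread to include complex and exclude integer:

```r
# Hypothetical generic: returns a single TRUE/FALSE for the class of 'x'.
allowsInf <- function(x) UseMethod("allowsInf")

# Assumed default: only double-based numeric and complex vectors can hold +/-Inf.
allowsInf.default <- function(x)
    (is.numeric(x) && is.double(x)) || is.complex(x)

# Opt-in methods for classes where is.numeric() is FALSE but is.finite() works.
allowsInf.Date    <- function(x) TRUE
allowsInf.POSIXct <- function(x) TRUE

allowsInf(c(1.5, Inf))            # double
allowsInf(1:3)                    # integer: cannot be +/-Inf
allowsInf(as.Date("2023-05-22"))  # dispatches to the Date method
```

With a default like this, most classes would need no method at all; only classes such as Date, for which is.numeric() is FALSE although is.finite() works, would opt in.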
[.......]
> Put another way, and as pointed out by Bill above, the result of allowsInf
> is really an attribute of a *class*, not of an object.
Well, yes, that's true, but that's also true e.g., for is.numeric()
.. one of the starting points of this RFC.
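
To illustrate the point (a small check, not part of any proposal): is.numeric() already answers with a single logical that depends on the type and class of its argument, not on the values it holds, whereas is.finite() is vectorized.

```r
d <- as.Date(c("2023-05-19", NA))

is.numeric(c(1, NA, Inf))  # a single TRUE for the whole vector
is.numeric(d)              # a single FALSE, although typeof(d) is "double"
is.finite(d)               # vectorized: one value per element
```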
> It's notable here that developers could also get around this by implementing
> methods for the summary group generic that either implement the finite
> argument or not as appropriate for their class, right? And that would be
> true whether the default for, e.g., min and max were altered to have the
> finite argument to match range, or not.
> Best,
> ~G
Sorry if I was confusing (much earlier):
I no longer propose --- at least not in this thread ---
that min() and max() should also get a 'finite = FALSE' optional argument.
This is about making range(x, finite=TRUE) work (the "same" as the
default method), e.g. when x inherits from "Date" or "POSIXct"
*and* doing so in a somewhat smart way so that R developers of
other similar classes could also easily make range(*,
finite=TRUE) "work" for their class objects.
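
For concreteness, a sketch of the result range(x, finite=TRUE) is meant to give for such objects. The helper name range_finite() is made up for this illustration; it relies only on is.finite(), `[`, min() and max() working for the class, which they do for Date:

```r
# Hypothetical helper, not a proposed API: drop non-finite entries, then
# take min and max via the class's own methods.
range_finite <- function(x) {
    x <- x[is.finite(x)]   # drops NA, NaN, Inf and -Inf
    c(min(x), max(x))
}

x <- .Date(c(0, Inf, 1, 2, NA))
r <- range_finite(x)
format(r)
```

The result keeps the "Date" class, so no unclass() / .Date() round trip is needed.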
Martin
> On Fri, May 19, 2023 at 8:30 AM Martin Maechler <maechler using stat.math.ethz.ch>
> wrote:
>> >>>>> Bill Dunlap
>> >>>>> on Thu, 11 May 2023 10:42:48 -0700 writes:
>>
>> >> What do others think?
>>
>> > I can imagine a class, "TemperatureKelvins", that wraps a
>> > double but would have a range of 0 to Inf or one called
>> > "GymnasticsScore" with a range of 0 to 10. For those
>> > sorts of things it would be nice to have a generic that
>> > gave the possible min and max for the class instead of one
>> > that just said they were -Inf and Inf or not.
>>
>> > -Bill
>>
>> yeah.. I agree that a general concept of such an interval class
>> is even more flexible and generally useful.
>> OTOH, people have already introduced such classes where they
>> were really needed, and here it's really about
>> *if*
>> is.finite() and is.infinite() are also available and working
>> but not always FALSE (which they are for logical, integer,
>> character *and* raw, the latter really debatable - but *not* in this
>> thread).
>>
>> So, allows.infinite(x) would *not* vectorize but return TRUE or
>> FALSE (and typically not NA ..), in some sense being a property
>> of class(x) only.
>>
>>
>> > On Thu, May 11, 2023 at 1:49 AM Martin Maechler
>> > <maechler using stat.math.ethz.ch> wrote:
>>
>> >> >>>>> Davis Vaughan
>> >> >>>>> on Tue, 9 May 2023 09:49:41 -0400 writes:
>> >>
>> >> > It seems like the main problem is that `is.numeric(x)`
>> >> > isn't fully indicative of whether or not `is.finite(x)`
>> >> > makes sense for `x` (i.e. Date isn't numeric but does
>> >> > allow infinite dates).
>> >>
>> >> > So I could also imagine a new `allows.infinite()` S3
>> >> > generic that would return a single TRUE/FALSE for whether
>> >> > or not the type allows infinite values, this would also be
>> >> > indicative of whether or not `is.finite()` and
>> >> > `is.infinite()` make sense on that type. I imagine it
>> >> > being used like:
>>
>> >> > ```
>> >> > allows.infinite <- function(x) {
>> >> >   UseMethod("allows.infinite")
>> >> > }
>> >> > allows.infinite.default <- function(x) {
>> >> >   is.numeric(x) # For backwards compatibility, maybe? Not sure.
>> >> > }
>>
>> it would have to include is.complex() as well *and*
>> in principle I'd want to *exclude* integers as they really
>> cannot be +/- Inf
>> ... but then you did say "not sure" ..
>>
>> I'm still somewhat favoring this proposal,
>> because it would be a bit more generally applicable
>> but still very simple.
>>
>> Personally, I'd go for the shorter allowsInf() name,
>> not adding another <word1>.<word2>() generic function,
>> but that's less important and should not determine decisions I think.
>>
>> Martin
>>
>> >> > allows.infinite.Date <- function(x) {
>> >> >   TRUE
>> >> > }
>> >> > allows.infinite.POSIXct <- function(x) {
>> >> >   TRUE
>> >> > }
>> >> >
>> >> > range.default <- function (..., na.rm = FALSE, finite = FALSE) {
>> >> >   x <- c(..., recursive = TRUE)
>> >> >   if (allows.infinite(x)) { # changed from `is.numeric()`
>> >> >     if (finite)
>> >> >       x <- x[is.finite(x)]
>> >> >     else if (na.rm)
>> >> >       x <- x[!is.na(x)]
>> >> >     c(min(x), max(x))
>> >> >   }
>> >> >   else {
>> >> >     if (finite)
>> >> >       na.rm <- TRUE
>> >> >     c(min(x, na.rm = na.rm), max(x, na.rm = na.rm))
>> >> >   }
>> >> > }
>> >> > ```
>> >>
>> >> > It could allow other R developers to also use the pattern of:
>> >>
>> >> > ```
>> >> > if (allows.infinite(x)) {
>> >> >   # conditionally do stuff with is.infinite(x)
>> >> > }
>> >> > ```
>> >>
>> >> > and that seems like it could be rather nice.
>> >>
>> >> > It would avoid the need for `range.Date()` and `range.POSIXct()`
>> >> > methods too.
>> >>
>> >> > -Davis
>> >>
>> >> That *is* an interesting alternative perspective ...
>> >> sent just about before I was going to commit my proposal (incl
>> >> new help page entries, regr.tests ..).
>> >>
>> >> So we would introduce a new generic allows.infinite() {or
>> >> better name?, allowsInf, ..} with the defined semantic that
>> >>
>> >> allows.infinite(x) for a vector 'x' gives a logical "scalar",
>> >> TRUE iff it is known that is.finite(x) "makes sense" and
>> >> returns a logical vector of length length(x) .. which is TRUE
>> >> where x[i] is not NA/NaN/+Inf/-Inf .. *and*
>> >> is.infinite := Negate(is.finite) {or vice versa if you prefer}.
>> >>
>> >> I agree that this may be useful somewhat more generally than
>> >> just for range() methods.
>> >>
>> >> What do others think?
>> >>
>> >> Martin
>> >>
>> >>
>> >> > On Thu, May 4, 2023 at 5:29 AM Martin Maechler
>> >> > <maechler using stat.math.ethz.ch> wrote:
>> >> [......]
>> >>
>> >> >> >>>>> Davis Vaughan
>> >> >> >>>>> on Mon, 1 May 2023 08:46:33 -0400 writes:
>> >> >>
>> >> >> > Martin,
>> >> >> > Yes, I missed that those have `Summary.*` methods, thanks!
>> >> >>
>> >> >> > Tweaking those to respect `finite = TRUE` sounds great. It seems like
>> >> >> > it might be a little tricky since the Summary methods call
>> >> >> > `NextMethod()`, and `range.default()` uses `is.numeric()` to determine
>> >> >> > whether or not to apply `finite`. Because `is.numeric.Date()` is
>> >> >> > defined, that always returns `FALSE` for Dates (and POSIXt). Because
>> >> >> > of that, it may still be easier to just write a specific
>> >> >> > `range.Date()` method, but I'm not sure.
>> >> >>
>> >> >> > -Davis
>> >> >>
>> >> >> I've looked more closely now, and indeed,
>> >> >> range() is the only function in the Summary group
>> >> >> where (only) the default method has a 'finite' argument,
>> >> >> which strikes me as somewhat asymmetric / inconsistent, as
>> >> >> after all, range(.) := c(min(.), max(.)) ,
>> >> >> but min() and max() do not obey a finite=TRUE setting, note
>> >> >>
>> >> >> > min(c(-Inf,3:5), finite=TRUE)
>> >> >> Error: attempt to use zero-length variable name
>> >> >>
>> >> >> where the error message also is not particularly friendly
>> >> >> and of course has nothing to do with 'finite' :
>> >> >>
>> >> >> > max(1:4, foo="bar")
>> >> >> Error: attempt to use zero-length variable name
>> >> >> >
>> >> >>
>> >> >> ... but that is diverting; coming back to the topic: Given
>> >> >> that 'finite' only applies to range() {and there it is just a convenience},
>> >> >> I do agree that from my own work & support to make `Date` and
>> >> >> `POSIX(c)t` behave more number-like, it would be "nice" to have
>> >> >> range() obey a `finite=TRUE` also for these.
>> >> >>
>> >> >> OTOH, there are quite a few other 'number-like' thingies for
>> >> >> which I would then like to have range(*, finite=TRUE) work,
>> >> >> e.g., "mpfr" (package {Rmpfr}) or "bigz" {gmp} numbers, numeric
>> >> >> sparse matrices, ...
>> >> >>
>> >> >> To keep such methods all internally consistent with
>> >> >> range.default(), I could envision something like this
>> >> >>
>> >> >>
>> >> >> .rangeNum <- function(..., na.rm = FALSE, finite = FALSE, isNumeric)
>> >> >> {
>> >> >>     x <- c(..., recursive = TRUE)
>> >> >>     if(isNumeric(x)) {
>> >> >>         if(finite) x <- x[is.finite(x)]
>> >> >>         else if(na.rm) x <- x[!is.na(x)]
>> >> >>         c(min(x), max(x))
>> >> >>     } else {
>> >> >>         if(finite) na.rm <- TRUE
>> >> >>         c(min(x, na.rm=na.rm), max(x, na.rm=na.rm))
>> >> >>     }
>> >> >> }
>> >> >>
>> >> >> range.default <- function(..., na.rm = FALSE, finite = FALSE)
>> >> >>     .rangeNum(..., na.rm=na.rm, finite=finite, isNumeric = is.numeric)
>> >> >>
>> >> >> range.POSIXct <- range.Date <- function(..., na.rm = FALSE, finite = FALSE)
>> >> >>     .rangeNum(..., na.rm=na.rm, finite=finite, isNumeric = function(.)TRUE)
>> >> >>
>> >> >>
>> >> >>
>> >> >> which would also provide .rangeNum() to be used by implementors
>> >> >> of other numeric-like classes to provide their own range()
>> >> >> method as a 1-liner *and* be future-consistent with the default method.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> > On Sat, Apr 29, 2023 at 4:47 PM Martin Maechler
>> >> >> > <maechler using stat.math.ethz.ch> wrote:
>> >> >> >>
>> >> >> >> >>>>> Davis Vaughan via R-devel
>> >> >> >> >>>>> on Fri, 28 Apr 2023 11:12:27 -0400 writes:
>> >> >> >>
>> >> >> >> > Hi all,
>> >> >> >>
>> >> >> >> > I noticed that `range.default()` has a nice `finite =
>> >> >> >> > TRUE` argument, but it doesn't actually apply to Date or
>> >> >> >> > POSIXct due to how `is.numeric()` works.
>> >> >> >>
>> >> >> >> Well, I think it would / should never apply:
>> >> >> >>
>> >> >> >> range() belongs to the "Summary" group generics (as min, max, ...)
>> >> >> >>
>> >> >> >> and there *are* Summary.Date() and Summary.POSIX{c,l}t() methods.
>> >> >> >>
>> >> >> >> Without checking further for now, I think you are indirectly
>> >> >> >> suggesting to enhance these three Summary.*() methods so they do
>> >> >> >> obey 'finite = TRUE' .
>> >> >> >>
>> >> >> >> I think I agree they should.
>> >> >> >>
>> >> >> >> Martin
>> >> >> >>
>> >> >> >> > ```
>> >> >> >> > x <- .Date(c(0, Inf, 1, 2, Inf))
>> >> >> >> > x
>> >> >> >> > #> [1] "1970-01-01" "Inf"        "1970-01-02" "1970-01-03" "Inf"
>> >> >> >> >
>> >> >> >> > # Darn!
>> >> >> >> > range(x, finite = TRUE)
>> >> >> >> > #> [1] "1970-01-01" "Inf"
>> >> >> >> >
>> >> >> >> > # What I want
>> >> >> >> > .Date(range(unclass(x), finite = TRUE))
>> >> >> >> > #> [1] "1970-01-01" "1970-01-03"
>> >> >> >> > ```
>> >> >> >>
>> >> >> >> > I think `finite = TRUE` would be pretty nice for Dates in
>> >> >> >> > particular.
>> >> >> >>
>> >> >> >> > As a motivating example, sometimes you have ranges of
>> >> >> >> > dates represented by start/end pairs. It is fairly natural
>> >> >> >> > to represent an event that hasn't ended yet with an
>> >> >> >> > infinite date. If you need to then compute a sequence of
>> >> >> >> > dates spanning the full range of the start/end pairs, it
>> >> >> >> > would be nice to be able to use `range(finite = TRUE)` to
>> >> >> >> > do so:
>> >> >> >>
>> >> >> >> > ```
>> >> >> >> > start <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-11", "2019-01-14"))
>> >> >> >> > end <- as.Date(c("2019-01-07", NA, "2019-01-14", NA))
>> >> >> >> > end[is.na(end)] <- Inf
>> >> >> >> >
>> >> >> >> > # `end = Inf` means that the event hasn't "ended" yet
>> >> >> >> > data.frame(start, end)
>> >> >> >> > #>        start        end
>> >> >> >> > #> 1 2019-01-05 2019-01-07
>> >> >> >> > #> 2 2019-01-10        Inf
>> >> >> >> > #> 3 2019-01-11 2019-01-14
>> >> >> >> > #> 4 2019-01-14        Inf
>> >> >> >> >
>> >> >> >> > # Create a full sequence along all days in start/end
>> >> >> >> > range <- .Date(range(unclass(c(start, end)), finite = TRUE))
>> >> >> >> > seq(range[1], range[2], by = 1)
>> >> >> >> > #>  [1] "2019-01-05" "2019-01-06" "2019-01-07" "2019-01-08" "2019-01-09"
>> >> >> >> > #>  [6] "2019-01-10" "2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
>> >> >> >> > ```
>> >> >> >>
>> >> >> >> > It seems like one option is to create a `range.Date()`
>> >> >> >> > method that unclasses, forwards the arguments on to a
>> >> >> >> > second call to `range()`, and then reclasses?
>> >> >> >>
>> >> >> >> > ```
>> >> >> >> > range.Date <- function(x, ..., na.rm = FALSE, finite = FALSE) {
>> >> >> >> >   .Date(range(unclass(x), na.rm = na.rm, finite = finite), oldClass(x))
>> >> >> >> > }
>> >> >> >> > ```
>> >> >> >>
>> >> >> >> > This is similar to how `rep.Date()` works.
>> >> >> >>
>> >> >> >> > Thanks, Davis Vaughan
>> >> >> >>
>> >> >> >> > ______________________________________________
>> >> >> >> > R-devel using r-project.org mailing list
>> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-devel