[Rd] Surprising length() of POSIXlt vector (PR#14073)
maechler at stat.math.ethz.ch
Mon Nov 30 14:10:45 CET 2009
>>>>> Tony Plate <tplate at acm.org>
>>>>> on Sun, 22 Nov 2009 10:21:33 -0600 writes:
> maechler at stat.math.ethz.ch wrote:
>>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>> on Fri, 20 Nov 2009 09:54:34 +0100 writes:
>>>>>>>
>>
PD> mark at celos.net wrote:
>> >> Arrays of POSIXlt dates always return a length of 9. This
>> >> is correct (they're really lists of vectors of seconds,
>> >> hours, and so forth), but other methods disguise them as
>> >> flat vectors, giving superficially surprising behaviour:
>> >>
>> >> strings <- paste('2009-1-', 1:31, sep='')
>> >> dates <- strptime(strings, format="%Y-%m-%d")
>> >>
>> >> print(dates)
>> >> # [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05"
>> >> # [6] "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10"
>> >> # [11] "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" "2009-01-15"
>> >> # [16] "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20"
>> >> # [21] "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25"
>> >> # [26] "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30"
>> >> # [31] "2009-01-31"
>> >>
>> >> print(length(dates))
>> >> # [1] 9
>> >>
>> >> str(dates)
>> >> # POSIXlt[1:9], format: "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" ...
>> >>
>> >> print(dates[20])
>> >> # [1] "2009-01-20"
>> >>
>> >> print(length(dates[20]))
>> >> # [1] 9
>> >>
>> >> I've since realised that POSIXct makes date vectors easier,
>> >> but could we also have something like:
>> >>
>> >> length.POSIXlt <- function(x) { length(x$sec) }
>> >>
>> >> in datetime.R, to avoid breaking functions (like the
>> >> str.POSIXt method) which use length() in this way?
>>
>>
PD> [You need "wishlist" in the title for this sort of stuff.]
>>
PD> I'd be wary of this. Just the other day we found that identical() broke
PD> on some objects because a package had length() redefined as a class
PD> method. I.e. the danger is that something wants to use length() with its
PD> original low-level interpretation.
>>
>> Yes, of course.
>> and Romain mentioned str(). Note that we have needed to define
>> a "POSIXt" method for str(), partly just *because* of the
>> current anomaly:
>> As Tony Plate, e.g., has argued, entirely correctly in my view,
>> the anomaly is that length() and "[" are not compatible;
>> and while I think no R language definition says that they should
>> be, I still believe that you need very good reasons for them to
>> be incompatible, as they are for POSIXlt.
>>
>> In the current case, for me the only good reason is backwards
>> compatibility.
>> My personal taste would be to change it and see what happens.
>> I would be willing to clean up after that change within R 'base'
>> and all packages I am coauthoring (quite a few), but of course
>> there are still a thousand more R packages..
>> My strong bet would be that less than 1% would be affected,
>> and my point guess for the percentage affected would be
>> rather in the order of 1/1000.
>>
>> The question is if we (you too!), the R community, are willing to
>> bear the load of cleanup, after such a change which would really
>> *improve* consistency of that small corner of R.
>> For me, as I indicated above, I am willing to bear my share
>> (and actually have got it ready for R-devel)
> Would be great to see this change! Surely the right way to do things is
> that functions that wish to examine the low level structure of S3
> objects should use unclass() before looking at length and elements, so
> there's no reason for a class such as POSIXlt to not provide a
> logical-level length method.
I have now committed such a change to R-devel (only!), revision 50616.
Thank you and Gabor and others for supporting this.
As said here earlier in this thread: We must be ready to see
that this change can break other code that implicitly assumed
the "old" i.e. pre R-devel (2.11.x) behavior.
As I also said earlier, I'm prepared to help package authors to
fix their code accordingly,
but I'd be grateful to be notified *if* problems surface from
this.
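
To make the distinction concrete, here is a minimal sketch (an
illustration only, not the committed code). It contrasts the low-level
length obtained via unclass() with the "logical" length that matches
"[". The helper name logical_length is hypothetical, and tz = "UTC" is
an assumption added only to keep the example independent of the local
time zone.

```r
# Illustration only: `logical_length` is a hypothetical helper name,
# not part of base R or of the committed change.
strings <- paste("2009-1-", 1:31, sep = "")
dates   <- strptime(strings, format = "%Y-%m-%d", tz = "UTC")

# Low-level view: strip the class, then count the list components
# (sec, min, hour, ...). This is what length() reported before the change.
n_components <- length(unclass(dates))

# Logical view: the number of date-times, taken from one component,
# in the spirit of the length.POSIXlt method proposed in this thread.
logical_length <- function(x) length(unclass(x)$sec)
n_dates <- logical_length(dates)

# Consistency with "[": selecting one element gives logical length 1.
n_one <- logical_length(dates[20])
```

Code that really needs the number of list components can keep calling
length(unclass(x)) and is then unaffected by whatever the class-level
length() method returns.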
Martin Maechler, ETH Zurich
> At a broader level, when I've designed vector/array classes, I've
> wondered what methods I should define, but have been unable to find any
> specification of a set of methods. When one thinks about it, there are
> actually quite a set of strongly connected methods with quite a lot of
> behaviors to implement, e.g., length, '[' (with logical, numeric &
> character indices, including 0 and NA possibilities), '[[', 'c', and
> then optionally 'names', and then for multi-dim objects, 'dim',
> 'dimnames', etc. Consequently, last time this discussion on length and
> '[' methods for POSIXlt came up, I wrote a function that automatically
> tested behavior of all these methods on a specified class and summarized
> the behavior. If anyone is interested in such a thing, I'd be happy to
> dig it up and distribute it (I'd attach it to this message, but I'm on
> vacation and don't have access to the computer that I think it's on.)
> -- Tony Plate
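
A rough sketch of the kind of check described above, under stated
assumptions: this is not Tony Plate's function, the name
check_length_subset is hypothetical, and it covers only a few index
types (empty, full, negative, and logical), not names, 'c', '[[' or
'dim'.

```r
# Hypothetical helper (not Tony Plate's function): checks that length()
# agrees with "[" for a few common index types on an object x.
check_length_subset <- function(x) {
  n <- length(x)
  keep <- rep(c(TRUE, FALSE), length.out = n)  # alternating logical index
  c(
    empty    = length(x[integer(0)]) == 0L,          # empty index
    all      = length(x[seq_len(n)]) == n,           # every element
    negative = n == 0L || length(x[-1L]) == n - 1L,  # drop the first element
    logical  = length(x[keep]) == sum(keep)          # logical index
  )
}

# Classes whose length() and "[" are compatible pass every check.
ints <- check_length_subset(1:10)
days <- check_length_subset(as.Date("2009-01-01") + 0:30)
```

Applied to a POSIXlt vector, a check like this should flag the old
behavior, since length() then counted list components while "[" selected
date-times.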
>> Martin Maechler, ETH Zurich (and R Core Team)
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>