[Rd] Surprising length() of POSIXlt vector (PR#14073)

maechler at stat.math.ethz.ch
Mon Nov 30 14:10:45 CET 2009


>>>>> Tony Plate <tplate at acm.org>
>>>>>     on Sun, 22 Nov 2009 10:21:33 -0600 writes:

    > maechler at stat.math.ethz.ch wrote:
    >>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
    >>>>>>> on Fri, 20 Nov 2009 09:54:34 +0100 writes:
    >>>>>>> 
    >> 
    PD> mark at celos.net wrote:
    >> >> Arrays of POSIXlt dates always return a length of 9.  This
    >> >> is correct (they're really lists of vectors of seconds,
    >> >> hours, and so forth), but other methods disguise them as
    >> >> flat vectors, giving superficially surprising behaviour:
    >> >> 
    >> >> strings <- paste('2009-1-', 1:31, sep='')
    >> >> dates <- strptime(strings, format="%Y-%m-%d")
    >> >> 
    >> >> print(dates)
    >> >> #  [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05"
    >> >> #  [6] "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10"
    >> >> # [11] "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" "2009-01-15"
    >> >> # [16] "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20"
    >> >> # [21] "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25"
    >> >> # [26] "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30"
    >> >> # [31] "2009-01-31"
    >> >> 
    >> >> print(length(dates))
    >> >> # [1] 9
    >> >> 
    >> >> str(dates)
    >> >> # POSIXlt[1:9], format: "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" ...
    >> >> 
    >> >> print(dates[20])
    >> >> # [1] "2009-01-20"
    >> >> 
    >> >> print(length(dates[20]))
    >> >> # [1] 9
    >> >> 
    >> >> I've since realised that POSIXct makes date vectors easier,
    >> >> but could we also have something like:
    >> >> 
    >> >> length.POSIXlt <- function(x) { length(x$sec) }
    >> >> 
    >> >> in datetime.R, to avoid breaking functions (like the
    >> >> str.POSIXt method) which use length() in this way?
    >> 
    >> 
    PD> [You need "wishlist" in the title for this sort of stuff.]
    >> 
    PD> I'd be wary of this. Just the other day we found that identical() broke 
    PD> on some objects because a package had length() redefined as a class 
    PD> method. I.e. the danger is that something wants to use length() with its 
    PD> original low-level interpretation.
    >> 
    >> Yes, of course.
    >> and Romain mentioned  str().  Note that we have needed to define
    >> a "POSIXt" method for str(), partly just *because* of the
    >> current anomaly:
    >> As Tony Plate, e.g., has argued, entirely correctly in my view,
    >> the anomaly is that    length() and "["   are not compatible;
    >> and while I think no R language definition says that they should
    >> be, I still believe that you need very good reasons for them to
    >> be incompatible, as they are for POSIXlt.
    >> 
    >> In the current case, for me the only good reason is backwards
    >> compatibility.
    >> My personal taste would be to change it and see what happens.
    >> I would be willing to clean up after that change within R 'base'
    >> and all packages I am coauthoring (quite a few), but of course
    >> there are still a thousand more R packages..
    >> My strong bet would be that less than 1% would be affected,
    >> and my point guess for the percentage affected would be
    >> rather in the order of  1/1000.
    >> 
    >> The question is if we (you too!), the R community, are willing to
    >> bear the load of cleanup, after such a change which would really
    >> *improve* consistency of that small corner of R.
    >> For me, as I indicated above, I am willing to bear my share
    >> (and actually have got it ready for R-devel)

    > Would be great to see this change!  Surely the right way to do things is 
    > that functions that wish to examine the low level structure of S3 
    > objects should use unclass() before looking at length and elements, so 
    > there's no reason for a class such as POSIXlt to not provide a 
    > logical-level length method.
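
A minimal sketch of the idea in the paragraph above, on a made-up class rather than on POSIXlt itself. The names `toy_lt` and `new_toy_lt` are hypothetical and not part of R; the point is only that a class can report a logical-level length while low-level code reaches the raw structure through unclass().

```r
# Hypothetical toy class: a list of parallel fields, like POSIXlt in miniature.
new_toy_lt <- function(hour, min) {
  structure(list(hour = hour, min = min), class = "toy_lt")
}

# Logical-level length: the number of time points, not the number of fields.
length.toy_lt <- function(x) length(unclass(x)$hour)

x <- new_toy_lt(hour = c(9L, 12L, 17L), min = c(0L, 30L, 45L))

logical_len <- length(x)           # dispatches to length.toy_lt, gives 3
field_count <- length(unclass(x))  # low-level view of the list, gives 2
```

Code that needs the number of list components asks for it explicitly via unclass(), so the class method does not get in its way.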

I have now committed such a change to R-devel (only!), revision 50616.
Thank you and Gabor and others for supporting this.

As said here earlier in this thread:  We must be ready to see
that this change can break other code that implicitly assumed
the "old" behavior, i.e., that of R versions before R-devel (2.11.x).

As I also said earlier, I'm prepared to help package authors to
fix their code accordingly,
but I'd be grateful to be notified *if* problems surface from
this.
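
For package code that needs the number of date-times regardless of which length() behavior is in effect, something along these lines might serve as a starting point. This is a sketch only: `n_time_points` is a hypothetical helper name, and it assumes an object as returned by strptime(), where the `sec` component has one entry per date-time.

```r
# Same kind of data as in the original report, with an explicit time zone.
dates <- strptime(paste0("2009-1-", 1:31), format = "%Y-%m-%d", tz = "UTC")

# Hypothetical helper: count date-times without relying on length(<POSIXlt>).
n_time_points <- function(x) length(unclass(x)$sec)

n_points <- n_time_points(dates)       # 31
n_via_ct <- length(as.POSIXct(dates))  # 31, via conversion to POSIXct
```

Both variants avoid calling length() on the POSIXlt object directly, so they should not depend on whether the new method is present.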

Martin Maechler, ETH Zurich


    > At a broader level, when I've designed vector/array classes, I've 
    > wondered what methods I should define, but have been unable to find any 
    > specification of a set of methods.  When one thinks about it, there are 
    > actually quite a set of strongly-connected methods with quite a lot of 
    > behaviors to implement, e.g., length, '[' (with logical, numeric & 
    > character indices, including 0 and NA possibilities), '[[', 'c', and 
    > then optionally 'names', and then for multi-dim objects, 'dim', 
    > 'dimnames', etc.  Consequently, last time this discussion on length and 
    > '[' methods for POSIXlt came up, I wrote a function that automatically 
    > tested behavior of all these methods on a specified class and summarized 
    > the behavior.  If anyone is interested in such a thing, I'd be happy to 
    > dig it up and distribute it (I'd attach it to this message, but I'm on 
    > vacation and don't have access to the computer that I think it's on.)
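
The function mentioned above is not included in this thread. As a rough illustration of what such a check could look like, here is a small sketch; `check_vector_contract` is a hypothetical name, and the handful of invariants it tests are an assumption about what "compatible" length and '[' methods should satisfy, not a specification.

```r
# Hypothetical helper, not the function described above: checks a few
# invariants linking length(), '[' and c() for a vector-like object.
check_vector_contract <- function(x) {
  n <- length(x)
  c(
    full_index  = length(x[seq_len(n)]) == n,          # x[1:n] keeps all
    empty_index = length(x[integer(0)]) == 0L,         # x[integer(0)] is empty
    logical_all = length(x[rep(TRUE, n)]) == n,        # all-TRUE mask keeps all
    single      = n == 0L || length(x[1L]) == 1L,      # one element has length 1
    drop_first  = n == 0L || length(x[-1L]) == n - 1L, # negative index drops one
    concat      = length(c(x, x)) == 2L * n            # c() doubles the length
  )
}

res_int <- check_vector_contract(1:5)
res_ct  <- check_vector_contract(
  as.POSIXct("2009-01-01", tz = "UTC") + 86400 * 0:30
)
```

Each entry of the result is TRUE when the invariant holds. With the behavior reported at the top of this thread, where length(dates[20]) was 9, the `single` check would have failed for a POSIXlt object.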

    > -- Tony Plate

    >> Martin Maechler, ETH Zurich (and R Core Team)
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> 
    >> 



