[Rd] Wrong length of POSIXt vectors (PR#10507)

Tony Plate tplate at acm.org
Fri Dec 14 21:58:30 CET 2007


Duncan Murdoch wrote:
> On 12/13/2007 1:59 PM, Tony Plate wrote:
>> Duncan Murdoch wrote:
>>> On 12/11/2007 6:20 AM, simecek at gmail.com wrote:
>>>> Full_Name: Petr Simecek
>>>> Version: 2.5.1, 2.6.1
>>>> OS: Windows XP
>>>> Submission from: (NULL) (195.113.231.2)
>>>>
>>>>
>>>> Several times I have found that the length of a POSIXt vector 
>>>> is not
>>>> computed correctly.
>>>>
>>>> Example:
>>>>
>>>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
>>>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
>>>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), 
>>>> mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon 
>>>> = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
>>>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = 
>>>> c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
>>>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
>>>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min", 
>>>> "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>>>> ), class = c("POSIXt", "POSIXlt"))
>>>>
>>>> print(tv)
>>>> # print 11 time points (right)
>>>>
>>>> length(tv)
>>>> # returns 9 (wrong)
>>>
>>> tv is a list of length 9.  The answer is right, your expectation is 
>>> wrong.
>>>> I have tried that on several computers with/without switching to 
>>>> English
>>>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the 
>>>> help pages but I
>>>> cannot imagine how that could be OK.
>>>
>>> See this in ?POSIXt:
>>>
>>> Class '"POSIXlt"' is a named list of vectors...
>>>
>>> You could define your own length measurement as
>>>
>>> length.POSIXlt <- function(x) length(x$sec)
>>>
>>> and you'll get the answer you expect, but be aware that length.XXX 
>>> methods are quite rare, and you may surprise some of your users.
>>>
>>
>> On the other hand, isn't the fact that length() currently always 
>> returns 9 for POSIXlt objects likely to be a surprise to many users 
>> of POSIXlt?
>>
>> The back of "The New S Language" says "Easy-to-use facilities allow 
>> you to organize, store and retrieve all sorts of data. ... S 
>> functions and data organization make applications easy to write."
>>
>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many 
>> other vector-manipulation methods - see methods(class="POSIXlt")).  
>> Hence, from the point of view of intending to supply "easy-to-use 
>> facilities ... [for] all sorts of data", isn't it a little 
>> incongruous that length() is not also provided -- as these 3 functions 
>> (any others?) comprise a core set of vector-manipulation functions?
>>
>> Would it make sense to have an informal prescription (e.g., in 
>> R-exts) that a class that implements a vector-like object and 
>> provides at least one of the functions 'c', '[' and 'length' should 
>> provide all three?  It would also be easy to describe a test-suite 
>> that should be included in the 'tests' directory of a package 
>> implementing such a class, that had some tests of the basic 
>> vector-manipulation functionality, such as:
>>
>>  > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
>>  > # class being tested, of length 0, 1, 3, and 10, and they should
>>  > # contain no duplicate elements
>>  > length(x0)
>> [1] 0
>>  > length(c(x0, x1))
>> [1] 1
>>  > length(c(x1,x10))
>> [1] 11
>>  > all(x3 == x3[seq(len=length(x3))])
>> [1] TRUE
>>  > all(x3 == c(x3[1], x3[2], x3[3]))
>> [1] TRUE
>>  > length(c(x3[2], x10[5:7]))
>> [1] 4
>>  >
>>
>> It would also be possible to describe a larger set of vector 
>> manipulation functions that should be implemented together, including 
>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', 
>> head, tail ... (many of which are provided for POSIXlt).
>>
>> Or is there some good reason that length() cannot be provided (while 
>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"?
>
> What you say sounds good in general, but the devil is in the details. 
> Changing the meaning of length(x) for some objects has fairly 
> widespread effects.  Are they all positive?  I don't know.
>
> Adding a prescription like the one you suggest would be good if it's 
> easy to implement, but bad if it's already widely violated.  How many 
> base or CRAN or Bioconductor packages violate it currently?   Do the 
> ones that provide all 3 methods do so in a consistent way, i.e. does 
> "length(x)" mean the same thing in all of them?
I'm not sure doing something like this would be so bad even if it is 
already widely violated.  R has evolved significantly over time, and 
many rough edges have been cleaned up, sometimes in ways that were not 
backward compatible.  This is a great thing & my thanks go to the people 
working on R.

If some base or CRAN or Bioconductor packages currently don't implement 
vector operations consistently, wouldn't it be good to know that?  
Wouldn't it be useful to have an automatic way of determining whether a 
particular vector-like class is consistent with a generally agreed set of 
principles for how basic vector operations should work -- things like 
length(x)+length(y)==length(c(x,y))?  This could help developers check, 
document & improve their code, and it could help users understand how to 
use a class, and to evaluate the software quality of a class 
implementation and whether or not it provides the functionality they need.
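
As a rough illustration of such an automatic check, here is a minimal 
sketch.  The function name check_vector_consistency and the particular 
set of invariants are only assumptions for the sake of the example, not 
an agreed standard:

```r
# Hypothetical helper (not part of R): checks a few basic vector
# invariants for two objects x and y of the same vector-like class.
# Returns a named logical vector, one element per invariant.
check_vector_consistency <- function(x, y) {
  n <- length(x)
  c(
    # concatenation should add lengths
    concat_length   = length(c(x, y)) == length(x) + length(y),
    # subsetting with all indices should preserve length
    identity_subset = length(x[seq_len(n)]) == n,
    # subsetting with no indices should give length 0
    empty_subset    = length(x[integer(0)]) == 0L,
    # repeating twice should double the length
    rep_length      = length(rep(x, 2)) == 2L * n
  )
}

# Example inputs: two date-time vectors of length 2 and 1
x <- as.POSIXct(c("2005-06-13 12:01:50", "2005-06-13 12:10:00"), tz = "UTC")
y <- as.POSIXct("2005-06-13 12:11:55", tz = "UTC")

# POSIXct is stored as a plain numeric vector underneath
print(check_vector_consistency(x, y))

# The same check applied to the list-based POSIXlt representation
print(check_vector_consistency(as.POSIXlt(x), as.POSIXlt(y)))
```

For POSIXlt the outcome depends on whether a length method is defined 
for the class in the version of R being used; without one, length() 
reports the 9 list components and the first check fails.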
> I agree that the current state is less than perfect, but making it 
> better would really be a lot of work.  I suspect there are better ways 
> to spend my time, so I'm not going to volunteer to do it.  I'm not 
> even going to invite someone else to do it, or offer to review your 
> work if you volunteer.  I think this falls into the class of "next 
> time we write a language, let's handle this better" problems.
Thanks very much for the thoughtful (and honest) feedback!  I suspect 
that the current state could be improved with just a little work, and 
without forcing anyone to do any work they don't want to do.  I'll think 
about this more and try to come back with a better & more concrete 
suggestion.

-- Tony Plate
>
> Duncan Murdoch
>


