[Rd] Wrong length of POSIXt vectors (PR#10507)

Mon Dec 17 18:34:50 CET 2007

Jeffrey J. Hallman wrote:
> Duncan Murdoch <murdoch at stats.uwo.ca> writes:
>
>   
>> One reason I don't want to work on this is because the appropriate 
>> action depends on what "length(x)" is intended to mean.  Currently for 
>> POSIXlt objects, it gives the physical length of the underlying basic 
>> type (the list).  This is the same behaviour as we have for matrices, 
>> data frames and every other object without a specific length method, so 
>> it's not outrageous.
>>
>> The proposed change is to have it return the logical length of the 
>> object, which also seems quite reasonable.  I don't think matrices and 
>> data frames have a "logical length", so there would be no contradiction 
>> in those examples.  The thing that worries me is that there are probably 
>> objects in packages where both logical length and physical length make 
>> sense but are different.  I don't have any expectation that length(x) on 
>> those currently is consistent in which type of value it returns.
>>
>> If we were to decide that "length(x)" *always* meant logical length, 
>> then we would have a problem:  matrices and data frames don't have a 
>> logical length, so we shouldn't be getting an answer there.  Changing 
>> length(x) for those is not acceptable.
>>
>> On the other hand, if we decide that "length(x)" *always* means physical 
>> length, we don't need to do anything to the POSIXlt or matrices or data 
>> frames, but there may well be other kinds of objects out there that 
>> violate this rule.
>>
>> We could leave the meaning of length(x) ambiguous.  If you want to know 
>> what it does for a POSIXlt object, you need to read the documentation or 
>> look at the source code.  As a policy, this isn't particularly 
>> appealing, but I could probably live with it if someone else did the 
>> research and showed that current usage is ambiguous.
>>     
>
> Physical length and logical length are, as you say, two different things.  So
> why not two functions?  Keep length() for physical length, as it is now, and
> maybe Length() for logical length.  The latter could be defined as
>
> Length <- function(x, ...) UseMethod("Length")
>
> Length.default <- function(x, ...) length(x)
>
> and then add methods for classes that want something else.
>   
A very reasonable suggestion, but I'd also put this in the "next time we 
design a language" category.
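
For concreteness, here is a minimal sketch of how the suggested two-function scheme might look once a class adds its own method. The class "rec" and the method Length.rec are hypothetical names invented for this illustration; nothing like them exists in base R.

```r
# Generic and default, as suggested in the quoted message.
Length <- function(x, ...) UseMethod("Length")
Length.default <- function(x, ...) length(x)

# Hypothetical class "rec": a list of parallel vectors, one per field.
# Its logical length is the length of any one field (here, the first).
Length.rec <- function(x, ...) length(unclass(x)[[1L]])

r <- structure(list(id = 1:4, score = c(0.5, 0.7, 0.1, 0.9)),
               class = "rec")

physical_len <- length(r)        # number of list components
logical_len  <- Length(r)        # number of records
plain_len    <- Length(letters)  # falls through to length()
```

Under these assumptions length(r) keeps reporting the underlying list, while Length(r) reports the number of records.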

The current system in R seems workable to me, if one knows that 
vector-like classes that have an S3 list-based implementation need to 
have methods defined for 'c', 'length', '[', etc, and that if these 
methods aren't defined, then you'll be operating on the underlying list 
structure.  Where these methods are defined, one can get at the 
underlying structure by unclassing first, and that's OK.  However, 
classes that have some of these methods defined but not others seem to 
me to be needlessly confusing.  There is no great benefit in having 
length() always return the length of the underlying list for POSIXlt: 
if there were a length() method, one could still get at the underlying 
length using length(unclass(x)).  It just seems like a design 
oversight that makes using such classes unnecessarily difficult and 
error-prone.
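
As a rough illustration of what a consistent set of methods could look like, here is a sketch using a made-up list-based class. The class name "kv" and the constructor new_kv() are assumptions for this example only; this is not how POSIXlt is implemented.

```r
# Hypothetical vector-like class stored as a list of parallel vectors.
new_kv <- function(key, value) {
  stopifnot(length(key) == length(value))
  structure(list(key = key, value = value), class = "kv")
}

x <- new_kv(c("a", "b", "c"), c(1, 2, 3))

# No length method yet: length() sees the underlying list.
n_before <- length(x)

# Define 'length', '[' and 'c' together so the class behaves consistently.
length.kv <- function(x) length(unclass(x)$key)
`[.kv` <- function(x, i) {
  u <- unclass(x)
  new_kv(u$key[i], u$value[i])
}
c.kv <- function(...) {
  parts <- lapply(list(...), unclass)
  new_kv(unlist(lapply(parts, `[[`, "key")),
         unlist(lapply(parts, `[[`, "value")))
}

n_after <- length(x)           # now the logical length
n_raw   <- length(unclass(x))  # underlying list still reachable
y       <- c(x, x[2:3])        # combining and subsetting agree with length()
```

With all three methods present, length(), '[' and c() tell the same story about the object, and unclass() remains the way to reach the underlying list.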

Hence my proposal (in a new thread) for coding & documentation 
guidelines that would:
(1) suggest that consistency is a good thing
(2) suggest that compliance or deviation should be documented
(3) define what consistency is (and here it's not so important to get 
absolutely the right set of consistency definitions as it is to get a 
reasonable set that people agree on).

-- Tony Plate