[Rd] (PR#8192) [ subscripting sometimes loses names

Sun Feb 1 00:33:11 CET 2009

On 31/01/2009 3:26 PM, Christian Brechbühler wrote:
> On Sat, Jan 31, 2009 at 10:13 AM, Peter Dalgaard
> <p.dalgaard at biostat.ku.dk> wrote:
> 
>> Duncan Murdoch wrote:
>>
>>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>>
>>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>>
>>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling
>>>>> [.data.frame
>>>>>
>>>>> ever tried drop=FALSE ?
>>>>
>>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>>> Christian was talking about.  The kind of thing he meant is PR# 8192,
>>>> "Subject: [ subscripting sometimes loses names":
>>>>
>>>>  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>>>
>>> In that bug report you were asked to provide simple examples, and you
>>> didn't.
>>> ...
>>> I just tracked this one down, and can put together this simple example:
>>>
>>>  > (1:3)["no"]
>>> [1] NA
>>>
>>> where I think you would want the name "no" attached to the output.
>
> No, it has nothing to do with indexing by name.  It's about preserving
> existing names when subsetting.

I think you misread my message.

> 
>>> And the other two cases where you list "BAD" behaviour?  I didn't track
>>> them down.
>>>
>> I did, and they boil down to variations of
>>
>> > data.frame(val=1:3,row.names=letters[1:3])[,1]
>> [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .....[[1]] and
>> ....$val. Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
> 
> 
> Compare
> 
> > data.frame(val=1:3,row.names=letters[1:3])[,1]
> [1] 1 2 3
> > as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
> a b c
> 1 2 3
> 
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame.  To me, this is ugly and inconsistent.
> 
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names.  But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans.  Now R should do it consistently.

In one case you're working with a matrix, and in the other, a data frame.
So perfect consistency is impossible:  matrices and data frames are not
the same.  It's a matter of deciding how much consistency is worth
pursuing.  Right now, it seems nobody thinks this is worth pursuing, so it
won't get changed.
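
For anyone who wants the matrix-style result from a data frame in the
meantime, here is a minimal sketch of one workaround.  The object names
(df, m, named_col) are made up for illustration and are not from the
thread; the idea is simply to attach the row names yourself with
setNames():

```r
# Same data as in the example quoted above.
df <- data.frame(val = 1:3, row.names = letters[1:3])
m  <- as.matrix(df)

# Matrix column extraction keeps the row names as element names.
names(m[, 1])

# Data frame column extraction returns the bare column, without names.
names(df[, 1])

# Workaround (illustrative, not a proposed change to R): re-attach the
# row names explicitly to get a named vector from the data frame.
named_col <- setNames(df[, 1], rownames(df))
named_col
```

This leaves "[.data.frame" alone, so nothing that depends on the current
unnamed result is affected.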

To get it changed, you should make the change, then investigate what
would break if the change were adopted, what would become slower, etc.
Or convince someone else to do that.  But the fact that you think it's
ugly is probably not convincing.

Duncan Murdoch