[Rd] (PR#8192) [ subscripting sometimes loses names

Sat Jan 31 22:36:45 CET 2009

Christian Brechbühler wrote:

<snip>
>
>>> data.frame(val=1:3,row.names=letters[1:3])[,1]
>> [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .....[[1]] and
>> ....$val. 

this might be a good argument, were it not that [,1] returning a vector
rather than a one-column data frame is already inconsistent (with
[,1:2], for example).  if [,1] did not drop the data.frame class and
returned a data frame instead, it would be obvious that the result
should carry the row names.

data.frame(val=1:3,row.names=letters[1:3])[,1,drop=FALSE]

will keep the class and row names, even though ?'[' describes the
argument as "drop: For matrices and arrays".
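
for reference, the ways of getting at the column can be put side by
side.  this is only a sketch of the behaviour discussed above; the
variable names (df, by_comma, and so on) are mine and purely
illustrative.

```r
# same data frame as in the example above
df <- data.frame(val = 1:3, row.names = letters[1:3])

by_comma  <- df[, 1]                # plain vector, row names are dropped
by_double <- df[[1]]                # plain vector
by_dollar <- df$val                 # plain vector
kept      <- df[, 1, drop = FALSE]  # still a data frame, row names kept

names(by_comma)   # NULL
rownames(kept)    # "a" "b" "c"
```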

this doesn't mean that dropping row names (or dropping dimensions) isn't
useful and handy in specific cases, but that makes the behaviour no less
inconsistent.

>> Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>>     
>
>
> Compare
>
>   
>> data.frame(val=1:3,row.names=letters[1:3])[,1]
> [1] 1 2 3
>   
>> as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
> a b c
> 1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame.  To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names.  But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans.  Now R should do it consistently.
>   
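
as things stand, anyone who wants the names on the extracted column has
to reattach them by hand.  a minimal sketch, assuming setNames() from
the stats package (attached by default); from_matrix and named_col are
hypothetical variable names, not anything r provides.

```r
df <- data.frame(val = 1:3, row.names = letters[1:3])
m  <- as.matrix(df)

from_matrix <- m[, 1]                           # names come from the row names
named_col   <- setNames(df[, 1], rownames(df))  # row names reattached manually

names(from_matrix)
names(named_col)
```

both calls to names() should give "a" "b" "c" here, which is what the
matrix case does unprompted.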

i support this opinion.  whether to have or not to have row names is a
design decision, and both options may be reasonably argued for and
against.  but lack of consistency is seldom any good;  r consistently
lacks consistency.

vQ