[Rd] (PR#8192) [ subscripting sometimes loses names
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sat Jan 31 22:36:45 CET 2009
Christian Brechbühler wrote:
<snip>
>
>>> data.frame(val=1:3,row.names=letters[1:3])[,1]
>>>
>> [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .....[[1]] and
>> ....$val.
this might be a good argument, were it not that [,1] returning a vector
rather than a one-column data frame is already inconsistent (with
[,1:2], for example, which returns a data frame). if [,1] did not drop
the data.frame class and returned a data frame instead, it would be
obvious that the result should use the row names.

    data.frame(val=1:3,row.names=letters[1:3])[,1,drop=FALSE]

does keep the class and the row names, though ?'[' says "drop: For matrices
and arrays.".

this doesn't mean that dropping row names (or dropping dimensions) isn't
useful and handy in specific cases, but that makes it no less
inconsistent.
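
to make the comparison concrete, here is a small sketch of the extraction forms discussed above. the second data frame, df2, and its column 'other' are assumptions added only so that [,1:2] has two columns to select; they are not from the original report.

```r
# data frame from the report: one column, explicit row names
df <- data.frame(val = 1:3, row.names = letters[1:3])

v  <- df[, 1]                # drops to a plain vector, row names are lost
d1 <- df[, 1, drop = FALSE]  # stays a one-column data frame with row names
l1 <- df[[1]]                # list-style extraction, also a plain vector
l2 <- df$val                 # same as above, by column name

# hypothetical two-column variant, only to illustrate [, 1:2]
df2 <- data.frame(val = 1:3, other = 4:6, row.names = letters[1:3])
d2  <- df2[, 1:2]            # two columns selected: result is a data frame

print(names(v))       # names of the dropped vector
print(rownames(d1))   # row names kept when drop = FALSE
print(class(d2))      # class of the two-column selection
```

so the class of the result of [,j] depends on how many columns j happens to select, unless drop=FALSE is given explicitly.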
>> Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>>
>
>
> Compare
>
>
>> data.frame(val=1:3,row.names=letters[1:3])[,1]
>>
> [1] 1 2 3
>
>> as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
>>
> a b c
> 1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame. To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names. But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans. Now R should do it consistently.
>
i support this opinion. whether to have or not to have row names is a
design decision, and both options may be reasonably argued for and
against. but lack of consistency is seldom any good; r consistently
lacks consistency.
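
for anyone who needs the names in the meantime, a possible workaround is sketched below. it first reproduces the matrix versus data frame difference quoted above, then reattaches the row names by hand. the helper name col_with_names is hypothetical, made up for this illustration; it is not part of r.

```r
df <- data.frame(val = 1:3, row.names = letters[1:3])

# same column, extracted from the matrix and from the data frame
from_matrix <- as.matrix(df)[, 1]  # keeps the row names as names
from_df     <- df[, 1]             # loses them

# hypothetical helper: extract column j of data frame x and
# attach the row names of x as names of the resulting vector
col_with_names <- function(x, j) {
  setNames(x[[j]], rownames(x))
}

restored <- col_with_names(df, 1)

print(names(from_matrix))
print(names(from_df))
print(names(restored))
```

this only papers over the difference on the user's side; it is not a proposal for how [.data.frame itself should behave.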
vQ