[Rd] Subsetting vectors/arrays using factors can be seen as misleading
Laurent Gautier
lgautier at gmail.com
Wed Mar 12 22:06:38 CET 2008
Dear list,
Subsetting vectors/arrays using factors can be seen as misleading, and
I was thinking that it could be discouraged (at least by issuing a
warning).
I could not find an earlier discussion of this; please point me to a
reference if I missed one.
The "extract" operator "[" accepts either an integer vector or a
character vector as the index used to subset a data structure.
For example:
> x <- seq(1, 5)
> names(x) <- letters[1:5]
>
> x[1]
a
1
> x["a"]
a
1
Using a factor as the index confused someone here, and I have to
admit that the result can indeed appear misleading:
> f <- factor("a", levels=c("b", "a", "c"))
> f
[1] a
Levels: b a c
> x[f] # here the integer is used, rather than the level
b
2
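The result follows from the integer codes stored inside the factor.
A minimal sketch, reusing x and f from the examples above (the names
"codes" and "labels" are introduced here only for illustration):

```r
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# The factor stores the position of "a" within its levels, not the string.
codes  <- as.integer(f)    # position of "a" in c("b", "a", "c")
labels <- as.character(f)  # the level label itself

# Indexing with the factor behaves like indexing with its integer codes.
same_as_codes <- identical(x[f], x[codes])
```

Since "a" is the second level, the factor selects the second element
of x, which happens to be named "b".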
The dual nature of a factor (a vector of integer codes with an
attached vector of levels) is not clear to many users, especially
since factors are treated differently in other situations.
Example:
> f == 1
[1] FALSE
> f == "a" # here the level is used, not the integer
[1] TRUE
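The two readings can be separated at the indexing step by converting
the factor explicitly. A minimal sketch, again reusing x and f from
above (the names "by_code" and "by_label" are illustrative only):

```r
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# Select by position: uses the integer code of the factor.
by_code <- x[as.integer(f)]

# Select by name: uses the level label of the factor.
by_label <- x[as.character(f)]
```

With the explicit conversion, the code states which of the two
meanings is intended, and the two calls pick different elements.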
I therefore suggest that indexing with a factor issue a warning, so
that the user explicitly wraps the factor in either "as.integer" or
"as.character".
L.
PS: All examples above were run with
platform x86_64-unknown-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status Under development (unstable)
major 2
minor 7.0
year 2008
month 03
day 12
svn rev 44742
language R
version.string R version 2.7.0 Under development (unstable) (2008-03-12 r44742)