[Rd] Subsetting vectors/arrays using factors can be seen as misleading
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Mar 14 10:55:59 CET 2008
Indexing by a factor's integer codes is long established and documented
on the basic help page for '['. Further, the convention is widely used
in R itself: with the proposed warning in place, running 'make check'
would give a few hundred warnings and then fail. Working around those
warnings would be inefficient (involving unnecessary copying of large
objects).
One place where this matters is the advice to use levels(x)[x] as in
as.character.factor() -- that construction is widespread, perhaps so
widespread as to make it worthwhile making that an internal operation.
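
A minimal sketch of the convention, reusing x and f from the quoted
example below. The names by_code, by_int, by_name and labels are
illustrative only and do not come from the R sources.

```r
# Same objects as in the quoted message.
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# A factor index is treated as its integer codes: the code of "a" in f
# is 2, so this selects the second element of x (the one named "b").
by_code <- x[f]

# Explicit conversions state the intent unambiguously.
by_int  <- x[as.integer(f)]    # same selection as x[f]
by_name <- x[as.character(f)]  # selects the element named "a"

# levels(x)[x] relies on the same rule: the codes index into the levels,
# giving the labels as a character vector.
labels <- levels(f)[f]
```

Here x[f] and x[as.integer(f)] select the same element, while
x[as.character(f)] selects by name.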
On Thu, 13 Mar 2008, Laurent Gautier wrote:
> Dear list,
>
> Subsetting vectors/arrays using factors can be seen as misleading, and
> I was thinking that it could be discouraged (at least by issuing a
> warning).
> I could not find whether this was discussed earlier, but I can be
> pointed to a reference if I missed any.
>
> The "extract" operator "[" can take as arguments either vectors of
> integers or vectors of characters in order to subset a data structure.
> For example:
>> x <- seq(1, 5)
>> names(x) <- letters[1:5]
>>
>> x[1]
> a
> 1
>> x["a"]
> a
> 1
>
> Using a factor caused some confusion to someone here, and I have to
> admit that it can indeed appear misleading:
>> f <- factor("a", levels=c("b", "a", "c"))
>> f
> [1] a
> Levels: b a c
>> x[f] # here the integer is used, rather than the level
> b
> 2
>
> The dual nature of the factor (vector of integers, with an attached
> vector of levels), is not always clear to many users, especially since
> factors are treated differently in other situations.
> Example:
>> f == 1
> [1] FALSE
>> f == "a" #here the level is used, not the integer
> [1] TRUE
>
> This is making me suggest that indexing using a factor could issue a
> warning, and the user should explicitly wrap the vector with either
> "as.integer" or "as.character".
>
>
> L.
>
> PS: All examples above were run with
> platform x86_64-unknown-linux-gnu
> arch x86_64
> os linux-gnu
> system x86_64, linux-gnu
> status Under development (unstable)
> major 2
> minor 7.0
> year 2008
> month 03
> day 12
> svn rev 44742
> language R
> version.string R version 2.7.0 Under development (unstable) (2008-03-12 r44742)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595