[Rd] Subsetting vectors/arrays using factors can be seen as misleading
Laurent Gautier
lgautier at gmail.com
Fri Mar 14 11:24:53 CET 2008
Thanks for your answer.
I understand that this is long established, but I suspect that
extracting by names was less common back then (I readily admit that this
is pure speculation on my side; I have no data to support it).
As R evolves, things sometimes get deprecated (a recent example
seems to be "$" on atomic vectors).
I also understand that the behavior is documented, but I have seen it
cause trouble for someone who was not a complete beginner with R,
and after helping to solve the problem I could only agree that it is
somewhat misleading.
Suggesting a warning while keeping the current behavior is just an idea
(one that would not break existing code). Similarly, casting factors to
integers was just a way to illustrate the point. Something that does
not involve unnecessary copies of (potentially large) objects could be
considered (maybe "unclass()" does not make a copy
and could be used in place of "as.integer"?).
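
To make the alternatives concrete, here is a small sketch reusing the x and
f from my original message (quoted below). The variable names are only for
illustration, and I have not checked whether "unclass()" actually avoids a
copy, so that part remains an open question:

```r
# Same objects as in the original post.
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

by_default <- x[f]                # integer code of "a" is 2, so element "b" is returned
by_label   <- x[as.character(f)]  # explicit: index by the level label "a"
by_code    <- x[as.integer(f)]    # explicit: index by the integer code, same as x[f]
by_unclass <- x[unclass(f)]       # integer codes with the factor class removed
label      <- levels(f)[f]        # the levels(x)[x] idiom: recovers the label "a"

print(by_default)
print(by_label)
print(label)
```

With an explicit "as.character" or "as.integer" the intent is visible at the
call site, which is all the suggested warning was meant to encourage.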
2008/3/14, Prof Brian Ripley <ripley at stats.ox.ac.uk>:
> This is long established and documented on the basic help page for '['.
> Further, the convention is widely used in R itself: running 'make check'
> would give a few hundred warnings and then fail. Working around those
> warnings would be inefficient (involving unnecessary copying of large
> objects).
>
> One place where this matters is the advice to use levels(x)[x] as in
> as.character.factor() -- that construction is widespread, perhaps so
> widespread as to make it worthwhile making that an internal operation.
>
>
> On Thu, 13 Mar 2008, Laurent Gautier wrote:
>
> > Dear list,
> >
> > Subsetting vectors/arrays using factors can be seen as misleading, and
> > I was thinking that it could be discouraged (at least by issuing a
> > warning).
> > I could not find whether this was discussed earlier, but I can be
> > pointed to a reference if I missed any.
> >
> > The "extract" operator "[" can take as arguments either vectors of
> > integers or vectors of characters in order to subset a data structure.
> > For example:
> >> x <- seq(1, 5)
> >> names(x) <- letters[1:5]
> >>
> >> x[1]
> > a
> > 1
> >> x["a"]
> > a
> > 1
> >
> > Using a factor caused some confusion to someone here, and I have to
> > admit that it can indeed appear misleading:
> >> f <- factor("a", levels=c("b", "a", "c"))
> >> f
> > [1] a
> > Levels: b a c
> >> x[f] # here the integer is used, rather than the level
> > b
> > 2
> >
> > The dual nature of the factor (vector of integers, with an attached
> > vector of levels), is not always clear to many users, especially since
> > factors are treated differently in other situations.
> > Example:
> >> f == 1
> > [1] FALSE
> >> f == "a" #here the level is used, not the integer
> > [1] TRUE
> >
> > This is making me suggest that indexing using a factor could issue a
> > warning, and the user should explicitly wrap the vector with either
> > "as.integer" or "as.character".
> >
> >
> > L.
> >
> > PS: All examples above were run with
> > platform x86_64-unknown-linux-gnu
> > arch x86_64
> > os linux-gnu
> > system x86_64, linux-gnu
> > status Under development (unstable)
> > major 2
> > minor 7.0
> > year 2008
> > month 03
> > day 12
> > svn rev 44742
> > language R
> > version.string R version 2.7.0 Under development (unstable) (2008-03-12 r44742)