[Rd] grep() and factors
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 6 12:12:56 CEST 2006
On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
> Hi all,
>
> Based upon an offlist communication this morning, I am somewhat confused
> (more than I usually am on most Monday mornings...) about the use of
> grep() with factors as the 'x' argument.
>
> The argument guidance in ?grep indicates:
>
> x, text a character vector where matches are sought. Coerced to
> character if possible.
>
> and in the Details section:
>
> Arguments which should be character strings or character vectors are
> coerced to character if possible.
>
>
> The wording of both would seem to reasonably lead to the conclusion that
> a factor could be coerced to a character vector by the use of
> as.character(FACTOR).
Well, that is not what the wording means, nor what happens: there is
no method dispatch, so the factor is coerced as its underlying integer
vector of codes to a character vector. 'Coerced' usually means coercion at
a low level: where as.character() is involved we tend to say so.
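
The distinction can be sketched as follows. This is only an illustration: the low-level coercion is emulated by stripping the factor down to its integer codes, so the snippet does not depend on how grep() itself treats factors in any particular R version, and the variable names are made up for the example.

```r
f <- factor(letters)

# Low-level view: a factor is an integer vector of codes with a "levels"
# attribute. Coercing those codes to character gives "1", "2", ..., "26".
low_level <- as.character(as.integer(unclass(f)))

# Method dispatch: as.character() on a factor returns the level labels,
# i.e. "a", "b", ..., "z".
dispatched <- as.character(f)

print(head(low_level))
print(head(dispatched))
```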
As for the comments on what happens if value=TRUE: if the 'x' has been
coerced, I would expect the value to be based on the coerced value (and it
currently is).
  > grep("1", factor(letters))
  [1]  1 10 11 12 13 14 15 16 17 18 19 21
  > grep("1", factor(letters), value=TRUE)
  [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"
So whereas I am quite happy to replace the low-level coercion by method
dispatch on as.character(), I don't think this (basing the value=TRUE
result on the coerced value) should be altered, and I am pretty sure there
is code out there which expects a character vector result.
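
For code that needs to match on the labels regardless of how grep() coerces its argument, the conversion can be made explicit. A minimal sketch, with illustrative variable names, which also reproduces the indices shown above from the integer codes:

```r
f <- factor(letters)

# Match against the level labels by converting explicitly.
idx_labels <- grep("[a-z]", as.character(f))
val_labels <- grep("[a-z]", as.character(f), value = TRUE)

# Match against the integer codes, written out explicitly: the codes
# 1..26 become "1".."26", and "1" occurs in "1", "10".."19" and "21".
idx_codes <- grep("1", as.character(as.integer(f)))

print(idx_labels)
print(val_labels)
print(idx_codes)
```

With the explicit as.character() call, the result of value=TRUE is a character vector of labels, in line with the expectation above that the value is based on whatever 'x' was coerced to.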
> In tracing through the C code in character.c for do_grep(), which in
> turn calls coerceVector() in coerce.c, unless I am mis-reading the code
> (always possible), I don't see an indication that a factor would be
> coerced to a character vector.
>
> Since a factor -> character coercion would seem, at face value, the most
> logical coercion to take place when using grep(), I am curious whether I
> am missing something, or whether ?grep needs to be clearer about the
> coercions that will or might take place. Perhaps an error message should
> even be considered if a factor is passed as the 'x' argument, if indeed
> the coercion would not take place.
>
> Perhaps the easiest example here might be:
>
> # On R Version 2.3.1 (2006-06-01) on FC5
>
>> grep("[a-z]", letters)
> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
> [23] 23 24 25 26
>
>> grep("[a-z]", factor(letters))
> numeric(0)
>
>
> Thanks for any comments or any virtual rotten tomatoes coming my way at
> high speed. :-)
>
> Marc Schwartz
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595