[Rd] grep() and factors

Marc Schwartz (via MN) mschwartz at mn.rr.com
Mon Jun 5 21:02:09 CEST 2006


Hi all,

Based upon an off-list communication this morning, I am somewhat confused
(more than I usually am on most Monday mornings...) about the use of
grep() with a factor as the 'x' argument.

The argument guidance in ?grep indicates:

x, text a character vector where matches are sought. Coerced to
        character if possible.

and in the Details section:

Arguments which should be character strings or character vectors are
coerced to character if possible.


The wording of both would seem to lead reasonably to the conclusion that
a factor would be coerced to a character vector, as if by
as.character(FACTOR).

Tracing through the C code for do_grep() in character.c, which in turn
calls coerceVector() in coerce.c, I don't see any indication that a
factor would be coerced to a character vector, unless I am misreading
the code (always possible).
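To make the distinction concrete: a factor is stored as an integer vector
of codes with a "levels" attribute, so coercing the bare codes and calling
as.character() on the factor give different results. This is a minimal
sketch; the idea that the C-level coercion in grep() only sees the codes is
an assumption, consistent with the output below but not confirmed from the
source.

```r
f <- factor(c("b", "a", "c"))
typeof(f)                  # "integer": the storage mode is the codes
unclass(f)                 # codes 2 1 3, plus the "levels" attribute
# Assumption: this is roughly what a plain integer -> character coercion
# of the factor yields, i.e. the codes rather than the labels
as.character(unclass(f))   # "2" "1" "3"
# The factor method for as.character() returns the level labels
as.character(f)            # "b" "a" "c"
```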

Since a factor -> character coercion would seem, at face value, to be the
most logical one to take place when using grep(), I am curious whether I
am missing something, or whether ?grep needs to be clearer about the
coercions that will or might take place. Perhaps an error message could
even be considered when a factor is passed as the 'x' argument, if indeed
the coercion does not take place.

Perhaps the simplest example is:

# On R Version 2.3.1 (2006-06-01) on FC5

> grep("[a-z]", letters)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26

> grep("[a-z]", factor(letters))
numeric(0)
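In the meantime, converting the factor explicitly sidesteps the question.
A minimal sketch, assuming it is the level labels (not the integer codes)
that should be matched:

```r
f <- factor(letters)
# Convert the factor to its level labels before matching
idx <- grep("[a-z]", as.character(f))
# Same indices as matching against the plain character vector
identical(idx, grep("[a-z]", letters))   # TRUE
```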


Thanks for any comments or any virtual rotten tomatoes coming my way at
high speed.  :-)

Marc Schwartz


