[Rd] grep() and factors
Marc Schwartz (via MN)
mschwartz at mn.rr.com
Mon Jun 5 21:02:09 CEST 2006
Hi all,
Based upon an offlist communication this morning, I am somewhat confused
(more than I usually am on most Monday mornings...) about the use of
grep() with factors as the 'x' argument.
The argument guidance in ?grep indicates:
    x, text: a character vector where matches are sought. Coerced to
    character if possible.
and in the Details section:
    Arguments which should be character strings or character vectors are
    coerced to character if possible.
The wording of both would seem to reasonably lead to the conclusion that
a factor could be coerced to a character vector by the use of
as.character(FACTOR).
In tracing through the C code in character.c for do_grep(), which in
turn calls coerceVector() in coerce.c, unless I am mis-reading the code
(always possible), I don't see an indication that a factor would be
coerced to a character vector.
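To make the concern concrete, here is a minimal sketch in R of what would follow if the coercion acted on the factor's underlying integer codes rather than on its labels. That premise is an assumption for illustration only, not something confirmed from the C code, and the variable names are made up for the example:

```r
f <- factor(letters)

# A factor is stored as integer codes plus a "levels" attribute.
codes  <- as.integer(f)     # the internal codes
labels <- as.character(f)   # the labels a user normally expects to search

# Assumption: the coercion sees only the codes. Turning them into text
# gives digit strings such as "1" and "2", which "[a-z]" cannot match.
codes_as_text <- as.character(codes)
hits_codes <- grep("[a-z]", codes_as_text)
```

Under that assumption, an empty result from grep() would not mean that coercion failed, only that it produced strings of digits instead of the labels.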
Since a factor -> character coercion would seem, at face value, to be the
most logical coercion to take place when using grep(), I am curious whether
I am missing something, or whether ?grep needs to be clearer about the
coercions that will or might take place. Perhaps an error message should
even be considered when a factor is passed as the 'x' argument, if indeed
the coercion does not take place.
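As a rough illustration of the error-message idea, the check could be sketched at the R level as below. The wrapper name grep_strict and the wording of the message are hypothetical; nothing like this exists in base R, and a real change would presumably live inside grep() itself:

```r
# Hypothetical wrapper, not part of base R: refuse a factor outright
# instead of silently returning a possibly misleading result.
grep_strict <- function(pattern, x, ...) {
  if (is.factor(x)) {
    stop("'x' is a factor; use as.character(x) to search its labels")
  }
  grep(pattern, x, ...)
}

# Capture the error message rather than halting the session.
res <- tryCatch(
  grep_strict("[a-z]", factor(letters)),
  error = function(e) conditionMessage(e)
)
```

Whether an error, a warning, or an automatic as.character() would be the better behaviour is a design question for the list; the sketch only shows that the check itself is cheap.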
Perhaps the easiest example here might be:
# On R Version 2.3.1 (2006-06-01) on FC5
> grep("[a-z]", letters)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26
> grep("[a-z]", factor(letters))
numeric(0)
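In the meantime, a workaround that does not depend on how grep() coerces its argument is to do the conversion explicitly. The following is a small sketch using only base R functions; the variable names are illustrative:

```r
f <- factor(letters)

# Option 1: convert to character first, then search the labels directly.
idx <- grep("[a-z]", as.character(f))

# Option 2: search the (usually much shorter) set of levels once,
# then map the matching levels back to element positions.
lev_hits <- grep("[a-z]", levels(f))
idx_via_levels <- which(as.integer(f) %in% lev_hits)
```

The second form can be cheaper for long factors with few distinct levels, since the regular expression is applied once per level rather than once per element.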
Thanks for any comments or any virtual rotten tomatoes coming my way at
high speed. :-)
Marc Schwartz