[Rd] grep() and factors
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 6 18:08:49 CEST 2006
On Tue, 6 Jun 2006, Marc Schwartz (via MN) wrote:
> On Tue, 2006-06-06 at 11:12 +0100, Prof Brian Ripley wrote:
>> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>>> Hi all,
>>> Based upon an offlist communication this morning, I am somewhat confused
>>> (more than I usually am on most Monday mornings...) about the use of
>>> grep() with factors as the 'x' argument.
>>> The argument guidance in ?grep indicates:
>>> x, text: a character vector where matches are sought. Coerced to
>>> character if possible.
>>> and in the Details section:
>>> Arguments which should be character strings or character vectors are
>>> coerced to character if possible.
>>> The wording of both would seem to reasonably lead to the conclusion that
>>> a factor could be coerced to a character vector by the use of
>> Well, that is not what is meant by the wording, nor what happens: there is
>> no method dispatch so the factor is coerced from an integer vector to a
>> character vector. 'coerced' usually means at low level: where
>> as.character() is involved we tend to say so.
>> As for the comments on what happens if value=TRUE: if the 'x' has been
>> coerced, I would expect the value to be based on the coerced value (and it
>> currently is).
>> > grep("1", factor(letters))
>>  [1]  1 10 11 12 13 14 15 16 17 18 19 21
>> > grep("1", factor(letters), value=TRUE)
>>  [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"
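The indices in that output can be reproduced by hand. The following is a minimal sketch, assuming the low-level coercion described above (the factor's integer codes, not its labels, are turned into strings before matching). The names `codes` and `labels` are illustrative only, and the explicit conversions merely stand in for what grep() did internally at the time.

```r
f <- factor(letters)

# What low-level coercion effectively matched against: the integer codes
# 1..26 rendered as strings ("1", "2", ..., "26").
codes <- as.character(as.integer(f))

# What as.character() on the factor gives: the labels "a", "b", ..., "z".
labels <- as.character(f)

grep("1", codes)                 # 1 10 11 12 13 14 15 16 17 18 19 21
grep("1", codes, value = TRUE)   # "1" "10" "11" ... "19" "21"

grep("1", labels)                # integer(0): no letter contains "1"
grep("a", labels)                # 1
```

Matching against `codes` gives the twelve hits shown above, while matching against `labels` finds nothing for the pattern "1".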
>> So whereas I am quite happy to replace the low-level coercion by method
>> dispatch on as.character, I don't think this should be altered (and am
>> pretty sure there is code out there which expects a character vector
> Prof. Ripley,
> Thanks for your reply and clarification.
> I would acknowledge that the coercion of a factor to its numeric values
> would not be immediately intuitive to me (or others who have commented
> on this) within the context of grep(). However, in light of your
> comments and having reviewed the C code, it does make sense.
> Given this behavior, it would seem reasonable to provide a clarification
> in ?grep, perhaps as follows:
> x, text: a character vector where matches are sought. Coerced to
> character if possible. See Details for factors.
> Arguments which should be character strings or character vectors are
> coerced to character if possible. In the case of factors, these are
> coerced using as.integer(x). You must explicitly coerce the factor using
> as.character(x) to use these functions on the character vector
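The explicit coercion suggested here might look like the sketch below. It does not rely on how grep() itself treats a factor, so it should behave the same whichever coercion is in effect. The example factor and the names `hits`, `matching_levels` and `in_levels` are hypothetical, chosen only for illustration.

```r
f <- factor(c("apple", "banana", "cherry", "apple"))

# Match against the labels, element by element.
hits <- grep("e", as.character(f))
hits                                   # 1 3 4

# Alternative: match once per level, then map back to the elements.
# This avoids re-matching repeated values in a long factor.
matching_levels <- grep("e", levels(f), value = TRUE)
matching_levels                        # "apple" "cherry"

in_levels <- f %in% matching_levels
which(in_levels)                       # 1 3 4
```

Both routes select the same elements. The second only runs the pattern over the distinct levels, which may matter for factors with many repeated values.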
I do think we should `replace the low-level coercion by method dispatch on
as.character', and have done so in R-devel (but am still testing
packages). There have been quite a few instances of such low-level
coercion (including for dimnames), and I am currently looking through to
see if there are any others that either should be altered or the
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595