[Rd] grep() and factors

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 6 18:08:49 CEST 2006


On Tue, 6 Jun 2006, Marc Schwartz (via MN) wrote:

> On Tue, 2006-06-06 at 11:12 +0100, Prof Brian Ripley wrote:
>> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>>
>>> Hi all,
>>>
>>> Based upon an offlist communication this morning, I am somewhat confused
>>> (more than I usually am on most Monday mornings...) about the use of
>>> grep() with factors as the 'x' argument.
>>>
>>> The argument guidance in ?grep indicates:
>>>
>>> x, text a character vector where matches are sought. Coerced to
>>>        character if possible.
>>>
>>> and in the Details section:
>>>
>>> Arguments which should be character strings or character vectors are
>>> coerced to character if possible.
>>>
>>>
>>> The wording of both would seem to reasonably lead to the conclusion that
>>> a factor could be coerced to a character vector by the use of
>>> as.character(FACTOR).
>>
>> Well, that is not what is meant by the wording, nor what happens: there is
>> no method dispatch so the factor is coerced from an integer vector to a
>> character vector.  'coerced' usually means at low level: where
>> as.character() is involved we tend to say so.
>>
>> As for the comments on what happens if value=TRUE: if the 'x' has been
>> coerced, I would expect the value to be based on the coerced value (and it
>> currently is).
>>
>>> grep("1", factor(letters))
>>   [1]  1 10 11 12 13 14 15 16 17 18 19 21
>>> grep("1", factor(letters), value=TRUE)
>>   [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"
>>
>> So whereas I am quite happy to replace the low-level coercion by method
>> dispatch on as.character, I don't think this should be altered (and am
>> pretty sure there is code out there which expects a character vector
>> result).
>
> Prof. Ripley,
>
> Thanks for your reply and clarification.
>
> I would acknowledge that the coercion of a factor to its numeric values
> would not be immediately intuitive to me (or others who have commented
> on this) within the context of grep(). However, in light of your
> comments and having reviewed the C code, it does make sense.
>
> Given this behavior, it would seem reasonable to provide a clarification
> in ?grep, perhaps as follows:
>
> Arguments
>
> x, text a character vector where matches are sought. Coerced to
> character if possible. See Details for factors.
>
>
> Details
>
> Arguments which should be character strings or character vectors are
> coerced to character if possible. In the case of factors, these are
> coerced using as.integer(x). You must explicitly coerce the factor using
> as.character(x) to use these functions on the character vector
> equivalent.

I do think we should `replace the low-level coercion by method dispatch on 
as.character', and have done so in R-devel (but am still testing 
packages).  There have been quite a few instances of such low-level 
coercion (including for dimnames), and I am currently looking through to 
see whether there are any others that should either be altered or have 
their documentation clarified.
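
Whichever coercion grep() applies internally, coercing the factor explicitly
before the call makes the intent unambiguous. A minimal sketch, in which the
names f, by_label and by_code are illustrative only and not taken from any
package:

```r
# Illustrative factor: levels "a".."z", integer codes 1..26
f <- factor(letters)

# Match against the labels: coerce with as.character() first
by_label <- grep("a", as.character(f), value = TRUE)

# Match against the integer codes: coerce via as.integer() first,
# which reproduces the output quoted earlier in this thread
by_code <- grep("1", as.character(as.integer(f)), value = TRUE)

print(by_label)
print(by_code)
```

Here by_label is "a", and by_code is the twelve codes containing a "1"
("1", "10" to "19", and "21"), independent of how grep() itself treats a
factor argument.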

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


