[Rd] grep() and factors

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jun 6 04:02:39 CEST 2006


On 6/5/06, Bill Dunlap <bill at insightful.com> wrote:
> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>
> > > > > grep("[a-z]", factor(letters))
> > > > numeric(0)
> > >
> > > I was recently surprised by this also.  In addition, if
> > > R's grep did support factors in this way, what sort of
> > > object (factor or character) should it return when value=T?
> > > I recently changed Splus's grep to return a character vector in
> > > that case.
> > >
> > >    Splus> grep("[def]", letters[26:1])
> > >    [1] 21 22 23
> > >    Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
> > >    [1] 21 22 23
> > >    Splus> grep("[def]", letters[26:1], value=T)
> > >    [1] "f" "e" "d"
> > >    Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
> > >    [1] "f" "e" "d"
> > >    Splus> class(.Last.value)
> > >    [1] "character"
> > >
> > > R does this when grepping an integer vector.
> > >    R> grep("1", 0:11, value=T)
> > >    [1] "1"  "10" "11"
> > > help(grep) says it returns "the matching elements themselves", but
> > > doesn't say if "themselves" means before or after the conversion to
> > > character.
> >
> > Bill,
> >
> > My first inclination for the return value when used on a factor would be
> > the indexed factor elements where grep() would otherwise simply return
> > the indices. This would also maintain the factor levels from the
> > original source factor since "[".factor would normally retain these when
> > drop = FALSE.
>
> That would be my first inclination also.  I would have expected the output of
>   grep(pattern, text, value=TRUE)
> to be identical to that of
>   text[grep(pattern, text, value=FALSE)]
> no matter what class text has.
>
> No end users have seen this in Splus so we can change it to anything,
> but we want to keep it the same as R's.
>
> > I could be convinced either way. The concern of course being that (given
> > the offlist replies I have received today) even experienced users are
> > getting bitten by the current behavior versus their intuitive
> > expectations, which are at least loosely supported by the documentation.
> >

If non-character text arguments are accepted, I would have expected
them to be coerced to character, so that
  grep(pattern, text, ...)
would return the same result as
  grep(pattern, as.character(text), ...)
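
For concreteness, here is a minimal sketch of the coercion rule, using
Bill's reversed-letters factor from above. The wrapper name grep_chr is
hypothetical (it is not part of R); it only makes the coercion explicit,
so it should behave the same whether or not grep() itself coerces. The
last line shows the factor-preserving alternative for comparison.

```r
# Hypothetical wrapper (not part of base R): coerce to character, then match.
grep_chr <- function(pattern, text, ...) {
  grep(pattern, as.character(text), ...)
}

# Bill's example factor: levels run z, y, ..., a.
f <- factor(letters[26:1], levels = letters[26:1])

idx <- grep_chr("[def]", f)                # positions of the matches in f
val <- grep_chr("[def]", f, value = TRUE)  # character vector, not a factor

# The alternative discussed above: index the original factor,
# which returns a factor that keeps all of the original levels.
sub <- f[idx]
```

Under this rule value=TRUE always yields a character vector, as with the
integer case grep("1", 0:11, value=TRUE); anyone who wants the factor
back can index with the result of value=FALSE.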
