[Rd] regex to match word boundaries

Gabor Grothendieck ggrothendieck at myway.com
Mon Dec 6 12:21:36 CET 2004


Gabor Grothendieck <ggrothendieck <at> myway.com> writes:

: 
: Can someone verify whether or not this is a bug.
: 
: When I substitute all occurrences of "\\B" with "X",
: R seems to correctly place an X at every non-word boundary
: (whether or not I specify perl = TRUE), but "\\b" does not
: seem to act on all of the complementary positions:
: 
: > gsub("\\b", "X", "abc def") # nothing done
: [1] "abc def"
: > gsub("\\B", "X", "abc def") # as expected, I think
: [1] "aXbXc dXeXf"
: > gsub("\\b", "X", "abc def", perl = TRUE) # not as expected
: [1] "abc Xdef"
: > gsub("\\B", "X", "abc def", perl = TRUE)  # as expected
: [1] "aXbXc dXeXf"
: > R.version.string  # Windows 2000
: [1] "R version 2.0.1, 2004-11-27"
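For reference, here is a minimal R sketch of where \\b and \\B should fire when every position is judged against the whole string, as Perl does. `mark_positions()` is a hypothetical helper, not part of R. It assumes that letters, digits and underscore are the word characters, and it inspects the characters directly instead of calling the regex engine, so its output does not depend on the behaviour being reported.

```r
# Hypothetical helper (not part of base R): insert `mark` at every gap
# where \b (boundary = TRUE) or \B (boundary = FALSE) should match,
# judging each gap by the characters on BOTH sides in the full string.
mark_positions <- function(x, mark = "X", boundary = TRUE) {
  chars <- strsplit(x, "")[[1]]
  # Assumption: word characters are letters, digits and underscore.
  is_word <- grepl("[[:alnum:]_]", chars)
  n <- length(chars)
  # Gap i (0..n) sits between chars[i] and chars[i + 1];
  # the two ends of the string count as non-word.
  before <- c(FALSE, is_word)
  after  <- c(is_word, FALSE)
  # A gap is a word boundary when the word-ness changes across it.
  hit <- (before != after) == boundary
  out <- character(0)
  for (i in 0:n) {
    if (hit[i + 1]) out <- c(out, mark)
    if (i < n) out <- c(out, chars[i + 1])
  }
  paste(out, collapse = "")
}

mark_positions("abc def", boundary = TRUE)   # "XabcX XdefX"
mark_positions("abc def", boundary = FALSE)  # "aXbXc dXeXf"
```

The \\B result agrees with the output R gives above, while the \\b result shows the four insertions (start and end of each word) that neither gsub() call produced.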

I have found another, possibly related, problem. In the examples above,
\\B always worked as expected and only \\b misbehaved. Here is an
example where \\B does not work as expected either. In the first
example below, every letter that is not the first letter of its word
gets prefixed with X, as expected. In the second, which uses "\\B."
with perl = TRUE, only every other such letter is replaced with X,
whereas one would expect every letter that is not first in its word
to be replaced (giving "TXX QXXXX BXXXX FXX").

R> gsub("\\B", "X", "The Quick Brown Fox") # works as expected
[1] "TXhXe QXuXiXcXk BXrXoXwXn FXoXx"

R> gsub("\\B.", "X", "The Quick Brown Fox", perl = TRUE) # problem
[1] "TXe QXiXk BXoXn FXx"

R> R.version.string # Windows XP
[1] "R version 2.0.1, 2004-11-04"
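The alternating pattern is what one would get if each new search saw only the unconsumed remainder of the string: the letter just replaced is then invisible to \\B, so the next letter looks like the start of a word. That is an assumption about the cause, not something established here. The hypothetical `suffix_gsub()` below emulates that behaviour by calling `regexpr()` (with perl = TRUE) on the remaining substring after each match; it only handles patterns that consume at least one character.

```r
# Hypothetical emulation (an assumption about the cause, not R's actual
# code): after each match, search again in the unconsumed suffix only,
# so the engine cannot see the character that preceded the suffix.
suffix_gsub <- function(pattern, replacement, x) {
  out <- ""
  rest <- x
  repeat {
    m <- regexpr(pattern, rest, perl = TRUE)
    if (m == -1) break
    len <- attr(m, "match.length")
    # Zero-length matches would never shrink `rest`; not handled here.
    if (len == 0) stop("pattern must consume at least one character")
    # Keep the text before the match, then the replacement.
    out <- paste0(out, substr(rest, 1, m - 1), replacement)
    # Drop everything up to the end of the match.
    rest <- substr(rest, m + len, nchar(rest))
  }
  paste0(out, rest)
}

suffix_gsub("\\B.", "X", "The Quick Brown Fox")
```

On "The Quick Brown Fox" this reproduces the reported output, "TXe QXiXk BXoXn FXx", which is consistent with (though not proof of) the suffix hypothesis.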


By the way, do I have to submit a second bug report for this, or is
it possible to add it to the previous one as a comment?


