[Rd] regex to match word boundaries
Martin Maechler
maechler at stat.math.ethz.ch
Thu Dec 2 08:49:02 CET 2004
>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at myway.com>
>>>>> on Wed, 1 Dec 2004 21:05:59 -0500 (EST) writes:
Gabor> Can someone verify whether or not this is a bug?
Gabor> When I substitute all occurrences of "\\B" with "X", R
Gabor> seems to correctly place an X at all non-word
Gabor> boundaries (whether or not I specify perl), but "\\b"
Gabor> does not seem to act on all complement positions:
>> gsub("\\b", "X", "abc def") # nothing done
Gabor> [1] "abc def"
>> gsub("\\B", "X", "abc def") # as expected, I think
Gabor> [1] "aXbXc dXeXf"
>> gsub("\\b", "X", "abc def", perl = TRUE) # not as expected
Gabor> [1] "abc Xdef"
>> gsub("\\B", "X", "abc def", perl = TRUE) # as expected
Gabor> [1] "aXbXc dXeXf"
>> R.version.string # Windows 2000
Gabor> [1] "R version 2.0.1, 2004-11-27"
I agree this looks "unfortunate".
Just to confirm:
1) I get the same results with R on Linux
2) Perl itself behaves differently, and as
you (and I) would have expected:
$ echo 'abc def'| perl -pe 's/\b/X/g'
XabcX XdefX
$ echo 'abc def'| perl -pe 's/\B/X/g'
aXbXc dXeXf
Also, from what I see, "\b" should behave the same regardless
of whether perl = TRUE or FALSE.
--
Martin