[Rd] pb in regular expression with the character "-" (PR#9437)
maechler at stat.math.ethz.ch
Thu Jan 4 22:18:08 CET 2007
>>>>> "FanX" == Xiao Gang Fan <xiao.gang.fan1 at libertysurf.fr>
>>>>> on Thu, 04 Jan 2007 21:52:07 +0100 writes:
FanX> Let me detail a bit my bug report: the two commands
FanX> ("expected" vs "strange") should return the same
FanX> result, the objective of the commands is to test the
FanX> presence of several characters, '-' included.
FanX> The order in which we specify the different characters
FanX> must not be an issue, i.e., to test the presence of
FanX> several characters, including say char_1, the regular
FanX> expressions [char_1|char_2|char_3] and
FanX> [char_2|char_1|char_3] should play the same
FanX> role. Other software works just like this.
FanX> What's reported is that R actually returns different
FanX> results for the character "-" (\- in a RE) depending on
FanX> its position in the regular expression, and the
FanX> "perl" option would not be relevant.
Fan, it seems you haven't understood what Brian Ripley explained
to you. Let me try to spell it out for you:
"\-" is *NOT* what you still seem to think it is:
> "\-"
[1] "-"
> identical("\-", "-")
[1] TRUE
>
This is all in the R-FAQ entry
>>> 7.37 Why does backslash behave strangely inside strings?
========================================================
and in several other places, and yes,
please do read the R FAQ and maybe more documentation
about R and "bug reporting" before your next bug report.
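The same point can be checked in a current R session. This is a minimal sketch, not part of the original exchange. It assumes a recent R version, where the parser rejects "\-" as an unrecognized escape instead of silently dropping the backslash as in the 2.4.0 session above, so only the doubled form appears here. The names `one` and `two` are illustrative.

```r
# The R parser consumes one level of backslashes in a string literal,
# so a literal backslash has to be typed as "\\".
one <- "-"      # one character: the hyphen
two <- "\\-"    # two characters: backslash, then hyphen

nchar(one)            # 1
nchar(two)            # 2
identical(one, two)   # FALSE

# cat() shows the string as the regex engine receives it.
cat(two, "\n", sep = "")   # prints \-
```

Only the doubled form hands a backslash to the regular expression engine at all.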
Consider my guesstimate:
For 99% of all R users, the amount of time they need working
pretty intensely with R before they find a bug in it,
is nowadays more than three years, and maybe even much more
-- such as their lifetime :-)
Martin Maechler, ETH Zurich
FanX> Prof Brian Ripley wrote:
>> Why do you think this is a bug in R? You have not told
>> us what you expected, but the character range |-|
>> contains only | . Not agreeing with your expectations
>> (unstated or otherwise) is not a bug in R.
>>
>> \- is the same as -, and - is special in character
>> classes. (If it is first or last it is treated
>> literally.) And | is not a metacharacter inside a
>> character class. Also,
>>
>>> grep("[d\\-c]", c("a-a","b"))
>> [1] 1 2
>>
>>> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
>> [1] 1
>>
>> shows that escaping - works only with perl=TRUE, as you will
>> find from the background references mentioned, e.g.
>>
>>   The interpretation of an ordinary character preceded by a
>>   backslash ('\') is undefined.
>>
>> This is all carefully documented in ?regexp, e.g.
>>
>> Patterns are described here as they would be printed by
>> 'cat': do remember that backslashes need to be doubled in
>> entering R character strings from the keyboard.
>>
>>
>> This is not the first time you have wasted our resources
>> with false bug reports, so please show more respect for
>> the R developers' time. You were also explicitly asked
>> not to report on obsolete versions of R.
>>
>> On Wed, 3 Jan 2007, xiao.gang.fan1 at libertysurf.fr wrote:
>>
>>> Full_Name: FAN
>>> Version: 2.4.0
>>> OS: Windows
>>> Submission from: (NULL) (159.50.101.9)
>>>
>>>
>>> These are expected:
>>>
>>>> grep("[\-|c]", c("a-a","b"))
>>> [1] 1
>>>
>>>> gsub("[\-|c]", "&", c("a-a","b"))
>>> [1] "a&a" "b"
>>>
>>> but these are strange:
>>>
>>>> grep("[d|\-|c]", c("a-a","b"))
>>> integer(0)
>>>
>>>> gsub("[d|\-|c]", "&", c("a-a","b"))
>>> [1] "a-a" "b"
>>>
>>> Thanks
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
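The character-class rules described in the quoted reply can be illustrated with a short sketch. The vector `x` and the patterns are illustrative and not taken from the thread. The results in the comments follow from the rules stated above: a hyphen placed first or last in a class is literal, a hyphen between two characters forms a range, and | is an ordinary character inside a class.

```r
x <- c("a-a", "b", "c|d")

# Hyphen first or last in the class: treated literally.
grepl("[-dc]", x)    # TRUE FALSE TRUE
grepl("[dc-]", x)    # TRUE FALSE TRUE

# Hyphen between two characters forms a range; |-| spans only "|",
# so this class holds d, | and c, but no hyphen.
grepl("[d|-|c]", x)  # FALSE FALSE TRUE

# With perl = TRUE a backslash-escaped hyphen is literal anywhere in the class.
grepl("[d\\-c]", x, perl = TRUE)  # TRUE FALSE TRUE

# To test for a literal hyphen alone, fixed = TRUE bypasses regex syntax.
grepl("-", x, fixed = TRUE)       # TRUE FALSE FALSE
```

Putting the hyphen first or last keeps the pattern independent of the regex engine, which is why the order of characters inside the brackets matters only for the hyphen.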