[Rd] pb in regular expression with the character "-" (PR#9437)
ripley at stats.ox.ac.uk
Fri Jan 5 00:06:00 CET 2007
Both Solaris 8 grep and GNU grep 2.5.1 give
gannet% cat > foo.txt
a-a
b
gannet% egrep '[d|-|c]' foo.txt
gannet% egrep '[-|c]' foo.txt
a-a
agreeing exactly with R (and the POSIX standard) and contradicting 'Fan'.
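
A minimal sketch of the same check, assuming a POSIX shell and a grep that accepts -E (the current spelling of egrep). The variable names are illustrative and not from the thread. It contrasts '-' between two characters, where it forms a range, with '-' placed first or last in the bracket expression, where it is taken literally.

```shell
# Sample input from the transcript above: two lines, "a-a" and "b".
input=$(printf 'a-a\nb\n')

# '-' between two '|' forms the range |-|, which contains only '|'.
# The class is therefore {d, |, c}, and neither line matches.
# '|| true' keeps the script going when grep finds nothing (exit status 1).
range_result=$(printf '%s\n' "$input" | grep -E '[d|-|c]' || true)

# '-' as the first character of the class is literal.
first_result=$(printf '%s\n' "$input" | grep -E '[-dc]' || true)

# '-' as the last character of the class is literal too.
last_result=$(printf '%s\n' "$input" | grep -E '[dc-]' || true)

printf 'range: "%s"\n' "$range_result"
printf 'first: "%s"\n' "$first_result"
printf 'last:  "%s"\n' "$last_result"
```

Moving the hyphen to either end of the class is the portable way to match it, since it does not depend on how an implementation treats a backslash inside brackets.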
On Thu, 4 Jan 2007, Fan wrote:
> Let me detail a bit my bug report:
>
> the two commands ("expected" vs "strange") should return the
> same result, the objective of the commands is to test the presence
> of several characters, '-' included.
>
> The order in which we specify the different characters must not be
> an issue, i.e., to test the presence of several characters, including
> say char_1, the regular expressions [char_1|char_2|char_3] and
> [char_2|char_1|char_3] should play the same role. Other software
> works just like this.
>
> What's reported is that R actually returns different results for the
> character "-" (\- in a RE) depending on its position in the regular
> expression, and the "perl" option would not be relevant.
As described in the relevant international standard and R's own
documentation.
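
A short sketch of why the '|' separators do not behave as the report assumes, again assuming a POSIX shell and grep -E. The extra input line "x|y" and the variable names are made up for illustration. Inside a bracket expression '|' is an ordinary character; alternation only applies outside the brackets.

```shell
# Thread input plus one hypothetical extra line containing a literal '|'.
input=$(printf 'a-a\nb\nx|y\n')

# Inside brackets '|' is literal: only the line containing '|' matches.
bar_in_class=$(printf '%s\n' "$input" | grep -E '[|]' || true)

# Outside brackets '|' is ERE alternation: match d, or -, or c.
# '-e' marks the next argument as the pattern.
alternation=$(printf '%s\n' "$input" | grep -E -e 'd|-|c' || true)

printf 'bar in class: "%s"\n' "$bar_in_class"
printf 'alternation:  "%s"\n' "$alternation"
```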
> Prof Brian Ripley wrote:
>> Why do you think this is a bug in R? You have not told us what you
>> expected, but the character range |-| contains only |. Not agreeing with
>> your expectations (unstated or otherwise) is not a bug in R.
>>
>> \- is the same as -, and - is special in character classes. (If it is
>> first or last it is treated literally.) And | is not a metacharacter
>> inside a character class. Also,
>>
>> > grep("[d\\-c]", c("a-a","b"))
>> [1] 1 2
>>
>> > grep("[d\\-c]", c("a-a","b"), perl=TRUE)
>> [1] 1
>>
>> shows that escaping - works only in perl (which you will find from the
>> background references mentioned, e.g.
>>
>> The interpretation of an ordinary character preceded by a backslash
>> ('\') is undefined.
>>
>> .)
>>
>> This is all carefully documented in ?regexp, e.g.
>>
>> Patterns are described here as they would be printed by 'cat': do
>> remember that backslashes need to be doubled in entering R
>> character strings from the keyboard.
>>
>>
>> This is not the first time you have wasted our resources with false bug
>> reports, so please show more respect for the R developers' time.
>> You were also explicitly asked not to report on obsolete versions of R.
>>
>> On Wed, 3 Jan 2007, xiao.gang.fan1 at libertysurf.fr wrote:
>>
>>> Full_Name: FAN
>>> Version: 2.4.0
>>> OS: Windows
>>> Submission from: (NULL) (159.50.101.9)
>>>
>>>
>>> These are expected:
>>>
>>>> grep("[\-|c]", c("a-a","b"))
>>>
>>> [1] 1
>>>
>>>> gsub("[\-|c]", "&", c("a-a","b"))
>>>
>>> [1] "a&a" "b"
>>>
>>> but these are strange:
>>>
>>>> grep("[d|\-|c]", c("a-a","b"))
>>>
>>> integer(0)
>>>
>>>> gsub("[d|\-|c]", "&", c("a-a","b"))
>>>
>>> [1] "a-a" "b"
>>>
>>> Thanks
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595