[Rd] pb in regular expression with the character "-" (PR#9437)

maechler at stat.math.ethz.ch maechler at stat.math.ethz.ch
Thu Jan 4 22:18:08 CET 2007

>>>>> "FanX" == Xiao Gang Fan <xiao.gang.fan1 at libertysurf.fr>
>>>>>     on Thu, 04 Jan 2007 21:52:07 +0100 writes:

    FanX> Let me give a bit more detail on my bug report: the two
    FanX> commands ("expected" vs "strange") should return the
    FanX> same result; the objective of the commands is to test for
    FanX> the presence of several characters, '-' included.

    FanX> The order in which we specify the different characters
    FanX> must not be an issue, i.e., to test the presence of
    FanX> several characters, including say char_1, the regular
    FanX> expressions [char_1|char_2|char_3] and
    FanX> [char_2|char_1|char_3] should play the same
    FanX> role. Other software works just like this.

    FanX> What's reported is that R actually returns different
    FanX> results for the character "-" (\- in a RE) depending on
    FanX> its position in the regular expression, and the
    FanX> "perl" option would not be relevant.

Fan, it seems you haven't understood what Brian Ripley explained
to you.  Let me try to spell it out:

"\-" is *NOT* what you seem still to be thinking it is:

  > "\-"
  [1] "-"
  > identical("\-", "-")
  [1] TRUE

This is all in the R-FAQ entry
    >>> 7.37 Why does backslash behave strangely inside strings?

and in several other places, and yes,
please do read the R FAQ and maybe more documentation
about R and "bug reporting" before your next bug report.
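
To make the distinction concrete, here is a minimal sketch of the difference between what is typed in an R string and what the regular expression engine actually receives. The variable names are only illustrative, and the example deliberately avoids typing the unrecognized escape "\-" itself: a doubled backslash is the way to get a real backslash into a string.

```r
# "\\-" as typed is a string of TWO characters: a backslash and a hyphen.
escaped <- "\\-"
n_escaped <- nchar(escaped)
print(n_escaped)      # [1] 2
cat(escaped, "\n")    # shows the pattern as the regex engine sees it: \-

# "-" is a string of ONE character.
plain <- "-"
n_plain <- nchar(plain)
print(n_plain)        # [1] 1

# The two strings differ, so they are different patterns as well.
same <- identical(escaped, plain)
print(same)           # [1] FALSE
```

Using cat() rather than print() on a pattern is the quickest way to see what is really being passed to grep() or gsub().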

Consider my guesstimate:
For 99% of all R users, the amount of time they need working
pretty intensely with R before they find a bug in it, 
is nowadays more than three years, and maybe even much more
-- such as their lifetime :-)

Martin Maechler, ETH Zurich

    FanX> Prof Brian Ripley wrote:
    >> Why do you think this is a bug in R?  You have not told
    >> us what you expected, but the character range |-|
    >> contains only | .  Not agreeing with your expectations
    >> (unstated or otherwise) is not a bug in R.
    >> \- is the same as -, and - is special in character
    >> classes.  (If it is first or last it is treated
    >> literally.)  And | is not a metacharacter inside a
    >> character class.  Also,
    >>> grep("[d\\-c]", c("a-a","b"))
    >>  [1] 1 2
    >>> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
    >>  [1] 1
    >> shows that escaping - works only in perl (which you will
    >> find from the background references mentioned, e.g.
    >>
    >>   The interpretation of an ordinary character preceded by a
    >>   backslash ('\') is undefined.
    >>
    >> .)
    >>
    >> This is all carefully documented in ?regexp, e.g.
    >>
    >>   Patterns are described here as they would be printed by
    >>   'cat': do remember that backslashes need to be doubled in
    >>   entering R character strings from the keyboard.
    >>
    >> This is not the first time you have wasted our resources
    >> with false bug reports, so please show more respect for
    >> the R developers' time.  You were also explicitly asked
    >> not to report on obsolete versions of R.
    >> On Wed, 3 Jan 2007, xiao.gang.fan1 at libertysurf.fr wrote:
    >>> Full_Name: FAN
    >>> Version: 2.4.0
    >>> OS: Windows
    >>> Submission from: (NULL) (
    >>> These are expected:
    >>>> grep("[\-|c]", c("a-a","b"))
    >>>  [1] 1
    >>>> gsub("[\-|c]", "&", c("a-a","b"))
    >>>  [1] "a&a" "b"
    >>> but these are strange:
    >>>> grep("[d|\-|c]", c("a-a","b"))
    >>>  integer(0)
    >>>> gsub("[d|\-|c]", "&", c("a-a","b"))
    >>>  [1] "a-a" "b"
    >>> Thanks
    >>> ______________________________________________
    >>> R-devel at r-project.org mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel
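
The character-class rules described in the thread above ('-' is literal only when first or last in the class, and '|' is not a metacharacter inside brackets) can be sketched as follows. This is an illustration only: the test vector x and the variable names are made up for the example, and the patterns are written without any backslashes so that the result does not depend on how a particular regex engine treats an escaped hyphen.

```r
# Illustrative input: one element each containing '-', 'b', 'c' and '|'.
x <- c("a-a", "b", "c", "|")

# A '-' placed first or last inside [...] is taken literally,
# so these classes match any of '-', 'd' or 'c'.
hyphen_first <- grep("[-dc]", x)
hyphen_last  <- grep("[dc-]", x)
print(hyphen_first)   # [1] 1 3
print(hyphen_last)    # [1] 1 3

# Inside [...] the '|' is an ordinary character, so "|-|" in the middle
# of a class is the range from '|' to '|'. The class below therefore
# matches 'd', '|' or 'c', and never a hyphen.
hyphen_mid <- grep("[d|-|c]", x)
print(hyphen_mid)     # [1] 3 4

# Alternation with '|' belongs outside the brackets.
alternation <- grep("-|d|c", x)
print(alternation)    # [1] 1 3
```

So to test for the presence of several characters including '-', list them in one class with the hyphen first or last, and leave out the '|' separators unless a literal '|' is wanted too.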
