[R] Regex for ^ (the caret symbol)?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Mon Jan 21 21:20:20 CET 2013


Apparently Extended RegExp syntax eliminated the "^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar with from Basic RegExp usage, since GNU grep with the -E option also refuses to match the caret unless it is escaped. The TRE library treats BRE as obsolete, so we only get ERE and Perl regexes in R. So I guess it isn't a bug, but is rather a "feature".
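In practice this means a literal caret has to be written differently in R. A minimal sketch, reusing the `temp` vector from the original question further down the thread, of three ways to do it: escape the caret, switch off regex interpretation with `fixed = TRUE`, or (assuming the PCRE engine via `perl = TRUE`) quote the whole pattern with `\Q...\E`. The variable names are illustrative only.

```r
# Example data taken from the original question in this thread
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# Unescaped: ^ is an anchor, so "latitude^2" can never match
unescaped <- grepl("latitude^2", temp)

# Option 1: escape the caret (the R string "\\^" is the regex \^)
escaped <- grepl("latitude\\^2", temp)

# Option 2: treat the pattern as a literal string, not a regex
literal <- grepl("latitude^2", temp, fixed = TRUE)

# Option 3: PCRE quoting; everything between \Q and \E is literal
quoted <- grepl("\\Qlatitude^2\\E", temp, perl = TRUE)

print(unescaped)  # all FALSE
print(escaped)    # TRUE TRUE FALSE FALSE
print(literal)
print(quoted)
```

`fixed = TRUE` is usually the simplest choice when the whole pattern is meant literally; escaping is needed when the caret is only one piece of a larger regex.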
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

>On 13-01-21 1:05 PM, Jeff Newmiller wrote:
>> So what is the special behavior of the ^ symbol when not at the
>> beginning of the string that occurs when it is not escaped?
>
>I think it retains its meaning as an assertion that it occurs at the 
>beginning of the line, and so a pattern like "a^b" could never match 
>anything.  It's not very useful in this context, but I expect it's 
>easier to implement in the case of complicated patterns, where some 
>paths through the pattern put it at the beginning and others don't,
>e.g.
>
>(a|)^b
>
>has two possible patterns:  a^b and ^b.
>
>Duncan Murdoch
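Duncan's two-path argument can be checked directly. This sketch uses `perl = TRUE` so that the empty alternative in `(a|)` is handled by PCRE, which certainly accepts it; the test strings in `x` are made up for illustration. Because ^ can only succeed at the start of the subject, the `a` branch always fails and the pattern behaves exactly like `^b`.

```r
# Made-up test strings
x <- c("b", "bx", "ab", "a^b", "xb")

# "a^b": the anchor can never be satisfied after an "a", so nothing matches,
# not even the literal string "a^b"
no_match <- grepl("a^b", x, perl = TRUE)

# "(a|)^b": the "a" path reduces to a^b (impossible), the empty path to ^b
two_paths <- grepl("(a|)^b", x, perl = TRUE)
anchored  <- grepl("^b", x, perl = TRUE)

print(no_match)                        # all FALSE
print(two_paths)                       # TRUE TRUE FALSE FALSE FALSE
print(identical(two_paths, anchored))  # TRUE
```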
>
>>
>> Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>
>>> On 13-01-21 11:48 AM, Jeff Newmiller wrote:
>>>> I am not sure I understand what worked perfectly, since it is my
>>>> understanding that ^ is only special at the beginning of the regex (to
>>>> anchor the pattern at the beginning of the target string) or as the
>>>> first character of a character set (to indicate exclusion of the listed
>>>> characters). In any other position the caret should behave like an
>>>> ordinary character. That is, your original pattern should have worked
>>>> as-is. This is supported by the help page documentation for regex in
>>>> the paragraph below the definition of [:xdigit:]. I think this is a bug
>>>> in R.
>>>
>>> It's a documentation error rather than a bug.  The ^ character is
>>> special anywhere in the extended RE syntax defined by the TRE library
>>> or the Perl-compatible library that we use.  This is inconsistent with
>>> the POSIX standard, which might be what you were thinking of.
>>>
>>> Duncan Murdoch
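Duncan's point that both of R's regex engines treat an unescaped ^ the same way can be checked by running the thread's pattern through each. A small sketch using only base R; the `engines` and `results` names are illustrative.

```r
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# perl = FALSE selects the default TRE (extended) engine, perl = TRUE selects PCRE
engines <- c(TRE = FALSE, PCRE = TRUE)

# For each engine, try the pattern with and without escaping the caret
results <- lapply(engines, function(use_perl) {
  list(
    unescaped = grepl("latitude^2",   temp, perl = use_perl),
    escaped   = grepl("latitude\\^2", temp, perl = use_perl)
  )
})

str(results)
```

Both engines reject the unescaped form and accept the escaped one, which matches Duncan's description of the behavior.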
>>>
>>>>
>>>> mtb954 at gmail.com wrote:
>>>>
>>>>> Hi Tsjerk, many thanks...that worked perfectly!
>>>>>
>>>>> Mark Na
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 21, 2013 at 9:36 AM, Tsjerk Wassenaar
>>>>> <tsjerkw at gmail.com> wrote:
>>>>>
>>>>>> Oh, I'm jetlagged. ^ is a control character for 'start of string'.
>>>>>> In the context of a character set it means negation: [^a-z].
>>>>>>
>>>>>> Ciao,
>>>>>>
>>>>>> Tsjerk
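The roles the caret can play in R's default (extended) syntax, sketched on made-up strings in `x`: an anchor outside a bracket expression, negation as the first character inside one, and an ordinary character anywhere else inside one. Putting ^ in a non-initial position of a bracket expression is therefore another way to match it literally.

```r
# Made-up example strings
x <- c("latitude^2", "lat2", "^start", "abc")

# Outside brackets: anchor at the start of the string
starts_with_lat <- grepl("^lat", x)   # TRUE TRUE FALSE FALSE

# First character inside brackets: negation ("any character not in a-z")
has_non_lower <- grepl("[^a-z]", x)   # TRUE TRUE TRUE FALSE

# Not first inside brackets: an ordinary character ("." or a literal "^")
has_caret <- grepl("[.^]", x)         # TRUE FALSE TRUE FALSE

print(starts_with_lat)
print(has_non_lower)
print(has_caret)
```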
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2013 at 4:33 PM, Tsjerk Wassenaar
>>>>>> <tsjerkw at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mark Na,
>>>>>>>
>>>>>>> Try:
>>>>>>>
>>>>>>> grepl("latitude\\^2",temp)
>>>>>>>
>>>>>>> ^ is a control character for negation, so you have to escape it.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Tsjerk
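Tsjerk's pattern needs two backslashes because the string passes through two parsers: R's string-literal parser turns `"\\^"` into the two characters `\^`, and the regex engine then reads `\^` as an escaped, literal caret. A short sketch, reusing the question's `temp` vector:

```r
pattern <- "latitude\\^2"

# The R string holds a single backslash; the regex engine is what consumes it
print(nchar(pattern))  # 11
writeLines(pattern)    # prints latitude\^2

temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")
matches <- grepl(pattern, temp)
print(matches)         # TRUE TRUE FALSE FALSE
```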
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 21, 2013 at 4:26 PM, <mtb954 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello R-helpers,
>>>>>>>>
>>>>>>>> I am trying to search for a string that includes the caret symbol,
>>>>>>>> using the following code:
>>>>>>>>
>>>>>>>> grepl("latitude^2",temp)
>>>>>>>>
>>>>>>>>
>>>>>>>> And R doesn't like that. It gives me:
>>>>>>>>
>>>>>>>>> temp<-c("latitude^2","latitude and latitude^2","longitude^2","longitude and longitude^2")
>>>>>>>>> temp
>>>>>>>> [1] "latitude^2"                "latitude and latitude^2"   "longitude^2"               "longitude and longitude^2"
>>>>>>>>> grepl("latitude^2",temp)
>>>>>>>> [1] FALSE FALSE FALSE FALSE
>>>>>>>>
>>>>>>>>
>>>>>>>> I think this must be a regex problem, but I can't find out how to
>>>>>>>> specify the caret using regex.
>>>>>>>>
>>>>>>>> I would appreciate any help you could provide.
>>>>>>>>
>>>>>>>> Many thanks,
>>>>>>>>
>>>>>>>> Mark Na
>>>>>>>>
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Tsjerk A. Wassenaar, Ph.D.
>>>>>>>
>>>>>>> post-doctoral researcher
>>>>>>> Biocomputing Group
>>>>>>> Department of Biological Sciences
>>>>>>> 2500 University Drive NW
>>>>>>> Calgary, AB T2N 1N4
>>>>>>> Canada
>>>>>>>
>>


