[R] Regex for ^ (the caret symbol)?

Duncan Murdoch murdoch.duncan at gmail.com
Mon Jan 21 21:52:07 CET 2013


On 13-01-21 3:20 PM, Jeff Newmiller wrote:
> Apparently Extended RegExp syntax eliminated the "^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar with from the Basic RegExp usage, since GNU grep with the -E option also refuses to match the caret unless it is escaped. The TRE library treats BRE as obsolete, so we only get ERE and Perl regexes in R. So I guess it isn't a bug, but is rather a "feature".

I re-read the ?regex help page, and I think it does actually say this, 
so we don't even have a documentation error as I thought before.  When 
it is saying that ^ is a plain character except when it comes first, it 
is talking about first within a character class, e.g. [a^] meaning "a" 
or "^" as opposed to [^a] meaning "not a".
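
A minimal sketch of the distinction in R, reusing the temp vector from the original question further down the thread. The fixed = TRUE call is an addition that nobody in the thread proposed; it is shown only as one other way to match a literal caret. The variable names (unescaped, escaped, literal, either, negated) are illustrative.

```r
# Vector from the original question in this thread
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# Unescaped: outside a character class ^ is an anchor, so nothing matches
unescaped <- grepl("latitude^2", temp)

# Escaped: "\\^" in the R string is \^ in the regex, i.e. a literal caret
escaped <- grepl("latitude\\^2", temp)

# fixed = TRUE treats the pattern as a plain string, not a regex
literal <- grepl("latitude^2", temp, fixed = TRUE)

# Inside a character class, a caret that does not come first is ordinary
either  <- grepl("[a^]", c("a", "^", "b"))  # matches "a" or "^"
negated <- grepl("[^a]", c("a", "^", "b"))  # matches anything but "a"

print(rbind(unescaped, escaped, literal))
print(rbind(either, negated))
```

The escaped and fixed = TRUE calls should both pick out the first two elements of temp, while the unescaped pattern matches none of them.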

Duncan Murdoch


> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                        Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
>> On 13-01-21 1:05 PM, Jeff Newmiller wrote:
>>> So what is the special behavior of the ^ symbol when not at the
>>> beginning of the string that occurs when it is not escaped?
>>
>> I think it retains its meaning as an assertion that it occurs at the
>> beginning of the line, and so a pattern like "a^b" could never match
>> anything.  It's not very useful in this context, but I expect it's
>> easier to implement in the case of complicated patterns, where some
>> paths through the pattern put it at the beginning and others don't,
>> e.g.
>>
>> (a|)^b
>>
>> has two possible patterns:  a^b and ^b.
>>
>> Duncan Murdoch
>>
>>>
>>>
>>> Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>
>>>> On 13-01-21 11:48 AM, Jeff Newmiller wrote:
>>>>> I am not sure I understand what worked perfectly, since it is my
>>>>> understanding that ^ is only special at the beginning of the regex (to
>>>>> anchor the pattern at the beginning of the target string) or as the
>>>>> first character of a character set (to indicate exclusion of the listed
>>>>> characters). In any other position the caret should behave like an
>>>>> ordinary character. That is, your original pattern should have worked
>>>>> as-is. This is supported by the help page documentation for regex in
>>>>> the paragraph below the definition of [:xdigit:]. I think this is a bug
>>>>> in R.
>>>>
>>>> It's a documentation error rather than a bug.  The ^ character is
>>>> special anywhere in the extended RE syntax defined by the TRE library
>>>> or the Perl-compatible library that we use.  This is inconsistent with
>>>> the POSIX standard, which might be what you were thinking of.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> mtb954 at gmail.com wrote:
>>>>>
>>>>>> Hi Tsjerk, many thanks...that worked perfectly!
>>>>>>
>>>>>> Mark Na
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2013 at 9:36 AM, Tsjerk Wassenaar <tsjerkw at gmail.com> wrote:
>>>>>>
>>>>>>> Oh, I'm jetlagged. ^ is a control character for 'start of string'. In
>>>>>>> the context of a character set it means negation: [^a-z].
>>>>>>>
>>>>>>> Ciao,
>>>>>>>
>>>>>>> Tsjerk
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 21, 2013 at 4:33 PM, Tsjerk Wassenaar <tsjerkw at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Mark Na,
>>>>>>>>
>>>>>>>> Try:
>>>>>>>>
>>>>>>>> grepl("latitude\\^2",temp)
>>>>>>>>
>>>>>>>> ^ is a control character for negation, so you have to escape it.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Tsjerk
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 21, 2013 at 4:26 PM, <mtb954 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello R-helpers,
>>>>>>>>>
>>>>>>>>> I am trying to search for a string that includes the caret symbol,
>>>>>>>>> using the following code:
>>>>>>>>>
>>>>>>>>> grepl("latitude^2",temp)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And R doesn't like that. It gives me:
>>>>>>>>>
>>>>>>>>>> temp<-c("latitude^2","latitude and latitude^2","longitude^2","longitude and longitude^2")
>>>>>>>>>> temp
>>>>>>>>> [1] "latitude^2"                "latitude and latitude^2"   "longitude^2"               "longitude and longitude^2"
>>>>>>>>>> grepl("latitude^2",temp)
>>>>>>>>> [1] FALSE FALSE FALSE FALSE
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think this must be a regex problem, but I can't find out how to
>>>>>>>>> specify the caret using regex.
>>>>>>>>>
>>>>>>>>> I would appreciate any help you could provide.
>>>>>>>>>
>>>>>>>>> Many thanks,
>>>>>>>>>
>>>>>>>>> Mark Na
>>>>>>>>>
>>>>>>>>>            [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Tsjerk A. Wassenaar, Ph.D.
>>>>>>>>
>>>>>>>> post-doctoral researcher
>>>>>>>> Biocomputing Group
>>>>>>>> Department of Biological Sciences
>>>>>>>> 2500 University Drive NW
>>>>>>>> Calgary, AB T2N 1N4
>>>>>>>> Canada
>>>>>>>>
>>>>>>>
>>>
>


