[R] Regex for ^ (the caret symbol)?
David Winsemius
dwinsemius at comcast.net
Mon Jan 21 19:55:11 CET 2013
On Jan 21, 2013, at 10:05 AM, Jeff Newmiller wrote:
> So what is the special behavior of the ^ symbol when not at the
> beginning of the string that occurs when it is not escaped?
Isn't there a distinction between what _is_ "special" and what should
be "special"? You are saying that "^" after the beginning of a pattern
should not be special, and by extension that "$" before the end of a
pattern should not be special. What about the potential desire to have
a regex alternation that picks one of two patterns, each anchored at
the beginning of a target? Doesn't "^" need to remain special to allow
this:
> grep("^thet|^that", c("thet is", "that is"))
[1] 1 2
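
For what it's worth, here is a small sketch of the options discussed in this thread, reusing the temp vector from the original question. The variable names (unescaped, escaped, literal, anchored) and the extra "not that" element are only for illustration. fixed = TRUE is a standard argument of grepl() that nobody mentioned above, so treat it as one more alternative rather than anyone's recommended fix.

```r
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# Unescaped: "^" acts as an anchor even in mid-pattern, so no string matches.
unescaped <- grepl("latitude^2", temp)

# Escaped: "\\^" in an R string is the regex \^, a literal caret.
escaped <- grepl("latitude\\^2", temp)

# fixed = TRUE skips regex interpretation and matches the text as-is.
literal <- grepl("latitude^2", temp, fixed = TRUE)

# Alternation with an anchor on each branch (illustrative third element added).
anchored <- grep("^thet|^that", c("thet is", "that is", "not that"))

print(unescaped)
print(escaped)
print(literal)
print(anchored)
```

The escaped and fixed versions should both flag only the first two elements of temp, and the anchored alternation should skip "not that" because neither branch can match away from the start of the string.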
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
>> On 13-01-21 11:48 AM, Jeff Newmiller wrote:
>>> I am not sure I understand what worked perfectly, since it is my
>>> understanding that ^ is only special at the beginning of the regex
>>> (to anchor the pattern at the beginning of the target string) or as
>>> the first character of a character set (to indicate exclusion of the
>>> listed characters). In any other position the caret should behave
>>> like an ordinary character. That is, your original pattern should
>>> have worked as-is. This is supported by the help page documentation
>>> for regex in the paragraph below the definition of [:xdigit:]. I
>>> think this is a bug in R.
>>
>> It's a documentation error rather than a bug. The ^ character is
>> special anywhere in the extended RE syntax defined by the TRE library
>> or the Perl-compatible library that we use. This is inconsistent
>> with the POSIX standard, which might be what you were thinking of.
>>
>> Duncan Murdoch
>>
>>
>>
>>>
>>> mtb954 at gmail.com wrote:
>>>
>>>> Hi Tsjerk, many thanks...that worked perfectly!
>>>>
>>>> Mark Na
>>>>
>>>>
>>>>
>>>> On Mon, Jan 21, 2013 at 9:36 AM, Tsjerk Wassenaar <tsjerkw at gmail.com> wrote:
>>>>
>>>>> Oh, I'm jetlagged. ^ is a control character for 'start of string'.
>>>>> In the context of a character set it means negation: [^a-z].
>>>>>
>>>>> Ciao,
>>>>>
>>>>> Tsjerk
>>>>>
>>>>>
>>>>> On Mon, Jan 21, 2013 at 4:33 PM, Tsjerk Wassenaar <tsjerkw at gmail.com> wrote:
>>>>>
>>>>>> Hi Mark Na,
>>>>>>
>>>>>> Try:
>>>>>>
>>>>>> grepl("latitude\\^2",temp)
>>>>>>
>>>>>> ^ is a control character for negation, so you have to escape it.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Tsjerk
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2013 at 4:26 PM, <mtb954 at gmail.com> wrote:
>>>>>>
>>>>>>> Hello R-helpers,
>>>>>>>
>>>>>>> I am trying to search for a string that includes the caret symbol,
>>>>>>> using the following code:
>>>>>>>
>>>>>>> grepl("latitude^2",temp)
>>>>>>>
>>>>>>>
>>>>>>> And R doesn't like that. It gives me:
>>>>>>>
>>>>>>>> temp<-c("latitude^2","latitude and latitude^2","longitude^2","longitude and longitude^2")
>>>>>>>> temp
>>>>>>> [1] "latitude^2" "latitude and latitude^2" "longitude^2" "longitude and longitude^2"
>>>>>>>> grepl("latitude^2",temp)
>>>>>>> [1] FALSE FALSE FALSE FALSE
>>>>>>>
>>>>>>>
>>>>>>> I think this must be a regex problem, but I can't find out how to
>>>>>>> specify the caret using regex.
>>>>>>>
>>>>>>> I would appreciate any help you could provide.
>>>>>>>
>>>>>>> Many thanks,
>>>>>>>
>>>>>>> Mark Na
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tsjerk A. Wassenaar, Ph.D.
>>>>>>
>>>>>> post-doctoral researcher
>>>>>> Biocomputing Group
>>>>>> Department of Biological Sciences
>>>>>> 2500 University Drive NW
>>>>>> Calgary, AB T2N 1N4
>>>>>> Canada
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
David Winsemius, MD
Alameda, CA, USA