[R] element wise pattern recognition and string substitution

Mon Sep 5 18:44:36 CEST 2016

I am not the one who proved this... I can only respond to your suggested counterexamples.
-- 
Sent from my phone. Please excuse my brevity.
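Regarding the question below about whether sub() can *extract* from an alternation via backreferences: here is a minimal sketch. It assumes each string has either exactly four or exactly three dot-separated fields, as in Jun's two examples. The anchored patterns and the names field4_pat, field3_pat and combined are hypothetical rewrites of Jun's originals, written so that the two alternatives can never both match the same string. In an alternation, capture groups are numbered left to right across all branches. In R's sub(), a group belonging to the branch that did not match substitutes as an empty string, so the replacement "\\1\\2" picks up whichever branch matched.

```r
# Hypothetical names (not from the thread): field4_pat, field3_pat, combined.
strings <- c('TX.WT.CUT.mean', 'mg.tx.cv')

# Exactly four dot-separated fields: capture the middle two, e.g. "WT.CUT".
field4_pat <- "^[^.]*\\.([^.]*\\.[^.]*)\\.[^.]*$"
# Exactly three dot-separated fields: capture the middle one, e.g. "tx".
field3_pat <- "^[^.]*\\.([^.]*)\\.[^.]*$"

# Build the alternation programmatically; its groups are numbered 1 and 2.
combined <- paste(field4_pat, field3_pat, sep = "|")

# Only one branch can match a given string, so the other group is empty.
extracted <- sub(combined, "\\1\\2", strings)
extracted
## [1] "WT.CUT" "tx"
```

Because the branches are anchored and mutually exclusive, the result does not depend on how the regex engine chooses between alternatives that match equally well.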

On September 5, 2016 9:01:12 AM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>Jeff:
>
>It is not obvious to me that the ability to *match* an arbitrary
>pattern (including one of several different ones via "|" , per the
>link you included) implies that sub() and friends can extract it, e.g.
>via the \N construct or otherwise.  I would appreciate it if you or
>someone else could show me how this can be done.
>
>Cheers,
>Bert
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
>On Mon, Sep 5, 2016 at 8:37 AM, Jeff Newmiller
><jdnewmil at dcn.davis.ca.us> wrote:
>> Yes, sorry, I did not look closer... A regex can match any finite
>> language, so there are no data sets you can feed to R that cannot be
>> matched. [1] You may find it hard to see the pattern, or you may want
>> to build the pattern programmatically to alleviate tedium for yourself,
>> but regexes are not the constraint.
>>
>> http://www.cs.nuim.ie/~jpower/Courses/Previous/parsing/node18.html
>>
>> On September 4, 2016 10:41:45 PM PDT, Bert Gunter
>> <bgunter.4567 at gmail.com> wrote:
>>>Well, he did provide an example, and...
>>>
>>>
>>>> z <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>
>>>> sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>>>[1] "WT.CUT" "tx"
>>>
>>>
>>>## seems to do what was requested.
>>>
>>>Jeff would have to amplify on his initial statement, however: do you
>>>mean that separate patterns can always be combined via "|"? Or
>>>something deeper?
>>>
>>>Cheers,
>>>Bert
>>>
>>>
>>>On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller
>>><jdnewmil at dcn.davis.ca.us> wrote:
>>>> Your opening assertion is false.
>>>>
>>>> Provide a reproducible example and someone will demonstrate.
>>>>
>>>> On September 4, 2016 9:06:59 PM PDT, Jun Shen
>>>> <jun.shen.ut at gmail.com> wrote:
>>>>>Dear list,
>>>>>
>>>>>I have a vector of strings that cannot be described by one pattern.
>>>>>So, suppose I construct a vector of patterns of the same length as
>>>>>the vector of strings: can I do element-wise pattern matching and
>>>>>string substitution?
>>>>>
>>>>>For example,
>>>>>
>>>>>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>>>>>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>>>>
>>>>>patterns <- c(pattern1,pattern2)
>>>>>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>>>
>>>>>Say I want to extract "WT.CUT" from the first string and "tx" from
>>>>>the second string. If I do
>>>>>
>>>>>sub(patterns, '\\2', strings), only the first pattern is used (with a warning).
>>>>>
>>>>>Looping over the patterns doesn't work the way I want. I'd appreciate
>>>>>any comments.
>>>>>Thanks.
>>>>>
>>>>>Jun
>>>>>
>>>>>
>>>>>______________________________________________
>>>>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>PLEASE do read the posting guide
>>>>>http://www.R-project.org/posting-guide.html
>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
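For the element-wise approach Jun originally asked about (one pattern per string), here is a minimal sketch using base R's mapply(). It pairs each pattern with its string, so sub() receives a length-one pattern on every call and no warning is raised. The sketch uses Jun's patterns unchanged; the name res is illustrative only.

```r
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"

patterns <- c(pattern1, pattern2)
strings  <- c('TX.WT.CUT.mean', 'mg.tx.cv')

# mapply() calls sub(patterns[i], "\\2", strings[i]) for each i;
# the single replacement string "\\2" is recycled to every call.
res <- mapply(sub, patterns, "\\2", strings, USE.NAMES = FALSE)
res
## [1] "WT.CUT" "tx"
```

USE.NAMES = FALSE stops mapply() from naming the result after the patterns. An equivalent form is vapply(seq_along(strings), function(i) sub(patterns[i], "\\2", strings[i]), character(1)), which also guarantees a character result.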