[R] regular expression for selection

Uwe Ligges ligges at statistik.tu-dortmund.de
Mon Nov 14 11:31:40 CET 2011



On 14.11.2011 11:27, Petr PIKAL wrote:
> Hi
>
> Thank you. It is pure magic, something taught at Unseen University.
>
> This is what I got as help for selecting only the letters from the
> elements of a character vector.
>
>> vzor
>   [1] "61A"     "62C/27"  "65A/27"  "66C/29"  "69A/29"  "70C/31"  "73A/31"
>   [8] "74C/33"  "77A/33"  "81A/35"  "82C/37"  "85A/37"  "86C/39"  "89A/39"
> [15] "90C/41"  "93A/41"  "94C/43"  "97A/43"  "98C/45"  "101A/45" "102C/47"
> [22] "105A/47" "106C/49" "109A/49" "110C/51" "113A/51"
>
>> gsub("[^A-Za-z]", "", vzor)
>   [1] "A" "C" "A" "C" "A" "C" "A" "C" "A" "A" "C" "A" "C" "A" "C" "A" "C"
> [18] "A" "C" "A" "C" "A" "C" "A" "C" "A"
>
> Therefore I expected that
>
> sub("m5.", "\\1", mena) or sub("m5.", "", mena)
>
> would select what I wanted. But that was not the case.
>
> Could you please correct me as I try to interpret your solution?
>
> gsub(".*_(m5.).*", "\\1", mena)
>
> or
>
> gsub(".*(m5.).*", "\\1", mena)
>
> .* matches any characters

Yes.
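
More precisely, "." matches any single character and "*" repeats the
preceding item zero or more times. A small sketch on made-up strings
(the vector x and the stricter pattern m5[0-9] are only illustrations,
not part of the data in this thread):

```r
# Hypothetical test strings, chosen only to show what "." accepts.
x <- c("m53", "m5x", "m5")

# "." stands for exactly one character of any kind, so "m5x" also matches,
# while "m5" alone does not (nothing is left for the dot).
dot_match <- grepl("m5.", x)

# A character class restricts the third character to a digit.
digit_match <- grepl("m5[0-9]", x)

print(dot_match)    # TRUE TRUE FALSE
print(digit_match)  # TRUE FALSE FALSE
```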

> () negation? or matching selection for back reference?

The latter. See any book about regular expressions. I think it is also
mentioned in ?regexp, with an example in ?gsub.
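
To make the role of the parentheses concrete, here is a minimal sketch.
The string x is invented for the illustration; only sub() and the
"\\1" notation come from the discussion above:

```r
# Hypothetical input, shortened from the kind of file name discussed here.
x <- "abc_m53.00_s1"

# Without parentheses the matched text "m53" is simply replaced,
# here by the empty string, so it disappears.
removed <- sub("m5.", "", x)

# With parentheses the matched text is captured as group 1, and "\\1"
# in the replacement puts it back, here wrapped in square brackets.
wrapped <- sub("(m5.)", "[\\1]", x)

print(removed)  # "abc_.00_s1"
print(wrapped)  # "abc_[m53].00_s1"
```

So sub() and gsub() always replace what the pattern matched; a
backreference only lets the replacement reuse part of that match.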



> Finally the expression matches the whole string and evaluates what is
> matched by the parenthesised part. That part is returned by the
> backreference.
>
> Is this interpretation correct?

Indeed, where \\1 is the first backreference.
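
Put together on two of the original file names, this can be sketched as
follows. The regmatches()/regexpr() variant and the pattern m5[0-9] are
an alternative added for comparison, not something proposed earlier in
the thread:

```r
# Two of the file names from the original question.
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

# The pattern consumes the whole string; the replacement keeps only
# the text captured by the group (m5.).
via_gsub <- gsub(".*_(m5.).*", "\\1", mena)

# Alternative sketch: extract the match directly instead of
# deleting everything around it.
via_regmatches <- regmatches(mena, regexpr("m5[0-9]", mena))

print(via_gsub)        # "m53" "m59"
print(via_regmatches)  # "m53" "m59"
```

One difference worth knowing: for a string without any match, gsub()
returns the string unchanged, whereas regmatches() with regexpr() drops
it from the result.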

Best,
Uwe




> Regards
> Petr
>
>>
>> On 14.11.2011 10:22, Petr PIKAL wrote:
>>> Hi
>>>
>>>> On 11/14/2011 07:45 PM, Petr PIKAL wrote:
>>>>> Dear all
>>>>>
>>>>> I am again (as usual) lost in regular expression use for selection.
>>>>> Here are my data:
>>>>>
>>>>>> dput(mena)
>>>>> c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
>>>>> "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
>>>>> "138516_10g_50ml_50c_250utes1_m55.00_s1.imp",
>>>>> "138516_10g_50ml_50c_250utes1_m56.00_s1.imp",
>>>>> "138516_10g_50ml_50c_250utes1_m57.00_s1.imp",
>>>>> "138516_10g_50ml_50c_250utes1_m58.00_s1.imp",
>>>>> "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")
>>>>>
>>>>> I want to select only the values "m" followed by numbers from 53 to 59.
>>>>>
>>>>> I used
>>>>>
>>>>> sub("m5.", "", mena)
>>>>>
>>>>> which correctly selects those m53 - m59 values but, contrary to my
>>>>> expectation, it replaced the selected values with the specified
>>>>> replacement, in that case an empty string.
>>>>>
>>>>> What should I use if I want to get rid of everything but m53-m59 in
>>>>> those strings?
>>>>>
>>>> Hi Petr,
>>>> How about:
>>>>
>>>> grep("m5",mena)
>>>
>>> It gives numeric values which tell me that there is a match in each
>>> string, but as a result I need only the m53-m59 substrings.
>>
>>
>> gsub(".*_(m5.).*", "\\1", mena)
>>
>> Uwe Ligges
>>
>>
>>
>>> Regards
>>> Petr
>>>
>>>
>>>
>>>>
>>>> Jim
>>>>
>>>
>