[R] text matching

David Winsemius dwinsemius at comcast.net
Mon Sep 19 16:10:03 CEST 2011


On Sep 19, 2011, at 7:05 AM, Sarah Goslee wrote:

> Hi,
>
> On Mon, Sep 19, 2011 at 6:15 AM, SNV Krishna <krishna at primps.com.sg>  
> wrote:
>> Hi All,
>>
>> I have a character vector named tickers
>>
>>> head(tickers,10)
>>
>>            V1
>> 1  ADARSHPL.BO
>> 2        AGR.V
>> 3          AGU
>> 4       AGU.TO
>> 5     AIMCO.BO
>> 6  ALUFLUOR.BO
>> 7        AMZ.V
>> 8          AVD
>> 9  ANILPROD.BO
>> 10    ARIES.BO
>>
>> I would like to extract all elements that have ".BO" in them. I tried
>>
>>> grep("\.BO",tickers)
>> Error: '\.' is an unrecognized escape in character string starting  
>> "\."
>
> You need instead:
>> tickers <- c("A.BO", "BOB", "C.BO")
>> grep("\\.BO", tickers)
> [1] 1 3
>>
>> tickers[grep("\\.BO", tickers)]
> [1] "A.BO" "C.BO"
>
>
>>> grep(".BO",tickers)
>> [1] 1
>
> That's odd; it should have returned many more matches. You may need to
> check the format of your data.

There are two NOT-oddities at work here. Periods and other regex
special characters need to be doubly escaped when used as literals in
search patterns (the R string "\\." holds the two characters \. which
the regex engine reads as a literal period), and the vector that needs
to be searched is not "tickers" but rather "tickers$V1".
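A minimal sketch of the escaping point, using two tickers from the
question plus a hypothetical string "TURBO" that is included only
because an unescaped period lets it match. The character-class and
fixed = TRUE variants are standard grep() options offered here as
alternatives; neither was suggested earlier in the thread.

```r
# Two tickers from the question plus a hypothetical non-ticker string.
x <- c("AIMCO.BO", "AGU.TO", "TURBO")

# "\\.BO" is a 4-character R string: backslash, period, B, O.
# The regex engine then reads \. as a literal period.
nchar("\\.BO")                    # 4

grep(".BO", x)                    # 1 3: unescaped "." matches any character
grep("\\.BO", x)                  # 1: literal period only
grep("[.]BO", x)                  # 1: a character class also makes "." literal
grep(".BO", x, fixed = TRUE)      # 1: fixed = TRUE turns off regex matching
grep("\\.BO$", x, value = TRUE)   # "AIMCO.BO": "$" anchors at the end of the string
```

Anchoring with "$" is a further tightening for suffixes like ".BO";
whether it suits the real data depends on how the tickers are formatted.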

The [1] 1 result arises because "tickers" is a data frame, i.e. a list
with only one element (the column V1). grep() coerces it to a single
character string, and that string does contain a match for the pattern.
(This is despite the fact that the pattern is not what the OP thought
he was searching for, but rather a more general one: an unescaped
period matches any character.)
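A sketch of the second point, rebuilding a data frame from the ten rows
printed in the question. Setting stringsAsFactors = FALSE is an
assumption made so the example behaves the same in every R version;
how the OP actually created "tickers" is not known.

```r
# Data frame rebuilt from the head(tickers, 10) output in the question.
tickers <- data.frame(
  V1 = c("ADARSHPL.BO", "AGR.V", "AGU", "AGU.TO", "AIMCO.BO",
         "ALUFLUOR.BO", "AMZ.V", "AVD", "ANILPROD.BO", "ARIES.BO"),
  stringsAsFactors = FALSE
)

length(tickers)                  # 1: a data frame is a list of its columns

# grep() coerces a non-character x with as.character(), which turns the
# one-column list into a single deparsed string, so only index 1 can
# come back.
grep(".BO", tickers)

# Searching the column itself gives one result per ticker.
grep("\\.BO", tickers$V1)        # 1 5 6 9 10
tickers$V1[grep("\\.BO", tickers$V1)]
```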

-- 
David.


>
>> Could anyone please guide me on this? Many thanks for the help.
>>
>> Best Regards,
>>
>> Krishna
>>
>
>
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


