[R] text matching

SNV Krishna krishna at primps.com.sg
Tue Sep 20 05:42:08 CEST 2011


Hi,

I noticed my mistakes. The first is the escape: the backslash has to be
doubled, so the pattern should be "\\.BO" instead of "\.BO". The second and
more important one is that I should have searched tickers$V1, not tickers.
Thanks for pointing that out, David, and thank you all for the help.

Best regards,

Krishna

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Monday, September 19, 2011 10:10 PM
To: Sarah Goslee
Cc: SNV Krishna; r-help at r-project.org
Subject: Re: [R] text matching


On Sep 19, 2011, at 7:05 AM, Sarah Goslee wrote:

> Hi,
>
> On Mon, Sep 19, 2011 at 6:15 AM, SNV Krishna <krishna at primps.com.sg>
> wrote:
>> Hi All,
>>
>> I have a character vector by name tickers
>>
>>> head(tickers,10)
>>
>>            V1
>> 1  ADARSHPL.BO
>> 2        AGR.V
>> 3          AGU
>> 4       AGU.TO
>> 5     AIMCO.BO
>> 6  ALUFLUOR.BO
>> 7        AMZ.V
>> 8          AVD
>> 9  ANILPROD.BO
>> 10    ARIES.BO
>>
>> I would like to extract all elements that have ".BO" in them. I tried
>>
>>> grep("\.BO",tickers)
>> Error: '\.' is an unrecognized escape in character string starting 
>> "\."
>
> You need instead:
>> tickers <- c("A.BO", "BOB", "C.BO")
>> grep("\\.BO", tickers)
> [1] 1 3
>>
>> tickers[grep("\\.BO", tickers)]
> [1] "A.BO" "C.BO"
>
>
>>> grep(".BO",tickers)
>> [1] 1
>
> That's odd; it should have returned many more matches. You may need to 
> check the format of your data.

There are two NOT-oddities at work here. Periods and other special
characters need to be doubly escaped (a doubled backslash in the R string)
when used as literals in search patterns, and the vector that needs to be
searched is not "tickers" but rather "tickers$V1".
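
A minimal sketch of the escaping point, using made-up sample values rather
than the OP's data. It also shows fixed = TRUE, which is an alternative not
mentioned above: it makes grep treat the pattern as a plain string, so no
escaping is needed.

```r
# Hypothetical sample values, loosely modelled on the tickers in this thread.
x <- c("A.BO", "BOB", "C.BO", "ABBOT")

# "\\." in R source is the regex \. which matches a literal period.
escaped <- grep("\\.BO", x, value = TRUE)

# An unescaped "." matches any single character, so "ABBOT" matches too.
unescaped <- grep(".BO", x, value = TRUE)

# fixed = TRUE disables regex interpretation of the pattern entirely.
literal <- grep(".BO", x, value = TRUE, fixed = TRUE)

print(escaped)
print(unescaped)
print(literal)
```

Here the escaped and fixed forms should agree, while the unescaped pattern
also picks up "ABBOT".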

That result (a single match at position 1) arises because there is only one
element in the list named "tickers", and grep finds that it does have an
instance that matches the pattern. (This is despite the fact that it is not
searching for what the OP thought he was searching for, but rather for a more
general pattern, since an unescaped "." matches any character.)
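
To illustrate, here is a small sketch that assumes "tickers" is a one-column
data frame with a character column V1, as the head() output suggests. The
values are a hypothetical subset, not the OP's full data.

```r
# Hypothetical one-column data frame standing in for the OP's "tickers".
tickers <- data.frame(V1 = c("ADARSHPL.BO", "AGR.V", "AGU", "AIMCO.BO"),
                      stringsAsFactors = FALSE)

# grep() coerces a non-character input with as.character(); the single
# column becomes one string, so the only index that can come back is 1.
whole <- grep("\\.BO", tickers)

# Searching the column itself tests each ticker separately.
idx <- grep("\\.BO", tickers$V1)
matches <- tickers$V1[idx]

print(whole)
print(idx)
print(matches)
```

The first call reports only that the column as a whole contains a match,
whereas the second returns the positions of the individual tickers.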

--
David.


>
>> Could any one please guide me on this. Many thanks for the help
>>
>> Best Regards,
>>
>> Krishna
>>
>
>
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


