[R] regular expression help

Enrico Schumann es at enricoschumann.net
Thu Jun 8 13:41:24 CEST 2017


Quoting Ashim Kapoor <ashimkapoor at gmail.com>:

> Dear All,
>
> My query is:
>
> Do we always need to use the perl = TRUE option when using ignore.case = TRUE?
>
> A small example:
>
> my_text =
> "RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n  RC No. 162/2015\nSBI
> VS RAMESH GUPTA.\n    Dated: 01.03.2016                   Item no.01\n
> Present:   Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel
> for ARCIL.\n                None for the CDs.\n  The counsel for the CHFI
> submitted that the matter has been assigned to ARCIL and deed of
> assignment, application for substituting the name and vakalatnama has been
> filed vide diary no. 1454 dated 08.02.2016\nIn the application it has been
> prayed that ARCIL may be substituted in place of SBI for the purpose of
> further proceedings in the matter. Request allowed.\nThe proxy counsel for
> CHFI further requested to issue demand notice thereby mentioning the name
> of ARCIL. Request allowed.\nRegistry is directed to issue fresh demand
> notice mentioning the name of ARCIL.\nCHFI is directed to file status of
> the mortgaged property as well as other assets of the CDs.\nList the case
> on 28.03.2016.\n  (SUJEET KUMAR)\nRECOVERY OFFICER-II."
>
> My regular expression is:
>
> parties_present_start_1=
> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T)
>
> parties_present_start_2=
> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE)
>
>> parties_present_start_1
> [1] 138
> attr(,"match.length")
> [1] 123
> attr(,"useBytes")
> [1] TRUE
>> parties_present_start_2
> [1] 20
> attr(,"match.length")
> [1] 949
> attr(,"useBytes")
> [1] TRUE
>>
>
> Why do I see the correct result only in the first case?
>
> Best Regards,
> Ashim
>

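This has nothing to do with ignore.case. In Perl (and so with perl = TRUE,
which uses PCRE), '.' matches any character except a newline.

With R's default engine (TRE, extended regular expressions), '.' matches any
character, including a newline. So in your second call the first '.*' runs
across line breaks: the match starts at the very first '\n' in the text
(position 20), and the greedy '.*' carries it over many lines (length 949).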

   test <- "hello\n1"
   regexpr(".*[0-9]", test)
   ## [1] 1
   ## attr(,"match.length")
   ## [1] 7
   ## attr(,"useBytes")
   ## [1] TRUE

   regexpr(".*[0-9]", test, perl = TRUE)
   ## [1] 7
   ## attr(,"match.length")
   ## [1] 1
   ## attr(,"useBytes")
   ## [1] TRUE
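If you want both engines to agree, there are two ways to get there, sketched
below. The first uses PCRE's inline (?s) modifier (the "dotall" option), which
lets '.' match a newline too. The second keeps the default engine but replaces
'.' with the bracket expression [^\n], "any character except a newline". The
string 'txt' is a shortened, made-up stand-in for my_text (hypothetical data),
not the original text.

```r
test <- "hello\n1"

## PCRE: the inline modifier (?s) makes '.' match newlines as well,
## which reproduces the default (TRE) result: position 1, length 7.
m1 <- regexpr("(?s).*[0-9]", test, perl = TRUE)

## Default engine (TRE): use [^\n] instead of '.', i.e. any character
## except a newline, which mimics PCRE's '.': position 7, length 1.
m2 <- regexpr("[^\n]*[0-9]", test)

## The pattern from the question, on a shortened, made-up excerpt
## (hypothetical data, not the original my_text).
txt <- "TRIBUNAL\n  Dated: 01.03.2016\nPresent: Ms. Sonakshi\nfor ARCIL.\nNone for the CDs.\n"

## Default engine, original pattern: '.*' crosses newlines, so the match
## starts at the first newline and runs on to the last one.
p0 <- regexpr("\n.*Present.*\n.*\n", txt, ignore.case = TRUE)

## PCRE, original pattern: only the 'Present' line plus the line after it.
p1 <- regexpr("\n.*Present.*\n.*\n", txt, ignore.case = TRUE, perl = TRUE)

## Default engine with [^\n] in place of '.': same match as p1.
p2 <- regexpr("\n[^\n]*Present[^\n]*\n[^\n]*\n", txt, ignore.case = TRUE)

## Compare start positions and match lengths side by side.
rbind(start  = c(p0, p1, p2),
      length = c(attr(p0, "match.length"),
                 attr(p1, "match.length"),
                 attr(p2, "match.length")))
```

Which one to prefer is mostly taste: [^\n] states the intent explicitly in the
pattern, while perl = TRUE gives you the Perl semantics you may already know.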


-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net


