[R] regular expression help

Ashim Kapoor ashimkapoor at gmail.com
Thu Jun 8 14:06:39 CEST 2017


Dear Enrico,

Many thanks and Best Regards,

Ashim.

On Thu, Jun 8, 2017 at 5:11 PM, Enrico Schumann <es at enricoschumann.net> wrote:

>
> Quoting Ashim Kapoor <ashimkapoor at gmail.com>:
>
>> Dear All,
>>
>> My query is:
>>
>> Do we always need to use the perl = TRUE option when using ignore.case = TRUE?
>>
>> A small example:
>>
>> my_text =
>> "RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n  RC No. 162/2015\nSBI
>> VS RAMESH GUPTA.\n    Dated: 01.03.2016                   Item no.01\n
>> Present:   Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel
>> for ARCIL.\n                None for the CDs.\n  The counsel for the CHFI
>> submitted that the matter has been assigned to ARCIL and deed of
>> assignment, application for substituting the name and vakalatnama has been
>> filed vide diary no. 1454 dated 08.02.2016\nIn the application it has been
>> prayed that ARCIL may be substituted in place of SBI for the purpose of
>> further proceedings in the matter. Request allowed.\nThe proxy counsel for
>> CHFI further requested to issue demand notice thereby mentioning the name
>> of ARCIL. Request allowed.\nRegistry is directed to issue fresh demand
>> notice mentioning the name of ARCIL.\nCHFI is directed to file status of
>> the mortgaged property as well as other assets of the CDs.\nList the case
>> on 28.03.2016.\n  (SUJEET KUMAR)\nRECOVERY OFFICER-II."
>>
>> My regular expression is:
>>
>>   parties_present_start_1 <-
>>     regexpr("\n.*Present.*\n.*\n", my_text, ignore.case = TRUE, perl = TRUE)
>>
>>   parties_present_start_2 <-
>>     regexpr("\n.*Present.*\n.*\n", my_text, ignore.case = TRUE)
>>
>> > parties_present_start_1
>> [1] 138
>> attr(,"match.length")
>> [1] 123
>> attr(,"useBytes")
>> [1] TRUE
>>
>> > parties_present_start_2
>> [1] 20
>> attr(,"match.length")
>> [1] 949
>> attr(,"useBytes")
>> [1] TRUE
>>
>> Why do I see the correct result only in the first case?
>>
>> Best Regards,
>> Ashim
>>
>>
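A quick way to see which of the two options causes the difference is to run the same pattern under all four combinations of perl and ignore.case. The sketch below uses a short made-up string x (an assumption, not the text from the post) in place of my_text; the names x, pattern, use_perl and ic are hypothetical.

```r
# Hypothetical test string (not from the original post): the word
# "Present" on line 2, followed by three more lines.
x <- "a\nPresent here\nnext line\nmore\nend"
pattern <- "\n.*Present.*\n.*\n"

# Run the pattern under every combination of the two options and
# print where the match starts and how long it is.
for (use_perl in c(FALSE, TRUE)) {
  for (ic in c(FALSE, TRUE)) {
    m <- regexpr(pattern, x, ignore.case = ic, perl = use_perl)
    cat("perl =", use_perl, " ignore.case =", ic,
        " start =", m[1], " length =", attr(m, "match.length"), "\n")
  }
}
```

On this string every run starts at position 2, but match.length is 29 whenever perl = FALSE (the match runs on to the last newline) and 24 whenever perl = TRUE (it stops after the line following "Present"). Toggling ignore.case changes nothing here, since "Present" already matches case-exactly.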
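> In Perl (and in PCRE, the engine R uses when perl = TRUE), '.' matches
> any character except a newline.
>
> In R's default engine (TRE, used when perl = FALSE), '.' matches any
> character, including a newline. So the difference you see comes from
> perl = TRUE, not from ignore.case = TRUE.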
>
>   test <- "hello\n1"
>   regexpr(".*[0-9]", test)
>   ## [1] 1
>   ## attr(,"match.length")
>   ## [1] 7
>   ## attr(,"useBytes")
>   ## [1] TRUE
>
>   regexpr(".*[0-9]", test, perl = TRUE)
>   ## [1] 7
>   ## attr(,"match.length")
>   ## [1] 1
>   ## attr(,"useBytes")
>   ## [1] TRUE
>
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
>
>
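Following from Enrico's explanation, there are two ways to get the line-bounded match: keep the default engine but replace '.' with the bracket expression [^\n] ("anything but a newline"), or use perl = TRUE. Conversely, PCRE's inline (?s) "dotall" flag makes '.' match newlines as well. A minimal sketch, again on a made-up string x (an assumption, not the post's my_text); the m_* names are hypothetical.

```r
# Hypothetical test string (not from the original post).
x <- "a\nPresent here\nnext line\nmore\nend"

# Default (TRE) engine: '.' also matches a newline, so use [^\n]
# to keep each piece of the match on a single line.
m_tre <- regexpr("\n[^\n]*Present[^\n]*\n[^\n]*\n", x, ignore.case = TRUE)

# PCRE engine: '.' already stops at a newline.
m_pcre <- regexpr("\n.*Present.*\n.*\n", x, ignore.case = TRUE, perl = TRUE)

# PCRE with the inline (?s) flag: '.' matches newlines again, giving the
# same greedy, multi-line match as the default engine with '.'.
m_dotall <- regexpr("(?s)\n.*Present.*\n.*\n", x, ignore.case = TRUE,
                    perl = TRUE)

regmatches(x, m_tre)     # the "Present" line plus the line after it
regmatches(x, m_pcre)    # same text as m_tre
regmatches(x, m_dotall)  # runs on to the last newline in x
```

The [^\n] form has the advantage of behaving the same under both engines, so the pattern no longer depends on the perl setting.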
