[R] Parsing regular expressions differently - feature request

Duncan Murdoch murdoch at stats.uwo.ca
Wed Nov 19 13:00:35 CET 2008


On 18/11/2008 1:36 PM, William Dunlap wrote:
> Duncan Murdoch murdoch at stats.uwo.ca Sat Nov 8 15:41:34 CET 2008
> wrote:
>> On 08/11/2008 7:20 AM, John Wiedenhoeft wrote:
>>> Hi there,
>>>
>>> I rejoiced when I realized that you can use Perl regex from within R.
>>> However, as the FAQ states "Some functions, particularly those involving
>>> regular expression matching, themselves use metacharacters, which may
>>> need to be escaped by the backslash mechanism. In those cases you may
>>> need a quadruple backslash to represent a single literal one. "
>>>
>>> I was wondering if that is really necessary for perl=TRUE? Wouldn't it be
>>> possible to parse a string differently in a regex context, e.g.
>>> automatically insert \\ for each \ , such that you can use the perl
>>> syntax directly? For example, if you want to input a newline as a
>>> character, you would use \n anyway. At the moment one says \\n to make it
>>> clear to R that you mean \n to make clear that you mean newline... this
>>> is pretty annoying. How likely is it that you want to pass a real newline
>>> character to PCRE directly?
>> No, that's not possible.  At the level where the parsing takes place R
>> has no idea of its eventual use, so it can't tell that some strings are
>> going to be interpreted as Perl, and others not.
>>
>> As Gabor mentioned, there have been various discussions of adding a new
>> syntax for strings that are parsed literally, without processing any
>> escapes, but no consensus on the right syntax to use.
>> ... [scan() example elided] ...
>> So I agree, it would be nice to have new syntax to allow this.  Last
>> time this came up, I argued for something like \verb in LaTeX where the
>> delimiter could be specified differently in each use.  Duncan TL
>> suggested triple quotes, as in Python.  I think now that triple quotes
>> would be better than the particular form I suggested.
>>
>> Duncan Murdoch
> 
> Would a string with this alternate quoting be tagged (e.g., with a class
> that inherits from character) so that the deparser could display it in
> the style in which it was input?

I don't recall anything like that in the proposal, but Duncan TL may 
have had it in mind.

Duncan Murdoch

> Functions which generate file names using the native Windows notation
> would like to have them displayed without the extra backslashes.
> However, adding a new class for this could mess up other things.
>  
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
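
For readers following the thread, here is a minimal sketch in base R of the escaping behaviour under discussion. It is only an illustration: the variable names and the sample path are hypothetical and do not come from the original messages. It shows that the R parser consumes one level of backslashes before the regex engine sees the pattern, and that print() shows a string in its escaped (deparsed) form while cat() writes the raw characters.

```r
# The R parser turns "\\n" into two characters: a backslash and an "n".
pattern <- "\\n"
nchar(pattern)                      # 2

# PCRE then reads that backslash-n pair as "match a newline".
text <- "line one\nline two"
grepl(pattern, text, perl = TRUE)   # TRUE

# A literal backslash needs four backslashes in the source: the parser
# halves them to two, and the regex engine reads those two as one.
path  <- "C:\\temp\\file.txt"       # hypothetical Windows-style path
fixed <- gsub("\\\\", "/", path, perl = TRUE)
fixed                               # "C:/temp/file.txt"

# print() shows the escaped form; cat() writes the characters themselves.
print(path)                         # [1] "C:\\temp\\file.txt"
cat(path, "\n")                     # C:\temp\file.txt
nchar(path)                         # 16, each backslash counted once
```

The doubled backslashes exist only in the source code and in the print() display. The string itself holds single backslashes, which is why the question of how the deparser should display such strings arises.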
