[R] Parsing regular expressions differently - feature request
Duncan Murdoch
murdoch at stats.uwo.ca
Wed Nov 19 13:00:35 CET 2008
On 18/11/2008 1:36 PM, William Dunlap wrote:
> Duncan Murdoch murdoch at stats.uwo.ca Sat Nov 8 15:41:34 CET 2008
> wrote:
>> On 08/11/2008 7:20 AM, John Wiedenhoeft wrote:
>>> Hi there,
>>>
>>> I rejoiced when I realized that you can use Perl regex from within R.
>>> However, as the FAQ states "Some functions, particularly those involving
>>> regular expression matching, themselves use metacharacters, which may need
>>> to be escaped by the backslash mechanism. In those cases you may need a
>>> quadruple backslash to represent a single literal one."
>>>
>>> I was wondering if that is really necessary for perl=TRUE? wouldn't it be
>>> possible to parse a string differently in a regex context, e.g.
>>> automatically insert \\ for each \ , such that you can use the perl syntax
>>> directly? For example, if you want to input a newline as a character, you
>>> would use \n anyway. At the moment one says \\n to make it clear to R that
>>> you mean \n to make clear that you mean newline... this is pretty annoying.
>>> How likely is it that you want to pass a real newline character to PCRE
>>> directly?
>> No, that's not possible. At the level where the parsing takes place R
>> has no idea of its eventual use, so it can't tell that some strings are
>> going to be interpreted as Perl, and others not.
>>
>> As Gabor mentioned, there have been various discussions of adding a new
>> syntax for strings that are parsed literally, without processing any
>> escapes, but no consensus on the right syntax to use.
>> ... [scan() example elided] ...
>> So I agree, it would be nice to have new syntax to allow this. Last
>> time this came up, I argued for something like \verb in LaTeX where the
>> delimiter could be specified differently in each use. Duncan TL
>> suggested triple quotes, as in Python. I think now that triple quotes
>> would be better than the particular form I suggested.
>>
>> Duncan Murdoch
>
> Would a string with this alternate quoting be tagged (e.g., with a class
> that inherits from character) so that the deparser could display it in the
> style in which it was input?
I don't recall anything like that in the proposal, but Duncan TL may
have had it in mind.
Duncan Murdoch
> Functions which generate file names using the native Windows notation
> would like to have them displayed without the extra backslashes.
> However, adding a new class for this could mess up other things.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
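
For readers following the thread, here is a minimal sketch of the two layers of escaping discussed above, and of the display issue Bill raises. It is only an illustration: the path C:\temp\data.txt and the variable names are made up for the example, and it uses nothing beyond base R (nchar, gsub, grepl, print, writeLines).

```r
# Layer 1: the R parser. "\\\\" in source code is a 2-character string (\\).
# Layer 2: the regex engine. It reads those 2 characters as one literal \.
pat <- "\\\\"
nchar(pat)                                  # 2

# Hypothetical Windows-style path; each "\\" in source is one real backslash.
path <- "C:\\temp\\data.txt"
nchar(path)                                 # 16

# Replacing backslashes with a regex needs the quadruple form ...
via_regex <- gsub(pat, "/", path, perl = TRUE)

# ... while fixed = TRUE skips the regex layer, so one parsed backslash is enough.
via_fixed <- gsub("\\", "/", path, fixed = TRUE)

# The newline case from the original question: both forms match, because
# "\\n" hands PCRE the two characters \n, and "\n" hands it a real newline.
escaped_nl <- grepl("\\n", "a\nb", perl = TRUE)
real_nl    <- grepl("\n",  "a\nb", perl = TRUE)

# Display: print() shows the string as it would have to be typed in source,
# writeLines() shows the characters actually stored.
print(path)                                 # [1] "C:\\temp\\data.txt"
writeLines(path)                            # C:\temp\data.txt
```

Both gsub() calls give "C:/temp/data.txt". The extra backslashes that print() shows are added when the string is displayed; they are not stored in the string, which is why the question of tagging a literally-quoted string only matters for how it is deparsed and printed.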