[R] Parsing regular expressions differently - feature request

Gabor Grothendieck ggrothendieck at gmail.com
Sat Nov 8 17:03:48 CET 2008


On Sat, Nov 8, 2008 at 9:41 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 08/11/2008 7:20 AM, John Wiedenhoeft wrote:
>>
>> Hi there,
>>
>> I rejoiced when I realized that you can use Perl regex from within R.
>> However, as the FAQ states "Some functions, particularly those involving
>> regular expression matching, themselves use metacharacters, which may need
>> to be escaped by the backslash mechanism. In those cases you may need a
>> quadruple backslash to represent a single literal one."
>>
>> I was wondering if that is really necessary for perl=TRUE.  Wouldn't it be
>> possible to parse a string differently in a regex context, e.g.
>> automatically insert \\ for each \, such that you can use the Perl syntax
>> directly? For example, if you want to input a newline as a character, you
>> would use \n anyway. At the moment one writes \\n to make it clear to R that
>> you mean \n, which in turn tells the regex engine that you mean newline.
>> This is pretty annoying. How likely is it that you want to pass a real
>> newline character to PCRE directly?
>
> No, that's not possible.  At the level where the parsing takes place R has
> no idea of its eventual use, so it can't tell that some strings are going to
> be interpreted as Perl, and others not.
>
> As Gabor mentioned, there have been various discussions of adding a new
> syntax for strings that are parsed literally, without processing any
> escapes, but no consensus on the right syntax to use.
>
> There are currently some fragile tricks that let you avoid escapes, e.g.
> using scan() to read a line:
>
>> re <- scan(what="", n=1)
> 1: [^\\]
> Read 1 item
>> re
> [1] "[^\\\\]"
>
> (I call this fragile because it works in scripts processed at console level,
> but not if you type the same thing into a function.)
>
> So I agree, it would be nice to have new syntax to allow this.  Last time
> this came up, I argued for something like \verb in LaTeX where the delimiter
> could be specified differently in each use.  Duncan TL suggested triple
> quotes, as in Python.  I think now that triple quotes would be better
> than the particular form I suggested.

Ruby's quoting method looks quite flexible:

http://en.wikibooks.org/wiki/Ruby_Programming/Alternate_quotes
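
For anyone following along, here is a minimal sketch of the two escaping
layers discussed above: the R parser processes the string literal first, and
the regex engine then processes whatever characters are left.  The variable
names are illustrative only and do not come from the thread; grepl() is
assumed to be available in your version of R.

```r
# Layer 1: the R parser turns "\\" in source code into ONE backslash.
one_backslash <- "\\"
n_parsed <- nchar(one_backslash)        # 1

# Layer 2: the regex engine also treats backslash as an escape, so the
# pattern itself must hold TWO backslashes, written as four in source.
pattern <- "\\\\"
n_pattern <- nchar(pattern)             # 2

# First element contains a literal backslash, second does not.
x <- c("a\\b", "ab")
hits_regex <- grepl(pattern, x, perl = TRUE)    # TRUE FALSE

# Newline: "\\n" hands backslash-n to PCRE, "\n" hands it a real newline
# character.  Both match a newline in the subject string.
nl_escaped <- grepl("\\n", "line1\nline2", perl = TRUE)   # TRUE
nl_literal <- grepl("\n",  "line1\nline2", perl = TRUE)   # TRUE

# With fixed = TRUE there is no regex layer, so only the parser's
# escaping remains and two backslashes in source are enough.
hits_fixed <- grepl("\\", x, fixed = TRUE)      # TRUE FALSE
```

Using cat(pattern) rather than print(pattern) shows the characters the regex
engine actually receives, since print() re-escapes backslashes for display.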


