[R] Parsing regular expressions differently - feature request

Duncan Murdoch murdoch at stats.uwo.ca
Sat Nov 8 23:55:06 CET 2008


On 08/11/2008 5:43 PM, Wacek Kusnierczyk wrote:
> Duncan Murdoch wrote:
>>>>>>> I was wondering if that is really necessary for perl=TRUE? wouldn't it
>>>>>>> be possible to parse a string differently in a regex context, e.g.
>>>>>>> automatically insert \\ for each \ , such that you can use the perl
>>>>>>> syntax directly? For example, if you want to input a newline as a
>>>>>>> character, you would use \n anyway. At the moment one says \\n to make
>>>>>>> it clear to R that you mean \n to make clear that you mean newline...
>>>>>>> this is pretty annoying. How likely is it that you want to pass a real
>>>>>>> newline character to PCRE directly?
>>>>>> No, that's not possible.  At the level where the parsing takes place R
>>>>>> has no idea of its eventual use, so it can't tell that some strings are
>>>>>> going to be interpreted as Perl, and others not.
>>> Here's a quick hack to achieve the impossible:
>> That might solve John's problem, but I doubt it.  As far as I can see
>> it won't handle \L, for example.
>>
> 
> well, it was not supposed to.  it addresses the need for doubling
> backslashes when a backslash character is an element of the regex. 

\L could be an element of a regex in Perl.
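
A minimal sketch of the two layers involved, for illustration only: R's parser resolves backslash escapes before grep() or sub() ever sees the string, so an escape the parser does not recognise cannot reach the regex engine undoubled. The names `res` and `lowered` are invented for this example, and the parse error shown is the behaviour of current versions of R.

```r
# Layer 1: the R parser. Escapes are resolved when the string literal is read.
nchar("\n")     # 1: the parser has already turned \n into a newline character
nchar("\\n")    # 2: a backslash followed by the letter n

# "\L" is not an escape the R parser knows, so in current R the literal
# fails to parse before any regex function is called.
res <- tryCatch(parse(text = '"\\L"'), error = function(e) "parse error")

# Layer 2: the regex engine. With the backslash doubled, \L arrives intact;
# with perl = TRUE it lower-cases the rest of the replacement.
lowered <- sub("(\\w+)", "\\L\\1", "HELLO world", perl = TRUE)
```

Any wrapper that re-escapes the pattern at run time only sees the string after layer 1, which is why it cannot recover a backslash the parser has already consumed or rejected.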

Duncan Murdoch


> 
> foo = "foo\\n\n"
> 
> grep("\n", foo, perl=TRUE, value=TRUE)
> mygrep("\n", foo, perl=TRUE, value=TRUE)
> # both match the newline
> 
> grep("\\n", foo, perl=TRUE, value=TRUE)
> mygrep("\\n", foo, perl=TRUE, value=TRUE)
> # both match (guess what)
> 
> bar = "bar\n"
> 
> grep("\n", bar, perl=TRUE, value=TRUE)
> mygrep("\n", bar, perl=TRUE, value=TRUE)
> # both match the newline
> 
> grep("\\n", bar, perl=TRUE, value=TRUE)
> mygrep("\\n", bar, perl=TRUE, value=TRUE)
> # counterintuitively, grep matches (intuitively, it should match
> # backslash-n, not a newline, but there's just a newline in bar) -- i do
> # know why it matches, but i'm pretty sure for many of those who do it's
> # an inconvenient detail, and for those who don't it's a confusing annoyance
> 
> zee = "zee\\"
> 
> grep("\\", zee, perl=TRUE, value=TRUE)
> mygrep("\\", zee, perl=TRUE, value=TRUE)
> # grep fails, needs "\\\\"
> 
> conclusion?  i'd opt for mygrep in my own code; i guessed this was what
> john wanted, therefore the post.
> 
> vQ
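
The definition of mygrep is not included in the text quoted above. The following is a hypothetical reconstruction, not the function actually posted: it assumes the wrapper simply doubles every backslash left in the pattern after R has parsed it, which is consistent with the quoted examples. The name `mygrep_sketch` is made up here.

```r
# Hypothetical stand-in for mygrep (an assumption, not the posted code):
# double each backslash that survived R's parser, so the regex engine
# treats it as a literal backslash instead of the start of an escape.
mygrep_sketch <- function(pattern, x, ...) {
  escaped <- gsub("\\\\", "\\\\\\\\", pattern)
  grep(escaped, x, ...)
}

foo <- "foo\\n\n"   # contains backslash-n and a real newline
bar <- "bar\n"      # contains only a real newline
zee <- "zee\\"      # ends in a single backslash

hit_foo <- mygrep_sketch("\\n", foo, perl = TRUE, value = TRUE)
hit_bar <- mygrep_sketch("\\n", bar, perl = TRUE, value = TRUE)
hit_zee <- mygrep_sketch("\\", zee, perl = TRUE, value = TRUE)
```

Under this assumption a pattern written as "\\n" matches only a literal backslash followed by n, so it finds foo but not bar, and "\\" matches the trailing backslash in zee. A pattern such as "\L" still never gets this far, because the parser handles it before the wrapper is called.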


