[R] Why do my regular expressions require a double escape \\ to get a literal??

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Fri Mar 2 14:59:26 CET 2012


Roey, you imply that this is unusual among regex implementations, yet sed and awk are some of the oldest applications that use regex, and there extra quoting is so common that some people don't recognize regex patterns that lack this extra level of quoting. Sigh.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 

Berend Hasselman <bhh at xs4all.nl> wrote:

>
>On 02-03-2012, at 09:36, Roey Angel wrote:
>
>> Hi,
>> I was recently misfortunate enough to have to use regular expressions
>> to sort out some data in R.
>> I'm working on a data file which contains taxonomical data of
>> bacteria in hierarchical order.
>> A sample of this file can be generated using:
>> 
>> tax.data <- read.table(header=F, con <- textConnection('
>> G9SS7BA01D15EC  Bacteria(100)    Cyanobacteria(84)    unclassified
>> G9SS7BA01C9UIR    Bacteria(100)    Proteobacteria(94)    Alphaproteobacteria(89)
>> G9SS7BA01CM00D    Bacteria(100)    Proteobacteria(99)    Alphaproteobacteria(99)
>> '))
>> close(con)
>> 
>> What I am trying to do is remove the parentheses and the number inside
>> them (which could contain a decimal point).
>> I assumed that the following command would solve it, but instead I
>> got an error.
>> 
>> tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
>> Error: '\(' is an unrecognized escape in character string starting "\("
>> 
>> And it doesn't matter if I use perl = TRUE or not.
>> To solve it I need to use a double escape sign '\\' before the opening
>> and closing parentheses:
>> 
>> tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\\(.*\\)','',x)))
>> 
>> This yields the desired result but I wonder why it does that?
>> No other regular expression system I'm used to (e.g. Perl, Shell)
>> works like that.
>> 
>> I'm using R 2.14 (but also R 2.10) and I get the same results on
>> Ubuntu and win XP.
>> 
>> I'd appreciate any explanation.
>
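
For reference, the working call from the question can be tried on a small character vector. This is only a sketch: the vector `x` and the value "89.5" are made up here to cover the decimal-point case mentioned above, and the names `escaped`, `bracketed` and `strict` are illustrative. It also shows a bracket expression, which matches a literal parenthesis without needing any backslash.

```r
# Made-up sample values modelled on the columns in the question.
x <- c("Bacteria(100)", "Proteobacteria(94)",
       "Alphaproteobacteria(89.5)", "unclassified")

# Doubled backslashes in the source, as in the working call from the question.
escaped <- gsub("\\(.*\\)", "", x)

# Inside [ ] the parentheses are ordinary characters, so no backslash is needed.
bracketed <- gsub("[(].*[)]", "", x)

# A tighter pattern: only digits with an optional decimal part between the parentheses.
strict <- gsub("\\([0-9]+(\\.[0-9]+)?\\)", "", x)

print(escaped)
print(identical(escaped, bracketed) && identical(escaped, strict))
```

All three patterns strip the parenthesised number and leave "unclassified" untouched; which one to prefer is mostly a matter of readability.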
>See the section "Character vectors" in the R Intro manual, and also:
>
>?Quotes
>
>The regular expression is passed to gsub as a string, and string
>literals have escape sequences of their own.
>For a single \ to reach the regular expression parser, it has to be
>escaped with another \ at the string stage: \\
>
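
The two stages described above can be made visible with a few lines of R. This is a minimal sketch, not part of the original exchange; the names `pat`, `cleaned` and `parse_fails` are chosen here for illustration only.

```r
# Stage 1: the R parser turns the source text '\\(.*\\)' into a string.
pat <- '\\(.*\\)'
print(nchar(pat))   # 6 characters: \ ( . * \ ) with single backslashes
cat(pat, "\n")      # shows the characters the regex engine actually receives

# Stage 2: the regex engine reads \( and \) as literal parentheses.
cleaned <- gsub(pat, '', 'Bacteria(100)')
print(cleaned)

# With a single backslash the code never gets past stage 1: '\(' is not a
# valid string escape, so parsing fails before gsub is called at all.
parse_fails <- inherits(try(parse(text = "'\\('"), silent = TRUE), "try-error")
print(parse_fails)
```

Because the failure happens while the string literal is being parsed, the choice of regex engine (perl = TRUE or not) cannot make any difference to the error.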
>Berend
>