[R] Why do my regular expressions need a double escape (\\) to match a literal character?
Roey Angel
angel at mpi-marburg.mpg.de
Fri Mar 2 09:36:43 CET 2012
Hi,
I recently had the misfortune of having to use regular expressions to
sort out some data in R.
I'm working on a data file that contains taxonomic data for bacteria
in hierarchical order.
A sample of this file can be generated using:
con <- textConnection('
G9SS7BA01D15EC Bacteria(100) Cyanobacteria(84) unclassified
G9SS7BA01C9UIR Bacteria(100) Proteobacteria(94) Alphaproteobacteria(89)
G9SS7BA01CM00D Bacteria(100) Proteobacteria(99) Alphaproteobacteria(99)
')
# one record per line: an ID followed by three taxonomic ranks
tax.data <- read.table(con, header = FALSE)
close(con)
What I am trying to do is remove the parentheses and the number inside
them (which may contain a decimal point).
I assumed that the following command would do it, but instead I got
an error:
tax.data <- as.data.frame(apply(tax.data, 2, function(x)
gsub('\(.*\)','',x)))
Error: '\(' is an unrecognized escape in character string starting "\("
And it doesn't matter if I use perl = TRUE or not.
To make it work, I have to put a double backslash '\\' before both the
opening and the closing parenthesis:
tax.data <- as.data.frame(apply(tax.data, 2, function(x)
gsub('\\(.*\\)','',x)))
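A minimal sketch of the same clean-up, assuming the four-column layout of the sample data above. Instead of removing anything between parentheses, it matches only a parenthesised run of digits and decimal points, which is closer to the stated goal, and it uses lapply() so the result stays a data frame of character columns. The helper name strip_conf is hypothetical.

```r
# Sample data from the post (one record per line, four columns).
con <- textConnection("
G9SS7BA01D15EC Bacteria(100) Cyanobacteria(84) unclassified
G9SS7BA01C9UIR Bacteria(100) Proteobacteria(94) Alphaproteobacteria(89)
G9SS7BA01CM00D Bacteria(100) Proteobacteria(99) Alphaproteobacteria(99)
")
tax.data <- read.table(con, header = FALSE, stringsAsFactors = FALSE)
close(con)

# Hypothetical helper: drop a parenthesised number such as "(84)" or "(84.5)".
# The source text doubles each backslash; the string gsub() actually
# receives is \([0-9.]+\)
strip_conf <- function(x) gsub("\\([0-9.]+\\)", "", x)

# lapply() works column by column and `[]<-` keeps the data.frame
# structure, avoiding the matrix conversion that apply() performs.
tax.data[] <- lapply(tax.data, strip_conf)
print(tax.data)
```

Cells without a parenthesised number, such as the IDs or "unclassified", pass through unchanged.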
This gives the desired result, but why is it necessary?
None of the other regular expression systems I'm used to (e.g. Perl,
shell) work like that.
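A sketch of the two stages a pattern goes through in R, which may account for the difference. Unlike Perl's /\(.*\)/, R has no regular-expression literal: the pattern is an ordinary string literal, and R's parser processes its escapes before gsub() ever sees it. The variable names below are illustrative only.

```r
# Stage 1: the parser turns the literal in the source into a string,
# processing escapes such as \n, \t and \\ on the way.
pattern <- "\\(.*\\)"
writeLines(pattern)   # prints \(.*\)
nchar(pattern)        # 6: each \\ in the source became one backslash

# A literal containing \( is rejected at this stage, before any regex
# code runs, because \( is not one of R's string escapes. This is why
# perl = TRUE makes no difference.
res <- try(parse(text = "'\\('"), silent = TRUE)
inherits(res, "try-error")   # TRUE

# Stage 2: gsub() passes the 6-character string to the regex engine,
# which reads \( and \) as literal parentheses.
gsub(pattern, "", "Bacteria(100)")    # "Bacteria"

# A bracket expression matches a parenthesis without any backslash.
gsub("[(].*[)]", "", "Bacteria(100)") # "Bacteria"
```

Later R versions (4.0.0 and up, not the ones used here) also accept raw strings such as r"(\(.*\))", which skip the escape processing of stage 1.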
I'm using R 2.14 (and have also tried R 2.10), and I get the same results
on Ubuntu and Windows XP.
I'd appreciate any explanation.
Thanks in advance,
baffled Roey
--
Dr. Roey Angel
Max-Planck-Institute for Terrestrial Microbiology
Karl-von-Frisch-Strasse 10
D-35043 Marburg, Germany
Office: +49 (0)6421/178-832
Mobile: +49 (0)176/612-785-88