[R] help with regular expressions in R

davidr at rhotrading.com
Thu Aug 20 17:43:46 CEST 2009


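Possibly just a typo: you want ".*" (any run of characters), not "*."
(zero or more "[" followed by a single character, which is why only the
trailing "n]" was removed):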
> gsub('\\[.*\\]', '', myCharVec)
           ^^
[1] ""                    "(the rain in spain)"
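A minimal sketch comparing the two patterns. The second input string x and the names greedy, negated and lazy are hypothetical, added only for illustration. Note that ".*" is greedy: on a string with several bracketed groups it removes everything from the first "[" to the last "]". A negated class "[^]]*" (a "]" placed first in a bracket expression is literal) or a lazy ".*?" with perl = TRUE stops at the first "]" instead.

```r
myCharVec <- c("[the rain in spain]", "(the rain in spain)")

# Typo: '*' applies to '\\[', so this removes one character plus ']'
gsub('\\[*.\\]', '', myCharVec)  # "[the rain in spai"  "(the rain in spain)"
# Fixed: '.*' means "any characters", so the whole [...] span goes
gsub('\\[.*\\]', '', myCharVec)  # ""  "(the rain in spain)"

# Hypothetical input with two bracketed groups
x <- "keep [drop1] this [drop2] too"
greedy  <- gsub('\\[.*\\]', '', x)                # "keep  too": first '[' to last ']'
negated <- gsub('\\[[^]]*\\]', '', x)             # "keep  this  too": [^]]* cannot cross ']'
lazy    <- gsub('\\[.*?\\]', '', x, perl = TRUE)  # "keep  this  too": PCRE lazy quantifier
```

Which variant is right depends on whether a line can contain more than one bracketed group.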

HTH,
-- David


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mark Kimpel
Sent: Thursday, August 20, 2009 10:31 AM
To: r-help at r-project.org
Subject: [R] help with regular expressions in R

I'm having trouble achieving the results I want using a regular expression.
I want to eliminate all characters that fall within square brackets, as well
as the brackets themselves, returning "". I'm not sure if it's R's use of
double-backslash escapes or something else that is tripping me up. If I use
only one backslash, I get these warnings:
1: '\[' is an unrecognized escape in a character string
2: '\]' is an unrecognized escape in a character string
3: unrecognized escapes removed from "\[*.\]"
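A sketch of the two layers of escaping; the names pat and alt are illustrative. R first parses the string literal, where "\\" stands for one backslash; only then does the regex engine see the pattern, where "\[" means a literal "[". A lone "\[" fails at the first layer, which is what the warnings above report (recent R versions stop with an error instead). Single-character bracket expressions avoid backslashes entirely, and R 4.0.0 and later also accept raw strings such as r"(\[.*\])".

```r
pat <- '\\[.*\\]'

# The literal '\\[' is two characters: a backslash and '['
nchar('\\[')  # 2
# cat() prints the string as the regex engine receives it: \[.*\]
cat(pat, "\n")

# Same effect without backslashes: '[[]' matches '[' and '[]]' matches ']'
alt <- '[[].*[]]'
myCharVec <- c("[the rain in spain]", "(the rain in spain)")
gsub(alt, '', myCharVec)  # ""  "(the rain in spain)"
```

The ?regex help page describes patterns "as they would be printed by cat", so mentally halving the backslashes of an R string gives the pattern the engine sees.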

Below is my self-contained code followed by sessionInfo().

Thanks in advance for your help. I'm going to be doing a lot of text mining
in the near future. I have an excellent O'Reilly book on regexes. What is
the best reference for R's special treatment of these animals?
Mark


myCharVec <- c("[the rain in spain]", "(the rain in spain)")
gsub('\\[*.\\]', '', myCharVec)

#what I get
# [1] "[the rain in spai"   "(the rain in spain)"

#what I want
# [1] ""   "(the rain in spain)"
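When a pattern misbehaves, it can help to look at what it matched rather than at what gsub() left behind. A minimal sketch using regexpr() with regmatches(); note that regmatches() appeared in R 2.14.0, later than the R 2.10 session shown below.

```r
myCharVec <- c("[the rain in spain]", "(the rain in spain)")

# Substring matched by each pattern (elements with no match are dropped)
regmatches(myCharVec, regexpr('\\[*.\\]', myCharVec))  # "n]"
regmatches(myCharVec, regexpr('\\[.*\\]', myCharVec))  # "[the rain in spain]"

# grepl() reports which elements contain a match at all
grepl('\\[.*\\]', myCharVec)  # TRUE FALSE
```

Seeing "n]" as the match makes the swapped "*." visible immediately.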

> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-12 r49193)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] RWeka_0.3-20 tm_0.4

loaded via a namespace (and not attached):
[1] grid_2.10.0 rJava_0.6-3 slam_0.1-3


------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail

"The real problem is not whether machines think but whether men do." -- B. F. Skinner
******************************************************************


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

