[R] help with regular expressions in R

jim holtman jholtman at gmail.com
Thu Aug 20 17:42:24 CEST 2009


How about this:

> myCharVec <- c("[the rain in spain]", "(the rain in spain)")
> gsub('\\[.*\\]', '', myCharVec)
[1] ""                    "(the rain in spain)"
>
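The doubled backslashes in the pattern are needed because it passes through two parsers. R's string parser first turns the literal "\\[" into the two characters \ and [, and the regex engine then reads \[ as a literal bracket. A single backslash, as in '\[', is an unrecognized escape for R's string parser, which is exactly the warning in the question below. A minimal sketch of that; the variable names `pat` and `x` are just for illustration:

```r
# In R source code, "\\[" is a two-character string: a backslash, then "["
pat <- "\\[.*\\]"
print(nchar("\\["))   # 2 characters, not 3
cat(pat, "\n")        # shows the pattern the regex engine actually receives

x <- c("[the rain in spain]", "(the rain in spain)")
print(gsub(pat, "", x))

# With fixed = TRUE the pattern is a plain string, not a regex,
# so "[" needs no escaping at all (this removes only the bracket character)
print(gsub("[", "", x, fixed = TRUE))
```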


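You had "*." when you should have ".*": "*" repeats whatever comes just before it, so "\\[*." means "zero or more [ followed by any one character", whereas "\\[.*\\]" means "a [, then any run of characters, then a ]".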
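A quick way to see the difference is to look at what each pattern actually matches. This is a sketch, not part of the original reply: `regmatches()` is available in current versions of R, and the second test string `y` is a made-up example. It also shows one caveat with ".*": it is greedy, so with two bracketed groups in one string it removes everything from the first [ to the last ]. A negated class such as "[^]]*" (any character except a closing bracket) stops at the first ].

```r
x <- c("[the rain in spain]", "(the rain in spain)")

# Original pattern: zero or more "[", one character, then "]".
# The only match in x[1] is "n]", which is why "[the rain in spai" remained.
print(regmatches(x, regexpr("\\[*.\\]", x)))   # "n]"
print(gsub("\\[*.\\]", "", x))

# Corrected pattern: "[", any run of characters, "]"
print(gsub("\\[.*\\]", "", x))

# Greedy caveat: .* runs to the LAST "]" in the string
y <- "[a] keep this [b]"
print(gsub("\\[.*\\]", "", y))       # ""
# Negated class: stop at the first "]" after each "["
print(gsub("\\[[^]]*\\]", "", y))    # " keep this "
```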

On Thu, Aug 20, 2009 at 11:30 AM, Mark Kimpel<mwkimpel at gmail.com> wrote:
> I'm having trouble achieving the results I want using a regular expression.
> I want to eliminate all characters that fall within square brackets as well
> as the brackets themselves, returning "". I'm not sure if it's R's use of
> double-backslash escapes or something else that is tripping me up. If I only
> use one backslash I get
> 1: '\[' is an unrecognized escape in a character string
> 2: '\]' is an unrecognized escape in a character string
> 3: unrecognized escapes removed from "\[*.\]"
>
> Below is my self-contained code followed by sessionInfo().
>
> Thanks in advance for your help. I'm going to be doing a lot of text mining
> in the near future. I have an excellent O'Reilly book on regexes. What is
> the best reference for R's special treatment of these animals?
> Mark
>
>
> myCharVec <- c("[the rain in spain]", "(the rain in spain)")
> gsub('\\[*.\\]', '', myCharVec)
>
> #what I get
> # [1] "[the rain in spai"   "(the rain in spain)"
>
> #what I want
> # [1] ""   "(the rain in spain)"
>
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-12 r49193)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] RWeka_0.3-20 tm_0.4
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.0 rJava_0.6-3 slam_0.1-3
>
>
> ------------------------------------------------------------
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
>
> "The real problem is not whether machines think but whether men do." -- B.
> F. Skinner
> ******************************************************************
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



