[R] Need content_transformer() called by tm_map() to change non-letters to spaces
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Fri Apr 24 06:09:59 CEST 2015
The regex "[^a-zA-Z]" reads as "not a letter", so passing it as the pattern to your existing toSpace transformer should do what you want.
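
A minimal sketch of how that might look, assuming the toSpace transformer from the quoted message. The sample strings and the two-document corpus are made up for illustration, and the tm part only runs if tm is installed:

```r
# Base R check of the pattern: every character outside a-z and A-Z
# is replaced by a single space, so string lengths are unchanged.
x <- c("Hello, world! 123", "e-mail: a@b.org")
cleaned <- gsub("[^a-zA-Z]", " ", x)
print(cleaned)

# Same pattern passed through tm, reusing toSpace from the question.
# The corpus below is a hypothetical stand-in for the poster's docs.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
  docs <- VCorpus(VectorSource(x))
  docs <- tm_map(docs, toSpace, "[^a-zA-Z]")
  print(content(docs[[1]]))
}
```

Note that "[^a-zA-Z]" also replaces accented letters; if those should be kept, the POSIX class "[^[:alpha:]]" may be a better fit, since it follows the current locale.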
On April 23, 2015 1:10:41 PM PDT, Mike <mikehall at y7mail.com> wrote:
>Hello,
>In the following code, any characters matching "/|@| \\|" will be
>changed to a space.
>> library(tm)
>> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
>> docs <- tm_map(docs, toSpace, "/|@| \\|")
>
>What code would transform all non-letters to a space? (What goes where
>the xxxxx's are.) It is very difficult to put all non-letters in a
>string... So I'm doing the opposite of the above.
>> toSpace_2 <- content_transformer(function xxxxxxxxxxxxxxxxxxxxxxx))
>> docs <- tm_map(docs, toSpace_2, "abcdefghijklmnopqrstuvwxyz")
>
>This needs to be done by a content_transformer() function to maintain
>the integrity of docs.
>
>Thanks