[R] Need content_transformer() called by tm_map() to change non-letters to spaces

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Fri Apr 24 06:09:59 CEST 2015


The regex "[^a-zA-Z]" reads as "not a letter", so passing it as the pattern to your existing toSpace will turn every non-letter into a space.
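
A minimal sketch of how that might look, reusing the toSpace transformer from the question. The two-document corpus is made up for illustration, and I am assuming plain ASCII text:

```r
library(tm)

# Same transformer as in the question: replace every match of
# `pattern` with a single space.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Hypothetical corpus, for illustration only.
docs <- VCorpus(VectorSource(c("e-mail: a@b.com", "R2D2 & C-3PO!")))

# "[^a-zA-Z]" matches any single character that is not an ASCII letter,
# so digits, punctuation and whitespace all become spaces.
docs <- tm_map(docs, toSpace, "[^a-zA-Z]")

# Inspect the transformed text of the first document.
content(docs[[1]])
```

Note that this class also treats accented letters as non-letters. If that matters for your text, "[^[:alpha:]]" may be a better fit, since it follows the locale's idea of a letter.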
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On April 23, 2015 1:10:41 PM PDT, Mike <mikehall at y7mail.com> wrote:
>Hello,
>In the following code, any characters matching "/|@| \\|" will be
>changed to a space.
>> library(tm)
>> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
>> docs <- tm_map(docs, toSpace, "/|@| \\|")
>
>What code would transform all non-letters to a space? (What goes where
>the xxxxx's are.) It is very difficult to put all non-letters in a
>string... So I'm doing the opposite of the above.
>> toSpace_2 <- content_transformer(function xxxxxxxxxxxxxxxxxxxxxxx))
>> docs <- tm_map(docs, toSpace_2, "abcdefghijklmnopqrstuvwxyz")
>
>This needs to be done by a content_transformer() function to maintain
>the integrity of docs.
>
>Thanks
> 


