[R] Using a text file as a removeWords dictionary in tm_map

Sun Shine phaedrusv at gmail.com
Sat Feb 28 14:46:37 CET 2015


Hi list

Although this query is specific to the tm package, perhaps others on the
list can offer some thoughts.

I am using tm for some initial text mining, and I want to remove from the
corpus a dictionary of words that was generated outside R.

I have created a comma-separated list of terms, each enclosed in double
quotes, in a plain UTF-8 file called stopList.txt. To read it into R, I do:

 > stopDict <- read.table('~/path/to/file/stopList.txt', sep=',')
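A minimal sketch of what this read step produces, assuming the file holds a
single line of double-quoted, comma-separated terms. The temporary file and
the three terms are hypothetical stand-ins for stopList.txt. read.table()
returns a data frame with one row and one column per term, whereas scan()
returns a plain character vector:

```r
# Hypothetical stand-in for stopList.txt: one line of comma-separated,
# double-quoted terms
path <- tempfile(fileext = ".txt")
writeLines('"alpha","beta","gamma"', path)

# read.table() gives a data frame: one row, one column (V1, V2, ...) per term
asTable <- read.table(path, sep = ",", strip.white = TRUE)
str(asTable)

# scan() gives a character vector; the double quotes are stripped by default
stopDict <- scan(path, what = character(), sep = ",",
                 strip.white = TRUE, quiet = TRUE)
print(stopDict)
```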

To pass it to the removeWords transformation in tm, I then do:

 > docs <- tm_map(docs, removeWords, stopDict)

which has no effect. Neither does:

 > docs <- tm_map(docs, removeWords, c(stopDict))
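tm's removeWords() documents its words argument as a character vector, so
neither a data frame nor the list that c() makes from one is a usable word
list. A sketch under stated assumptions: the one-row data frame below is a
hypothetical stand-in for the read.table() result, and the tm part only runs
if the package is installed:

```r
# Hypothetical one-row data frame, shaped like the read.table() result
stopTable <- data.frame(V1 = "alpha", V2 = "beta", stringsAsFactors = FALSE)

# c() on a data frame returns a plain list, still not a character vector
str(c(stopTable))

# Flatten to a character vector; as.character() also handles factor columns
stopDict <- as.character(unlist(stopTable, use.names = FALSE))

if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  docs <- VCorpus(VectorSource(c("alpha one beta two", "gamma beta three")))
  # Pass the character vector as the words argument of removeWords()
  docs <- tm_map(docs, removeWords, stopDict)
  # removeWords() leaves the surrounding spaces; collapse repeated whitespace
  docs <- tm_map(docs, stripWhitespace)
  print(vapply(seq_along(docs), function(i) as.character(docs[[i]]), ""))
}
```

removeWords() matches case-sensitively, so if the stop list is lowercase the
corpus may need lowercasing before this step.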

What am I missing or doing wrong?

How do I pass a text file with pre-defined terms to the removeWords 
transform of tm?

Thanks for any ideas.

Cheers

Sun
