[R] Help using "tm" text mining package - preprocessing

dunner ross.dunne at tcd.ie
Thu Feb 10 16:42:08 CET 2011


Thanks all for your help. I fear text mining is an abstract little corner of
"R".

I have imported 3228 text (.txt) files, each a news story, into R using
[tm]:

textd <- Corpus(DirSource("other/docs"), readerControl = list(reader = readPlain))

I can pre-process an individual document using tolower(textd[[1]]).
However, when I try to run tmTolower() I get a "no such command" error, and
then the TermDocumentMatrix command gives me a peculiar error:

> other.TDM <- TermDocumentMatrix(textd, control = list(stopwords = TRUE))
Error in tolower(txt) : 
  invalid input 'Valentino bag, breakfasting at West Palm Beach café Testa .
. . VALENTINO, in' in 'utf8towcs'
> 

Is it something to do with the structure of the documents I've read in?
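
A minimal base-R sketch of one possible cause, offered as a guess rather than
a diagnosis: the 'utf8towcs' message suggests tolower() was handed bytes that
are not valid UTF-8, and the "é" in "café" would be such a byte if the files
were saved as Latin-1. The Latin-1 encoding is an assumption, and raw_txt is a
made-up stand-in for one line of a story, not data from the actual corpus.

```r
# Assumption: the source files are Latin-1 (ISO-8859-1), not UTF-8.
# "\xe9" is the Latin-1 byte for 'é'; on its own it is not valid UTF-8.
raw_txt <- "breakfasting at West Palm Beach caf\xe9 Testa"

# validUTF8() (base R >= 3.3.0) reports whether a string is valid UTF-8.
validUTF8(raw_txt)

# Re-encode from the assumed source encoding to UTF-8 before lowercasing.
fixed_txt <- iconv(raw_txt, from = "latin1", to = "UTF-8")
validUTF8(fixed_txt)

# Case conversion on the re-encoded text.
lower_txt <- tolower(fixed_txt)
```

If the files really are Latin-1, the same re-encoding would have to be applied
to every document before TermDocumentMatrix() is called, for example through
tm_map(); the exact form of that call depends on the installed tm version.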

The "tm" documentation is *extremely* abstract for someone at my Neanderthal level.

Thanks to anyone who can help.





