[R] how to build TermDocMatrix in tm text mining package of R

Fri Jan 9 16:39:21 CET 2009

Hi there, I think something like the following is what you want:

### R start...
# if you put your plain text files in a folder like this
my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\texts\\'

# then you can construct a simple tdm like this
library(tm)
my.corpus <- Corpus(DirSource(my.path),
                    readerControl = list(reader = readPlain))
my.tdm <- TermDocMatrix(my.corpus)

# this should show how words are distributed in the first text document
my.tdm[1, ]
### R end.
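
A note for anyone running a newer tm than the one used above: as far as I know, TermDocMatrix() was later replaced by TermDocumentMatrix() (terms in rows) and DocumentTermMatrix() (documents in rows). The sketch below is only a rough equivalent of the code above under that assumption. The two example strings are made up so that it runs without any files on disk; with real files you would build the corpus from DirSource(my.path) as before.

```r
library(tm)

# Hypothetical in-memory texts standing in for the files under my.path.
# With real files, something like this should work instead:
#   VCorpus(DirSource(my.path), readerControl = list(reader = readPlain))
texts <- c("the cat sat on the mat",
           "the dog chased the cat")
my.corpus <- VCorpus(VectorSource(texts))

# Documents in rows, terms in columns (the same orientation as my.tdm above)
my.dtm <- DocumentTermMatrix(my.corpus)

# Term counts for the first document, converted to an ordinary matrix
first.doc <- as.matrix(my.dtm[1, ])
print(first.doc)
```

By default very short words are dropped when the matrix is built, so do not be surprised if a term like "on" is missing from the columns.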

By the way, there are some nice examples of using the tm package in
the latest R News (Volume 8/2, October 2008), in the article
'An Introduction to Text Mining in R':
http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf
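
Since the original question mentions text association: tm has a findAssocs() function that lists the terms whose counts correlate with a given term across documents. Here is a minimal sketch, assuming a recent tm where DocumentTermMatrix() is available and findAssocs() returns a list with one entry per query term. The four example sentences and the 0.5 cut-off are invented purely for illustration.

```r
library(tm)

# Hypothetical toy documents, made up for this example
texts <- c("text mining with r",
           "text mining with the tm package",
           "the weather is nice today",
           "nice weather for a walk")
corp <- VCorpus(VectorSource(texts))
dtm  <- DocumentTermMatrix(corp)

# Terms whose per-document counts correlate with "mining" at 0.5 or more
assoc <- findAssocs(dtm, "mining", 0.5)
print(assoc)
```

On a real corpus you would probably want to remove stopwords and punctuation first, otherwise very common words tend to dominate the associations.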

Hope that helps a little bit,
Tony Breyal

On 9 Jan, 14:21, "Kum-Hoe Hwang" <phdhw... at gmail.com> wrote:
> Howdy Gurus
>
> I'd like to ask a question about how to build TermDocMatrix in tm text
> mining package.
>
> It is not clear to me how to import a plain text file and then convert
> it into a TermDocMatrix.
> How can I build a TermDocMatrix from a plain text document file for text
> association?
> Or are there any good manuals?
>
> Thank you in advance,
>
> --
> Kum-Hoe Hwang, Ph.D.
>
> Phone : 82-31-250-3516
> Email : phdhw... at gmail.com
>
> ______________________________________________
> R-h... at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.