[R] Classifying large text corpora using R
daniel at umd.edu
Sat Sep 3 19:34:09 CEST 2011
Take a look here: http://www.jstatsoft.org/v25/i05/paper
> Dear everyone,
> I am new to R, and I am looking at doing text classification on a huge
> collection of documents (>500,000) which are distributed among 300 classes
> (so basically, this is my training data). Would someone please be kind
> enough to let me know about the R packages to use and their scalability
> (time and space)?
> I do not know which packages are right for this. I started off by trying
> the tm package (http://cran.r-project.org/package=tm) for pre-processing
> and the FSelector package
> (http://cran.r-project.org/web/packages/FSelector/index.html) for feature
> selection, but both are incredibly slow and completely unusable for my
> task.
> So the question is what are the right packages to use (for pre-processing,
> feature selection, and classification)? Please consider the fact that I
> may be dealing with data of millions of dimensions which may not even fit
> in memory.
> I posted on this issue twice but did not get any response. This is a very
> critical piece of my research and I have been struggling with this issue
> for a long time. Please consider helping me out, directly or by pointing
> me to any other software/website that you think may be more appropriate.
> Many thanks in advance.