[R-pkgs] tm 0.1 uploaded to CRAN
h0125130 at wu-wien.ac.at
Thu Jan 11 11:52:23 CET 2007
A first version of tm has just been released on CRAN.
tm provides a sophisticated framework for text mining applications.
It offers functionality for managing text documents, abstracts the
process of document manipulation, and eases the use of heterogeneous
text formats in R. Advanced metadata management is implemented for
collections of text documents to ease working with large,
metadata-enriched document sets.
The package ships with native support for handling
*) the Reuters 21578 dataset,
*) the Reuters Corpus Volume 1 dataset,
*) Gmane RSS feeds,
*) e-mails, and
*) several classic file formats (e.g. plain text or CSV text).
tm provides easy access to preprocessing and manipulation mechanisms, such as
*) whitespace removal,
*) stemming, or
*) conversion between file formats (e.g., Reuters21578 to plain
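As a rough illustration of the whitespace-removal step, here is a minimal sketch in base R. It does not use tm's own functions; the helper name strip_whitespace and the sample strings are assumptions made up for this example.

```r
# Base R only; strip_whitespace is a hypothetical helper, not part of tm.
docs <- c("  Crude   oil prices\trose. ", "OPEC  met\n\nin Vienna.")

# Collapse every run of whitespace to a single space, then trim both ends.
strip_whitespace <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)
  gsub("^ | $", "", x)
}

clean <- strip_whitespace(docs)
print(clean)
```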
Furthermore, a generic filter architecture is available to
*) filter documents by certain criteria, or
*) perform full-text search.
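The idea of a generic filter can be sketched in base R as a function that applies an arbitrary predicate to each document, with full-text search as one such predicate. This is only an illustration of the concept; filter_docs and contains are hypothetical names, not tm's interface.

```r
# Base R only; filter_docs and contains are hypothetical helpers, not tm's API.
docs <- c(d1 = "Crude oil prices rose sharply",
          d2 = "Wheat exports fell",
          d3 = "Oil output was cut")

# Keep the documents for which the predicate returns TRUE.
filter_docs <- function(docs, predicate) {
  docs[vapply(docs, predicate, logical(1))]
}

# Full-text search as a predicate: case-insensitive literal substring match.
contains <- function(term) {
  function(doc) grepl(tolower(term), tolower(doc), fixed = TRUE)
}

hits <- filter_docs(docs, contains("oil"))
print(names(hits))
```

Because the predicate is an ordinary function, criteria based on metadata or document length could be passed in the same way.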
The package supports the export of document collections to
term-document matrices as frequently used in the text mining
literature. This allows the straightforward integration of existing
methods for classification, clustering, visualization, etc.
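To make the notion of a term-document matrix concrete, the following base R sketch builds one by hand from three toy documents and passes it to a standard clustering routine. It does not use tm's export functions; the documents and variable names are assumptions for illustration.

```r
# Base R only; this builds the matrix by hand rather than through tm.
docs <- c(d1 = "oil prices rose",
          d2 = "oil output fell",
          d3 = "wheat prices fell")

# Tokenize on whitespace (a real pipeline would also normalize case, stem, etc.).
tokens <- strsplit(docs, "[[:space:]]+")

# Long format: one row per token occurrence, then cross-tabulate.
long <- data.frame(
  term = unlist(tokens, use.names = FALSE),
  doc  = rep(names(tokens), lengths(tokens))
)
tdm <- unclass(table(long$term, long$doc))  # rows = terms, columns = documents

print(tdm)

# The matrix feeds directly into existing methods, e.g. hierarchical
# clustering of documents by the distance between their term vectors.
hc <- hclust(dist(t(tdm)))
```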
The package is designed in a modular way to enable easy integration of
new file formats, parsers, transformations and filter operations.