[R] tm: Read a single text file into a corpus as a single document?

Alexander James Rickett ack.vandal at gmail.com
Tue Jul 19 10:11:46 CEST 2011


Hello everyone,

I'm doing some JGR (a GUI front end for R) development, specifically adding functionality from tm.  To let users select text files from a file dialog and turn them into a corpus, I need to be able to generate a corpus using a *SINGLE* text file as a single document, and to append a new document to an existing corpus.  I know that if I could read each file into a single character vector I'd be in business, but I can't find how to do that either.  This seems like it should be a no-brainer, so I'm at my wits' end.

Here's pseudo code of what I'd like to be able to do:

##########################################
> corp1doc <- Corpus(singleTextDocSource("path/to/doc")) #read in 1 text doc as a 1-document corpus
> corp1doc
	A corpus with 1 text document

> corp1doc[[2]] <- AnotherSingleTextDoc("path/to/doc") #append a second document to the same corpus
> corp1doc
	A corpus with 2 text documents
##########################################

I can almost do this with DirSource, by setting pattern='filename', but this also requires me to split the enclosing directory off from the full path, which shouldn't be necessary.
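
For reference, the workaround I mean looks roughly like the sketch below. The temporary file is only a stand-in for whatever the user picks in the file dialog, and I'm assuming the file name is safe to use as a regular expression, since that is how the pattern argument is interpreted:

```r
library(tm)

# Hypothetical stand-in for the file chosen in the file dialog.
tmpdir <- tempfile("corpus_demo_")
dir.create(tmpdir)
path <- file.path(tmpdir, "doc.txt")
writeLines(c("first line", "second line"), path)

# DirSource() takes a directory plus a regular-expression pattern,
# so the full path has to be split into those two pieces first.
# Assumption: the file name contains no regex metacharacters that
# would change what the pattern matches.
src <- DirSource(directory = dirname(path),
                 pattern = paste0("^", basename(path), "$"))

# Every file matched by the pattern becomes one document.
corp1doc <- Corpus(src)
length(corp1doc)
```

It works, but splitting the path and turning the file name into a pattern is exactly the extra step I'd like to avoid.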

Thanks for taking a look!


