[R] Extracting certain text using tm package
vioravis
vioravis at gmail.com
Mon Jun 27 08:47:35 CEST 2011
I have used the "tm" package to import a set of text documents using the
following command:
text <- Corpus(DirSource("."), readerControl = list(language = "ansi"))
I would like to extract only a certain portion of the text in each document
using keywords. For example, I would like to keep all the text between the
keywords <Start Text> and <End Text> and discard everything else. Is there
any way to accomplish this in the 'tm' package?
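To make the request concrete, here is a minimal sketch of the kind of extraction meant, written in base R only. The helper name extract_between and the sample document are hypothetical, and the sketch assumes each document holds at most one <Start Text> ... <End Text> pair.

```r
# Hypothetical helper (not part of tm): keep only the text between two
# literal markers. Returns "" if either marker is missing.
extract_between <- function(x, start = "<Start Text>", end = "<End Text>") {
  x <- paste(x, collapse = "\n")               # a document may be a vector of lines
  s <- regexpr(start, x, fixed = TRUE)         # first occurrence of the start marker
  if (s == -1) return("")
  from <- as.integer(s) + attr(s, "match.length")
  rest <- substring(x, from)                   # everything after the start marker
  e <- regexpr(end, rest, fixed = TRUE)        # first end marker after the start
  if (e == -1) return("")
  trimws(substring(rest, 1, as.integer(e) - 1))
}

# Made-up sample document, one element per line
doc <- c("header junk", "<Start Text>keep this", "and this<End Text>", "footer junk")
extract_between(doc)
```

With a tm version that provides content_transformer(), a function like this could presumably be applied to every document with tm_map(text, content_transformer(extract_between)); that step is an assumption and is not tested here.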
Also, is there a quick way to remove all the HTML tags from the text?
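As a rough illustration of what "removing HTML tags" could mean here, the sketch below uses a plain regular expression in base R. The name strip_html is hypothetical, and the pattern is a crude assumption: it deletes anything of the form <...> and does not handle comments, scripts, or entities such as &amp;.

```r
# Hypothetical helper (not part of tm): delete anything that looks like a tag.
strip_html <- function(x) {
  x <- gsub("<[^>]+>", "", x)          # drop <...> sequences
  x <- gsub("[[:space:]]+", " ", x)    # collapse runs of whitespace
  trimws(x)
}

strip_html("<p>Hello <b>world</b>!</p>")
```

A pattern like this would also delete the <Start Text> and <End Text> markers themselves, so any extraction based on those markers would have to happen first.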
Thank you.
Ravi