[R] word stemming for corpus linguistics
Andy Wolfe
phaedrusv at gmail.com
Tue Jul 26 09:10:07 CEST 2016
Hi list
I'm working on a corpus linguistics project, using a combination of
two texts: Gries, "Quantitative Corpus Linguistics with R: A Practical
Introduction", and Jockers, "Text Analysis with R for Students of
Literature" (both really excellent, by the way). I want to stem or
lemmatize the words so that, e.g., 'facilitating', 'facilitated', and
'facilitates' all reduce to the same form, such as the stem 'facilit'.
In text mining, this is feasible with a combination of the 'tm' and
'SnowballC' packages, but I am finding that the resulting DTM
(document-term matrix) is difficult to work with when I want to do
concordance (key word in context) analysis.
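To make the difficulty concrete: a DTM stores only term counts per
document, so the word order that a concordance needs is lost. A minimal
sketch of the kind of result wanted, assuming 'SnowballC' is installed,
stems the tokens while keeping them in their original order and then
looks up the hits. The function name kwic_stem and the sample sentence
are made up for illustration; this is not something 'tm' provides, and
the tokenisation is deliberately crude.

```r
# Sketch only: assumes the SnowballC package is installed.
# kwic_stem() is a hypothetical helper, not part of tm or SnowballC.
library(SnowballC)

kwic_stem <- function(text, stem, window = 3) {
  # Lower-case and split on runs of non-letters, preserving word order
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]
  n <- length(tokens)

  # Stem every token in place so positions still line up with `tokens`
  stems <- wordStem(tokens, language = "english")
  hits  <- which(stems == stem)

  # For each hit, collect up to `window` unstemmed words on either side
  data.frame(
    left = vapply(hits, function(i) {
      paste(tail(head(tokens, i - 1), window), collapse = " ")
    }, character(1)),
    keyword = tokens[hits],
    right = vapply(hits, function(i) {
      paste(head(tail(tokens, n - i), window), collapse = " ")
    }, character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up sample text, for illustration only
txt <- paste("The new policy facilitates trade.",
             "Critics say it facilitated fraud,",
             "while facilitating little growth.")
res <- kwic_stem(txt, "facilit")
print(res)
```

The search is done on the stems, but the context is printed from the
original tokens, so the concordance lines stay readable.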
So, two questions:
(1) Is there a package for R version 3.3.1 that supports corpus
linguistics work? And/or
(2) Is there a way of doing concordance analysis using the tm package as
part of the whole text mining process?
I appreciate any help. Thanks.
Andy