[R] Problem with Snowball & RWeka
flobede
flobede at hotmail.com
Thu Jun 2 20:49:51 CEST 2011
Greetings to all,
I have a similar issue with Snowball.
I am running R version 2.12.1 (2010-12-16) on Windows 7.
Here is my script:
----
library(tm)
# Locate the sample XML file shipped with tm and display its raw contents
custom.xml <- system.file("texts", "custom.xml", package = "tm")
print(readLines(custom.xml), quote = FALSE)
# Reader that maps each metadata field to an XPath node in the XML file
myXMLReader <- readXML(
  spec = list(
    Language      = list("node", "/document/language"),
    DateTimeStamp = list("node", "/document/date"),
    Origin        = list("node", "/document/source"),
    Description   = list("node", "/document/subject"),
    Type          = list("node", "/document/country"),
    Heading       = list("node", "/document/title"),
    Content       = list("node", "/document/contenu"),
    Author        = list("node", "/document/author")),
  doc = PlainTextDocument())
# Source that treats each child of the XML root node as one document
mySource <- function(x, encoding = "UTF-8") {
  XMLSource(x, function(tree) XML::xmlRoot(tree)$children, myXMLReader,
            encoding)
}
# Build the corpus and inspect the metadata of the first two documents
corpusmf <- Corpus(mySource(custom.xml))
meta(corpusmf[[1]])
meta(corpusmf[[2]])
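# Preprocessing
corpusmf <- tm_map(corpusmf, stripWhitespace)
corpusmf <- tm_map(corpusmf, removeNumbers)
corpusmf <- tm_map(corpusmf, removePunctuation)
# Stemming: this is the step that fails
corpusmf <- tm_map(corpusmf, stemDocument)
# Term-document matrix with binary weighting
# (named tdm rather than matrix to avoid masking base::matrix)
tdm <- TermDocumentMatrix(corpusmf, control = list(weighting = weightBin))
print(tdm)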
-----
stemDocument fails with the following error messages:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
I also tried loading library(Snowball) first, but I get the same errors.
I found a clue on the Weka website:
http://weka.wikispaces.com/The+snowball+stemmers+don%27t+work,+what+am+I+doing+wrong%3F
but I don't understand what I should do with these archives.
I would be grateful if someone could help with this.
Kind regards,
More information about the R-help mailing list