[R] Problem with Snowball & RWeka
    flobede 
    flobede at hotmail.com
       
    Thu Jun  2 20:49:51 CEST 2011
    
    
  
Greetings to all,
I have a similar issue with Snowball.
I am running R version 2.12.1 (2010-12-16) on Windows 7.
Here is my script:
----
library(tm)

# Sample XML file shipped with the tm package
custom.xml <- system.file("texts", "custom.xml", package = "tm")
print(readLines(custom.xml), quote = FALSE)

# Map XML nodes to document metadata and content
myXMLReader <- readXML(
  spec = list(
    Language = list("node", "/document/language"),
    DateTimeStamp = list("node", "/document/date"),
    Origin = list("node", "/document/source"),
    Description = list("node", "/document/subject"),
    Type = list("node", "/document/country"),
    Heading = list("node", "/document/title"),
    Content = list("node", "/document/contenu"),
    Author = list("node", "/document/author")),
  doc = PlainTextDocument())

# Custom source: one document per child of the XML root
mySource <- function(x, encoding = "UTF-8")
  XMLSource(x, function(tree) XML::xmlRoot(tree)$children, myXMLReader, encoding)

corpusmf <- Corpus(mySource(custom.xml))
meta(corpusmf[[1]])
meta(corpusmf[[2]])

# Preprocessing; the stemDocument step is the one that fails
corpusmf <- tm_map(corpusmf, stripWhitespace)
corpusmf <- tm_map(corpusmf, removeNumbers)
corpusmf <- tm_map(corpusmf, removePunctuation)
corpusmf <- tm_map(corpusmf, stemDocument)

# Binary-weighted term-document matrix (named tdm to avoid masking base::matrix)
tdm <- TermDocumentMatrix(corpusmf, control = list(weighting = weightBin))
print(tdm)
 
-----
The stemDocument step fails with these error messages:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
I also tried calling library(Snowball) before running the script, but the result is the same.
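In case it helps with diagnosis, here is a minimal sketch of how the session state could be reported. It only uses base R functions and the package names mentioned in this thread; it is an assumption that this information is relevant to the stemmer error, and it does not fix anything by itself.

```r
# Diagnostic sketch only: it reports the session state, it does not fix the stemmer.
# The package names are the ones mentioned in this thread.
pkgs <- c("tm", "Snowball", "RWeka")

# Which of these packages are installed in the current library paths?
installed <- pkgs %in% rownames(installed.packages())
names(installed) <- pkgs
print(installed)

# Which of them are loaded in this session?
loaded <- pkgs %in% loadedNamespaces()
names(loaded) <- pkgs
print(loaded)

# R version, platform and attached package versions, useful when reporting the problem.
print(sessionInfo())
```

Running this right after the failing tm_map call shows whether Snowball and RWeka were loaded at that point.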
I found a clue on the Weka website:
http://weka.wikispaces.com/The+snowball+stemmers+don%27t+work,+what+am+I+doing+wrong%3F
but I don't understand what I should do with the archives mentioned there.
I would be grateful if someone could help with this.
Kind regards,