[R] Problem with Snowball & RWeka

A N an02167 at gmail.com
Sat Jun 4 22:11:07 CEST 2011


I have this problem too. Everything worked fine last year, but since
updating R and its packages I can no longer do word stemming.
Unfortunately, I didn't keep the old binaries; otherwise I would simply
revert.

I'm hoping someone finds a solution for R on Windows. Thanks!
Kurt Hornik posted a potential solution for R on Mac OS, copied
below, but I cannot get it to work on Windows.

Here's the code I'm running:
     # 1) Using package Snowball
         library(Snowball)
         # Porter test vocabulary shipped with Snowball
         # (named 'voc' so it does not mask base::source)
         voc <- readLines(system.file("words", "porter", "voc.txt",
                                      package = "Snowball"))
         result <- SnowballStemmer(voc)
     # 2) Using package tm
         library(tm)
         data("crude")
         stemDocument(crude[[1]])

In both cases I got the Java error "Could not initialize the
GenericPropertiesCreator. This exception was produced:
java.lang.NullPointerException". Once this error has appeared in a
session, no further error messages are generated, but
SnowballStemmer() and stemDocument() silently return the original,
unstemmed text.
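Because the failure is silent after the first message, a quick sanity check can show whether a stemmer is doing anything at all. The helper below is a hypothetical sketch (it is not part of Snowball, tm, or RWeka): it applies any stemming function to a few inflected words and reports whether at least one of them changed. The two stand-in functions exist only to illustrate the check.

```r
# Hypothetical helper (not part of Snowball, tm, or RWeka).
# Returns TRUE if the stemmer returned one result per word and changed
# at least one of them; FALSE if every word came back unchanged (the
# silent failure described above) or the output length does not match.
stemmer_works <- function(stem_fun,
                          words = c("running", "stemming", "connections")) {
  stemmed <- as.character(stem_fun(words))
  length(stemmed) == length(words) && any(stemmed != words)
}

# Stand-ins for illustration only: a no-op "stemmer" and a crude suffix stripper
no_op <- function(x) x
strip_suffix <- function(x) sub("(ing|s)$", "", x)

print(stemmer_works(no_op))         # FALSE
print(stemmer_works(strip_suffix))  # TRUE
```

With the packages loaded, one could call stemmer_works(SnowballStemmer) in the same way; given the behaviour described above, it would presumably return FALSE on an affected setup.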

Possible Solution:
For those on Mac OS, Kurt Hornik wrote...
     These issues seem to be specific to Mac OS X.  Recent versions of Weka
     have added a package management system not unlike R's, to the effect
     that now when external packages (or the Snowball jar) are loaded their
     KnowledgeFlow GUI is started, which in turn requires AWT---and from what
     I understand, this does not work on Mac OS X.

     Short term, you should be able to Sys.setenv("NOAWT", "true").

     More long term, the Weka maintainers have patched their upstream code so
     that it is possible to turn off the dynamic class discovery altogether,
     but I have not found the time to test this ...

I realize this solution was meant for Mac OS, but, not knowing anything
about rJava, I tried it on Windows anyway, which resulted in "Error in
Sys.setenv("NOAWT", "true") : all arguments must be named".
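That particular error is unrelated to Java: Sys.setenv() only accepts name = value pairs, so the variable has to be passed as a named argument. The sketch below shows the unnamed form failing and the named form succeeding. Two assumptions, not confirmed in this thread: the setting presumably has to happen before rJava/RWeka start the JVM (i.e. before library(Snowball) or library(tm) load them), and whether NOAWT has any effect on Windows is unknown, since the advice above targets Mac OS X.

```r
# The unnamed form reproduces the error from the message above
err <- tryCatch(Sys.setenv("NOAWT", "true"), error = conditionMessage)
print(err)                       # "all arguments must be named"

# Sys.setenv() requires named arguments: Sys.setenv(NAME = "value")
ok <- Sys.setenv(NOAWT = "true") # invisibly returns TRUE on success
print(ok)
print(Sys.getenv("NOAWT"))       # "true"

# Assumption: do this before loading packages that start the JVM, e.g.
# library(Snowball)
```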

Here's my session info.
          R version 2.13.0 Patched (2011-04-21 r55576)
          Platform: i386-pc-mingw32/i386 (32-bit) (Windows Vista)

          locale:
          [1] LC_COLLATE=English_United States.1252
          [2] LC_CTYPE=English_United States.1252
          [3] LC_MONETARY=English_United States.1252
          [4] LC_NUMERIC=C
          [5] LC_TIME=English_United States.1252

          attached base packages:
          [1] stats     graphics  grDevices datasets  utils     methods   base

          other attached packages:
          [1] Snowball_0.0-7 tm_0.5-6       rcom_2.2-3.1   rscproxy_1.3-1

          loaded via a namespace (and not attached):
          [1] grid_2.13.0       rJava_0.9-0       RWeka_0.4-7       RWekajars_3.7.3-1
          [5] slam_0.1-22       tools_2.13.0

I get the same error with several older versions of rJava as well.


