[R] Weka on command line c.f. using RWeka
Patrick Connolly
p_connolly at slingshot.co.nz
Mon Nov 12 08:53:50 CET 2012
I am running Weka from the command line through calls to system(), like this:
> system("java weka.classifiers.bayes.NaiveBayes -K -t HWlrTrain.arff -o")
=== Confusion Matrix ===
a b <-- classified as
3518 597 | a = NoSpray
644 926 | b = Spray
=== Stratified cross-validation ===
=== Confusion Matrix ===
a b <-- classified as
3512 603 | a = NoSpray
653 917 | b = Spray
So far, no surprises, except that I might have expected a few more
misclassifications in the cross-validation.
However, if I use the same data in R with RWeka, like this:
> library(RWeka)
> train.df <- read.arff("HWlrTrain.arff")
> NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
> wNB <- NB(decision ~ ., data = train.df,
+           control = Weka_control(K = TRUE))
> summary(wNB)
=== Summary ===
Correctly Classified Instances 4437 78.0475 %
Incorrectly Classified Instances 1248 21.9525 %
Kappa statistic 0.4446
Mean absolute error 0.2679
Root mean squared error 0.3924
Relative absolute error 67.0055 %
Root relative squared error 87.7545 %
Coverage of cases (0.95 level) 97.9244 %
Mean rel. region size (0.95 level) 83.0519 %
Total Number of Instances 5685
=== Confusion Matrix ===
a b <-- classified as
3520 595 | a = NoSpray
653 917 | b = Spray
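The resulting confusion matrix is different from both the training and
the cross-validation matrices from Weka's command line. Its Spray row
(653, 917) is identical to the cross-validation one, but its NoSpray row
(3520, 595) matches neither.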
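
To put numbers on the difference, here is a minimal sketch in plain R (no RWeka needed) that retypes the three confusion matrices above and compares them. The names make_cm, cm and accuracy are only illustrative, and I am assuming that rows are the actual classes and columns the predicted ones, as in Weka's usual layout.

```r
# Confusion matrices copied by hand from the output above.
# Assumption: rows = actual class, columns = predicted class.
classes <- c("NoSpray", "Spray")
make_cm <- function(v) {
  matrix(v, nrow = 2, byrow = TRUE,
         dimnames = list(actual = classes, predicted = classes))
}
cm <- list(
  cli_train = make_cm(c(3518, 597, 644, 926)),  # Weka CLI, training data
  cli_cv    = make_cm(c(3512, 603, 653, 917)),  # Weka CLI, cross-validation
  rweka     = make_cm(c(3520, 595, 653, 917))   # summary(wNB) in RWeka
)

# Overall accuracy: correctly classified (diagonal) over all instances.
accuracy <- sapply(cm, function(m) sum(diag(m)) / sum(m))
print(round(100 * accuracy, 4))

# Cell-by-cell differences between the RWeka matrix and each CLI matrix.
print(cm$rweka - cm$cli_train)
print(cm$rweka - cm$cli_cv)
```

All three matrices add up to the same 5685 instances, and the accuracies come out at roughly 78.17 % (CLI training), 77.91 % (CLI cross-validation) and 78.05 % (RWeka), so the RWeka figure sits between the two CLI results.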
Somewhat ironically, if I use the model to predict on test data, like
this:
> predict(wNB, test.df)
I get exactly the same result as I would from the Weka CLI.
Maybe the difference isn't important, but I would have expected the
two approaches to do exactly the same thing.
Any possible explanations?
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.