[R] RWeka cross-validation and Weka_control Parametrization

Tue Aug 14 10:54:54 CEST 2007

> On Wed, 01 Aug 2007 10:52:02 +0200, Bjoern wrote:

> Hello,

>          I have two questions concerning the RWeka package:

>          1.) First question:
>          How can one perform a cross-validation, say 10-fold, for a given
> data set and a given model?

>          2.) Second question:
>          What is the correct syntax for passing parameters through the
> interface, e.g. for kernel classifiers?
>    m1 <- SMO(Species ~ ., data = iris, control = 
>              Weka_control(K="weka.classifiers.functions.supportVector.RBFKernel",G=0.1))
>    m2 <- SMO(Species ~ ., data = iris, control = 
>              Weka_control(K="weka.classifiers.functions.supportVector.RBFKernel",G=1.0))

>> m1
>          SMO

>          Kernel used:
>          RBF kernel: K(x,y) = e^-(0.01* <x-y,x-y>^2)

>          ## should be: RBF kernel: K(x,y) = e^-(0.1* <x-y,x-y>^2)

> etc.

The answer to question 2 is surprisingly simple, but nevertheless took
me about half an hour to find:

  m2 <- SMO(Species ~ ., data = iris,
            control = Weka_control(K = "weka.classifiers.functions.supportVector.RBFKernel -G 2"))

gives

R> m2
SMO

Kernel used:
  RBF kernel: K(x,y) = e^-(2.0* <x-y,x-y>^2)

[Using Weka_control(K = ..., G = ...) passes the G option to SMO but not to
RBFKernel, which is why m1 above still shows a gamma of 0.01.  The docs for
SMO() say

 -K <classname and parameters>
  The Kernel to use.
  (default: weka.classifiers.functions.supportVector.PolyKernel)

and one needs to remember Weka's command-line style interface to realize
that this translates into putting the kernel class name and its options
into a single string for the K option.]
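
To make the string form less error-prone, the class name and its options
can be pasted together programmatically.  The following is only a sketch:
kernel_spec() is a hypothetical helper written for illustration, not part
of RWeka, and it assumes every option is a simple "-name value" pair.

```r
# Hypothetical helper (NOT part of RWeka): build a Weka command-line style
# specification "<classname> -opt value ..." from a class name and named options.
kernel_spec <- function(class_name, ...) {
  opts <- list(...)
  if (length(opts) == 0L) return(class_name)
  # Turn each named argument into a "-name value" flag.
  flags <- paste0("-", names(opts), " ", vapply(opts, format, character(1)))
  paste(c(class_name, flags), collapse = " ")
}

k <- kernel_spec("weka.classifiers.functions.supportVector.RBFKernel", G = 2)
print(k)
# [1] "weka.classifiers.functions.supportVector.RBFKernel -G 2"
```

The resulting string can then be handed over as Weka_control(K = k), which
is the same call as the one for m2 above.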

This is of course not quite what R users would expect, and we'll try to
improve the Weka control mechanism so that specifying (Weka class)
options which require additional parameters becomes more convenient.
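
Question 1 is not addressed above.  As a minimal sketch, and assuming the
installed RWeka version provides evaluate_Weka_classifier() with a numFolds
argument (this should be checked against the package's help pages), a
10-fold cross-validation of a fitted classifier could look like this:

```r
library(RWeka)

# Fit the classifier on the full data set, using the kernel string from above.
m <- SMO(Species ~ ., data = iris,
         control = Weka_control(
           K = "weka.classifiers.functions.supportVector.RBFKernel -G 2"))

# Assumption: evaluate_Weka_classifier() accepts numFolds and seed.
# With numFolds = 10 the data are split into 10 folds for cross-validation;
# fixing the seed makes the random split reproducible.
cv <- evaluate_Weka_classifier(m, numFolds = 10, seed = 1)
print(cv)
```

Printing the result should show Weka's usual evaluation summary for the
cross-validated model.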

Best
-k