[R] svm

Thu Jan 7 00:44:13 CET 2010

Hi Amy,

On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <amy_4_5_84 at hotmail.com> wrote:
> Hi Steve,
>
> Thank you very much for your reply.
>
> I’m trying to do something systematic/general in the program so that I can
> try different datasets without changing much in the program (without knowing
> the name of the class label that has different name from dataset to
> another…)
>
> Could you please tell me your opinion about this code:-
>
> library(e1071)
>
> mydata<-read.delim("the_whole_dataset.txt")
>
> class_label <- names(mydata)[1]                        # I’ll always put the
> class label in the first column.
>
> myformula <- formula(paste(class_label,"~ ."))
>
> x <- subset(mydata, select = - mydata[, 1])
>
> mymodel<-(svm(myformula, x, cross=3))
>
> summary(model)
>
> ################

Since you're not doing anything funky with the formula, a preference
of mine is to just skip this way of calling SVM and go "straight" to
the svm(x,y,...) method:

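## note: as.matrix() only gives a numeric matrix if every column
## (including the label column) is numeric
R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))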
R> train.x <- mydata[,-1]
R> train.y <- mydata[,1]

R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
## or
R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")
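
If the class label in the first column is text rather than numbers, calling as.matrix() on the whole data frame turns everything into character strings. A minimal sketch of one way around that, splitting off the label before converting the features. The toy data frame and its column names (label, f1, f2) are made up here to stand in for your file:

```r
library(e1071)

set.seed(1)
# Hypothetical stand-in for read.delim("the_whole_dataset.txt"):
# class label in column 1, numeric features in the remaining columns
mydata <- data.frame(label = rep(c("a", "b"), each = 10),
                     f1 = rnorm(20),
                     f2 = rnorm(20))

# Split first, then convert: the label becomes a factor and the
# features become a numeric matrix
train.y <- factor(mydata[, 1])
train.x <- as.matrix(mydata[, -1])

mymodel <- svm(train.x, train.y, cross = 3, type = "C-classification")
```

This still never needs the name of the label column, only its position.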

As an aside, I also like to be explicit about the type="" parameter to
tell what I want my SVM to do (regression or classification). If it's
not specified, svm picks which one to do based on whether your y
vector is a factor (it does classification) or not (it does
regression).
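
A quick way to see that default in action, as a rough sketch on made-up toy data (x, y, m.reg and m.cls are names invented for this example):

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # 20 samples, 2 numeric features
y <- rep(c(0, 1), each = 10)       # labels stored as plain numbers

m.reg <- svm(x, y)                 # numeric y, no type given
m.cls <- svm(x, factor(y))         # factor y, no type given

p.reg <- predict(m.reg, x)         # numeric predictions (regression)
p.cls <- predict(m.cls, x)         # a factor of predicted classes

is.numeric(p.reg)
is.factor(p.cls)
```

Printing either model also shows which SVM type was chosen, so numeric 0/1 labels that silently end up in a regression are easy to spot.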

> Do I have to the same steps with testingset? i.e. the testing set must not
> contain the label too? But contains the same structure as the training set?
> Is it correct?

I guess you'll want to report your accuracy/MSE/something on your
model for your testing set? Just load the data in the same way, then
use `predict` to calculate the metric you're after. You'll need to
have the labels for your test data to do that, though, e.g.:

testdata <- as.matrix(read.delim('testdata.txt'))
test.x <- testdata[,-1]
test.y <- testdata[,1]
preds <- predict(mymodel, test.x)

Let's assume you're doing classification, so let's report the accuracy:

acc <- sum(preds == test.y) / length(test.y)
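
If you're doing regression instead, the mean squared error is the analogous number. A small self-contained sketch, where the simulated data and the make.data helper are invented for the example and stand in for your training and test files:

```r
library(e1071)

set.seed(42)
# Hypothetical helper: simulate a numeric matrix whose first column
# is the response and whose other columns are features
make.data <- function(n) {
  x1 <- runif(n)
  x2 <- runif(n)
  cbind(y = 2 * x1 - x2 + rnorm(n, sd = 0.1), x1 = x1, x2 = x2)
}

traindata <- make.data(100)
testdata  <- make.data(50)

mymodel <- svm(traindata[, -1], traindata[, 1], type = "eps-regression")

test.x <- testdata[, -1]
test.y <- testdata[, 1]
preds  <- predict(mymodel, test.x)

# Mean squared error between predictions and the true responses
mse <- mean((preds - test.y)^2)
```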

Does that help?
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact