Hi Steve,

 

Could you please help me with the following point?
 
I am using R's SVM on some datasets from UCI. My program does nothing more than call SVM, but when I compare its RMSE with the SVM RMSE reported in papers, I find a big gap between them.
 
For example, my program's SVM RMSE on the bodyfat dataset is 2.64561,
while a paper reports an RMSE of 0.0204.
 
Could you please tell me how I can reduce this gap in the performance of SVM?
 
Cheers,
Amy
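
A gap of this size can come from the two RMSE values being measured on different scales rather than from the model itself. RMSE is expressed in the units of the target, so a paper that normalizes the target (for example to [0, 1]) reports a much smaller number for the same predictions. Whether that is what happened with bodyfat is only a guess, since the paper's preprocessing is not given here. The sketch below uses made-up vectors (`y`, `y_hat`) and a hypothetical helper `rmse` to show the effect:

```r
# Hypothetical helper: root mean squared error between two numeric vectors
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(1)
y     <- runif(100, min = 0, max = 50)   # made-up target on its original scale
y_hat <- y + rnorm(100, sd = 2.5)        # made-up predictions with some error

# Min-max scale both vectors using the range of the actual values
rng     <- range(y)
scale01 <- function(v) (v - rng[1]) / (rng[2] - rng[1])

rmse_raw    <- rmse(y, y_hat)
rmse_scaled <- rmse(scale01(y), scale01(y_hat))

# The two figures differ by exactly the width of the target range
all.equal(rmse_scaled, rmse_raw / diff(rng))
```

If the paper reports errors on a normalized target, comparing like with like means applying the same scaling before computing RMSE.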
 

> Date: Sat, 9 Jan 2010 15:48:49 -0500
> Subject: Re: [R] svm
> From: mailinglist.honeypot@gmail.com
> To: amy_4_5_84@hotmail.com
> CC: r-help@r-project.org
> 
> Hi,
> 
> On Fri, Jan 8, 2010 at 11:57 AM, Amy Hessen <amy_4_5_84@hotmail.com> wrote:
> > Hi Steve,
> >
> > Thank you very much for your reply. Your code is more readable and obvious than mine…
> 
> No Problem.
> 
> > Could you please help me with these questions?
> >
> > 1) “Formula” is an alternative to the “y” parameter in SVM. Is that correct?
> 
> No, that's not correct.
> 
> There are two svm functions, one that takes a "formula" object
> (svm.formula), and one that takes an x matrix, and a y vector
> (svm.default). The svm.formula function is called when the first
> argument in your "svm(..)" call is a formula object. This function
> simply parses the formula and manipulates your data object into an x
> matrix and y vector, then calls the svm.default function with those
> params ... I usually prefer to just skip the formula and provide the x
> and y objects directly.
> 
> Load the e1071 library and look at the source code:
> 
> R> library(e1071)
> R> e1071:::svm.formula
> 
> You'll see what I mean.
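
As a rough illustration of the two entry points described above, this sketch fits the same data once through the formula interface and once through the x/y interface. It uses the built-in `iris` data as a stand-in, which is an assumption and not the dataset discussed in this thread:

```r
library(e1071)

# Formula interface: dispatches to svm.formula, which builds x and y for you
m_formula <- svm(Species ~ ., data = iris)

# Matrix/vector interface: dispatches to svm.default
x <- as.matrix(iris[, 1:4])
y <- iris$Species
m_default <- svm(x, y)

# The formula-built model carries an extra class that predict() checks for
class(m_formula)
class(m_default)

# Share of training rows on which the two models give the same prediction
p_formula <- predict(m_formula, iris)
p_default <- predict(m_default, x)
agreement <- mean(as.character(p_formula) == as.character(p_default))
agreement
```

Both calls are given the same predictors and the same response, so the choice between them is mostly a matter of convenience.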
> 
> > 2) I forgot to remove the “class label” from the dataset, and I also gave
> > the program the class label in the formula parameter, but the program works!
> > Could you please clarify this point for me?
> 
> The author of the e1071 package did you a favor. The predict.svm
> function checks to see if your svm object was built using the formula
> interface .. if so, it looks for your label column in the data you are
> trying to predict on and ignores it.
> 
> Look at the function's source code (eg, type e1071:::predict.svm at
> the R prompt), and look for the call to the delete.response function
> ... you can also look at the help in ?delete.response.
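
To see what `delete.response` does on its own, here is a small sketch using only base R (the `stats` package) and the built-in `iris` data as a stand-in. It shows the general mechanism on a terms object and is not a copy of what `predict.svm` does internally:

```r
# Build a terms object from a formula, as model-fitting functions do
tt <- terms(Species ~ ., data = iris)
attr(tt, "response")    # position of the response variable in the terms

# delete.response() returns the same terms with the response removed
tt_x <- delete.response(tt)
attr(tt_x, "response")

# A model frame built from the stripped terms holds only the predictors,
# even though the data passed in still contains the Species column
mf <- model.frame(tt_x, data = iris)
names(mf)
```

This is why new data that still contains the label column can be passed to `predict` for a formula-built model without the label leaking into the predictors.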
> 
> -steve
> 
> 
> >> Date: Wed, 6 Jan 2010 18:44:13 -0500
> >> Subject: Re: [R] svm
> >> From: mailinglist.honeypot@gmail.com
> >> To: amy_4_5_84@hotmail.com
> >> CC: r-help@r-project.org
> >>
> >> Hi Amy,
> >>
> >> On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <amy_4_5_84@hotmail.com> wrote:
> >> > Hi Steve,
> >> >
> >> > Thank you very much for your reply.
> >> >
> >> > I’m trying to do something systematic/general in the program so that I
> >> > can try different datasets without changing much in the program (without
> >> > knowing the name of the class label, which differs from one dataset to
> >> > another…)
> >> >
> >> > Could you please tell me your opinion about this code:-
> >> >
> >> > library(e1071)
> >> >
> >> > mydata<-read.delim("the_whole_dataset.txt")
> >> >
> >> > class_label <- names(mydata)[1]  # I’ll always put the class label in the first column.
> >> >
> >> > myformula <- formula(paste(class_label,"~ ."))
> >> >
> >> > x <- subset(mydata, select = - mydata[, 1])
> >> >
> >> > mymodel<-(svm(myformula, x, cross=3))
> >> >
> >> > summary(mymodel)
> >> >
> >> > ################
> >>
> >> Since you're not doing anything funky with the formula, a preference
> >> of mine is to just skip this way of calling SVM and go "straight" to
> >> the svm(x,y,...) method:
> >>
> >> R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))
> >> R> train.x <- mydata[,-1]
> >> R> train.y <- mydata[,1]
> >>
> >> R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
> >> ## or
> >> R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")
> >>
> >> As an aside, I also like to be explicit about the type="" parameter to
> >> tell the SVM what I want it to do (regression or classification). If it's
> >> not specified, the SVM picks one based on whether your y vector is a
> >> factor (does classification) or not (does regression).
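
The default behaviour described above can be checked with a short sketch. It uses the built-in `iris` data as a stand-in (an assumption, not the dataset from this thread) and leaves `type` unset in both calls:

```r
library(e1071)

x <- as.matrix(iris[, 1:4])

# Factor response: svm() falls back to classification
m_class <- svm(x, iris$Species)

# Numeric response: svm() falls back to regression
# (Sepal.Length is the target here, so it is dropped from the predictors)
m_reg <- svm(x[, -1], iris$Sepal.Length)

# Classification predictions come back as a factor, regression as numbers
p_class <- predict(m_class, x)
p_reg   <- predict(m_reg, x[, -1])
is.factor(p_class)
is.numeric(p_reg)

# print() reports the SVM type that was chosen
print(m_class)
print(m_reg)
```

Note that class labels read through `as.matrix` arrive as numbers or strings rather than a factor, which is one more reason to pass `type` explicitly.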
> >>
> >> > Do I have to do the same steps with the testing set? i.e. the testing
> >> > set must not contain the label either, but must have the same structure
> >> > as the training set? Is that correct?
> >>
> >> I guess you'll want to report your accuracy/MSE/something on your
> >> model for your testing set? Just load the data in the same way then
> >> use `predict` to calculate the metric you're after. You'll have to have
> >> the labels for your data to do that, though, eg:
> >>
> >> testdata <- as.matrix(read.delim('testdata.txt'))
> >> test.x <- testdata[,-1]
> >> test.y <- testdata[,1]
> >> preds <- predict(mymodel, test.x)
> >>
> >> Let's assume you're doing classification, so let's report the accuracy:
> >>
> >> acc <- sum(preds == test.y) / length(test.y)
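
For a regression target, the analogous step is to compute an error such as RMSE on held-out rows. The sketch below follows the same x/y pattern but uses the built-in `mtcars` data and an arbitrary train/test split, both of which are assumptions made for illustration:

```r
library(e1071)

set.seed(42)
idx     <- sample(nrow(mtcars), 22)       # arbitrary split: 22 training rows
train.x <- as.matrix(mtcars[idx, -1])     # mpg is column 1, so drop it from x
train.y <- mtcars$mpg[idx]
test.x  <- as.matrix(mtcars[-idx, -1])
test.y  <- mtcars$mpg[-idx]

mymodel <- svm(train.x, train.y, type = "eps-regression")
preds   <- predict(mymodel, test.x)

# Root mean squared error on the held-out rows, in the units of mpg
rmse <- sqrt(mean((preds - test.y)^2))
rmse
```

The resulting number is in the units of the target, so it is only comparable with published figures computed on the same scale and the same split.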
> >>
> >> Does that help?
> >> -steve
> >>
> >> --
> >> Steve Lianoglou
> >> Graduate Student: Computational Systems Biology
> >> | Memorial Sloan-Kettering Cancer Center
> >> | Weill Medical College of Cornell University
> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
> >
> 
> 
> 
 		 	   		  