Hi Steve,

Could you please help me with this point? I use the SVM from R and I’m
trying some datasets from UCI, but when I compare the results of my
program (which does nothing more than call SVM) with the RMSE of SVM
reported in other papers, I find a big gap between them.

For example, this is the RMSE of the SVM in my program for the bodyfat
dataset: 2.64561
And this is the RMSE from a paper: 0.0204

Could you please tell me how I can reduce this gap in the performance
of SVM?

Cheers,
Amy

> Date: Sat, 9 Jan 2010 15:48:49 -0500
> Subject: Re: [R] svm
> From: mailinglist.honeypot@gmail.com
> To: amy_4_5_84@hotmail.com
> CC: r-help@r-project.org
>
> Hi,
>
> On Fri, Jan 8, 2010 at 11:57 AM, Amy Hessen wrote:
> > Hi Steve,
> >
> > Thank you very much for your reply. Your code is more readable and
> > obvious than mine…
>
> No Problem.
>
> > Could you please help me in these questions?:
> >
> > 1) “Formula” is an alternative to “y” parameter in SVM. is it correct?
>
> No, that's not correct.
>
> There are two svm functions, one that takes a "formula" object
> (svm.formula), and one that takes an x matrix, and a y vector
> (svm.default). The svm.formula function is called when the first
> argument in your "svm(..)" call is a formula object. This function
> simply parses the formula and manipulates your data object into an x
> matrix and y vector, then calls the svm.default function with those
> params ... I usually prefer to just skip the formula and provide the x
> and y objects directly.
>
> Load the e1071 library and look at the source code:
>
> R> library(e1071)
> R> e1071:::svm.formula
>
> You'll see what I mean.
>
> > 2) I forgot to remove the “class label” from the dataset besides I gave the
> > program the class label in formula parameter but the program works! Could
> > you please clarify this point to me?
>
> The author of the e1071 package did you a favor. The predict.svm
> function checks to see if your svm object was built using the formula
> interface ..
> if so, it looks for your label column in the data you are
> trying to predict on and ignores it.
>
> Look at the function's source code (eg, type e1071:::predict.svm at
> the R prompt), and look for the call to the delete.response function
> ... you can also look at the help in ?delete.response.
>
> -steve
>
> >
> >> Date: Wed, 6 Jan 2010 18:44:13 -0500
> >> Subject: Re: [R] svm
> >> From: mailinglist.honeypot@gmail.com
> >> To: amy_4_5_84@hotmail.com
> >> CC: r-help@r-project.org
> >>
> >> Hi Amy,
> >>
> >> On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen wrote:
> >> > Hi Steve,
> >> >
> >> > Thank you very much for your reply.
> >> >
> >> > I’m trying to do something systematic/general in the program so that I
> >> > can try different datasets without changing much in the program (without
> >> > knowing the name of the class label, which differs from one dataset to
> >> > another…)
> >> >
> >> > Could you please tell me your opinion about this code:-
> >> >
> >> > library(e1071)
> >> >
> >> > mydata<-read.delim("the_whole_dataset.txt")
> >> >
> >> > class_label <- names(mydata)[1] # I’ll always put the class label in the first column.
> >> >
> >> > myformula <- formula(paste(class_label,"~ ."))
> >> >
> >> > x <- subset(mydata, select = - mydata[, 1])
> >> >
> >> > mymodel<-(svm(myformula, x, cross=3))
> >> >
> >> > summary(mymodel)
> >> >
> >> > ################
> >>
> >> Since you're not doing anything funky with the formula, a preference
> >> of mine is to just skip this way of calling SVM and go "straight" to
> >> the svm(x,y,...)
> >> method:
> >>
> >> R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))
> >> R> train.x <- mydata[,-1]
> >> R> train.y <- mydata[,1]
> >>
> >> R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
> >> ## or
> >> R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")
> >>
> >> As an aside, I also like to be explicit about the type="" parameter to
> >> tell what I want my SVM to do (regression or classification). If it's
> >> not specified, the SVM picks which one to do based on whether or not
> >> your y vector is a vector of factors (does classification), or not
> >> (does regression)
> >>
> >> > Do I have to do the same steps with the testing set? i.e. the testing
> >> > set must not contain the label too? But contains the same structure as
> >> > the training set? Is it correct?
> >>
> >> I guess you'll want to report your accuracy/MSE/something on your
> >> model for your testing set? Just load the data in the same way then
> >> use `predict` to calculate the metric you're after. You'll have to have
> >> the labels for your data to do that, though, eg:
> >>
> >> testdata <- as.matrix(read.delim('testdata.txt'))
> >> test.x <- testdata[,-1]
> >> test.y <- testdata[,1]
> >> preds <- predict(mymodel, test.x)
> >>
> >> Let's assume you're doing classification, so let's report the accuracy:
> >>
> >> acc <- sum(preds == test.y) / length(test.y)
> >>
> >> Does that help?
> >> -steve
> >>
> >> --
> >> Steve Lianoglou
> >> Graduate Student: Computational Systems Biology
> >> | Memorial Sloan-Kettering Cancer Center
> >> | Weill Medical College of Cornell University
> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
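
For reference, here is a minimal base-R sketch of the formula handling
discussed in the quoted replies: splitting a data frame into a y vector
and an x matrix, then dropping the response at prediction time with
delete.response. It is not the e1071 source, only the same general
mechanism using functions from the stats package. The toy data frame and
its column names (label, a, b) are made up for illustration.

```r
# Toy data (hypothetical): label in the first column, as in the thread.
toy <- data.frame(label = c(1.5, 2.0, 3.5, 4.0),
                  a     = c(1, 2, 3, 4),
                  b     = c(0.5, 0.1, 0.9, 0.3))

# Build the formula from the name of the first column.
myformula <- as.formula(paste(names(toy)[1], "~ ."))

# Training time: a formula interface splits the data into y and x.
mf <- model.frame(myformula, data = toy)
y  <- model.response(mf)                     # the label vector
x  <- model.matrix(attr(mf, "terms"), mf)    # predictors plus an intercept column

# Prediction time: delete.response() removes the label from the terms,
# so only the predictor columns are pulled from the new data.
tt_nolabel <- delete.response(attr(mf, "terms"))
newx <- model.matrix(tt_nolabel, model.frame(tt_nolabel, toy))

print(colnames(newx))
```

Because the response is removed from the terms before the new data is
processed, it makes no difference whether the label column is still
present in the data passed at prediction time.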