[R] problem in testing data with e1071 package (SVM Multiclass)
stefania.pecore at univ-ubs.fr
Sat Sep 2 17:26:19 CEST 2017
Hello all,
this is my first time using R, the e1071 package and multiclass SVM
(and I'm not a statistician), so I'm rather confused. The goal is: a
sentence containing "sunny" should be classified as a "yes" sentence; a
sentence containing "cloud" should be classified as "maybe"; a sentence
containing "rainy" should be classified as "no".
The real goal is to do some text classification that I can then apply
in my research.
I have two files:
* train.csv: a file with two columns/variables: one is the data, the
  other is the label.
  Example:

       V1                 V2
    1  sunny              yes
    2  sunny sunny        yes
    3  sunny rainy sunny  yes
    4  sunny cloud sunny  yes
    5  rainy              no
    6  rainy rainy        no
    7  rainy sunny rainy  no
    8  rainy cloud rainy  no
    9  cloud              maybe
    10 cloud cloud        maybe
    11 cloud rainy cloud  maybe
    12 cloud sunny cloud  maybe

* test.csv: this file holds the new data to be classified, in a single
  column/variable.
  Example:

       V1
    1  sunny
    2  rainy
    3  hello
    4  cloud
    5  a
    6  b
    7  cloud
    8  d
    9  e
    10 f
    11 g
    12 hello
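
For readers who want to reproduce the setup without the CSV files, the two tables above can be sketched as inline data frames. This is only an assumed reconstruction: the names `train` and `test` match the session in this post, and `stringsAsFactors = TRUE` is an assumption that mirrors the default behaviour of read.csv in R versions before 4.0.

```r
# Assumed inline reconstruction of train.csv (sentence in V1, label in V2)
train <- data.frame(
  V1 = c("sunny", "sunny sunny", "sunny rainy sunny", "sunny cloud sunny",
         "rainy", "rainy rainy", "rainy sunny rainy", "rainy cloud rainy",
         "cloud", "cloud cloud", "cloud rainy cloud", "cloud sunny cloud"),
  V2 = rep(c("yes", "no", "maybe"), each = 4),
  stringsAsFactors = TRUE
)

# Assumed inline reconstruction of test.csv (one column of new sentences)
test <- data.frame(
  V1 = c("sunny", "rainy", "hello", "cloud", "a", "b",
         "cloud", "d", "e", "f", "g", "hello"),
  stringsAsFactors = TRUE
)

# Inspect the column types: V1 is a factor whose levels are whole sentences
str(train)
str(test)
```

Note that under this assumption V1 is stored as a factor with one level per distinct sentence, not as individual words.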
Following the examples based on the iris dataset
(https://cran.r-project.org/web/packages/e1071/e1071.pdf and
http://rischanlab.github.io/SVM.html)
I created my model and then tested it on the training data in this way:

> library(e1071)
> train <- read.csv(file = "./train.csv", sep = ";", header = FALSE)
> test <- read.csv(file = "./test.csv", sep = ";", header = FALSE)
> attach(train)
> x <- subset(train, select = -V2)
> y <- V2
> model <- svm(V2 ~ ., data = train, probability = TRUE)
> summary(model)

Call:
svm(formula = V2 ~ ., data = train, probability = TRUE)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.08333333

Number of Support Vectors:  12
 ( 4 4 4 )

Number of Classes:  3

Levels:
 maybe no yes

> pred <- predict(model, x)
> system.time(pred <- predict(model, x))
   user  system elapsed
      0       0       0
> table(pred, y)
       y
pred    maybe no yes
  maybe     4  0   0
  no        0  4   0
  yes       0  0   4
> pred
    1     2     3     4     5     6     7     8     9    10    11    12
  yes   yes   yes   yes    no    no    no    no maybe maybe maybe maybe
Levels: maybe no yes

I think everything is fine so far. Now the question is: what about the
test data? I couldn't find any example that uses separate test data, so
I thought I should run the model on the test data, and I did this:

> test
      V1
1  sunny
2  rainy
3  hello
4  cloud
5      a
6      b
7  cloud
8      d
9      e
10     f
11     g
12 hello
> z <- subset(test, select = V1)
> pred <- predict(model, z)
Error in predict.svm(model, z) : test data does not match model !

What is wrong here? Can you please explain to me how I can test new
data using the previously trained model? For two days I have asked
everywhere and looked at many websites without finding a solution. It
is frustrating because I think the logic behind the code is fine, but
something is missing in the way I express it in R.
Thank you for your help
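
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: with `V2 ~ .` the whole sentence in V1 is treated as a single factor, so the model only knows the 12 exact training sentences, and test sentences such as "hello" produce a different set of columns. A minimal sketch of one way around this is to turn every sentence into word counts over a vocabulary fixed from the training data, so that training and test data share the same columns. The helper `word_counts` and the variable names below are hypothetical, and the choices `kernel = "linear"` and `scale = FALSE` are illustrative only.

```r
library(e1071)

# Hypothetical helper: one row per sentence, one column per vocabulary word,
# each cell counting how often that word occurs in the sentence.
word_counts <- function(sentences, vocab) {
  tokens <- strsplit(as.character(sentences), "[[:space:]]+")
  counts <- do.call(rbind, lapply(tokens, function(tok) {
    as.numeric(table(factor(tok, levels = vocab)))
  }))
  colnames(counts) <- vocab
  counts
}

train_text <- c("sunny", "sunny sunny", "sunny rainy sunny", "sunny cloud sunny",
                "rainy", "rainy rainy", "rainy sunny rainy", "rainy cloud rainy",
                "cloud", "cloud cloud", "cloud rainy cloud", "cloud sunny cloud")
train_label <- factor(rep(c("yes", "no", "maybe"), each = 4))
test_text <- c("sunny", "rainy", "hello", "cloud", "a", "b",
               "cloud", "d", "e", "f", "g", "hello")

# The vocabulary comes from the training sentences only
vocab <- sort(unique(unlist(strsplit(train_text, "[[:space:]]+"))))

x_train <- word_counts(train_text, vocab)
x_test  <- word_counts(test_text, vocab)  # same columns as x_train

model <- svm(x_train, train_label, kernel = "linear", scale = FALSE)
pred  <- predict(model, x_test)
print(data.frame(sentence = test_text, predicted = pred))
```

Words that never occur in the training data (for example "hello") become a row of zeros. The model still assigns such rows one of the three labels, so a real application would need a separate rule for sentences without any known word.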