[R] [e1071] Inconsistent results when using matrix.csr for svm() - possibly scaling problem

Eva May evamay at mail.com
Thu Jun 25 16:18:58 CEST 2009


Dear all,

I'm training an SVM with default settings on a matrix.csr (SparseM 
package). I noticed that if I train
the SVM on the (hopefully) equivalent dense matrix (obtained with 
as.matrix()), the returned models and predictions
sometimes differ. I expected both representations of the same data 
to lead to the same results, though.
It could be a scaling problem, because the results are equal when scaling is disabled (see below).

I'm using SparseM_0.80 and e1071_1.5-19.

This is what I do (details can be found below):

> #run model on csr matrices
> model <- svm(matrixcsrTraining,classfactorTraining)
> predict(model, matrixcsrTest)
        1
0.944625

> #run model on the dense matrix representation (as.matrix) of the csr matrix from above
> model <- svm(as.matrix(matrixcsrTraining),classfactorTraining)
> predict(model, as.matrix(matrixcsrTest))
         1
0.8325838

Possibly this is a scaling problem with sparse matrices, because the
results are equal if scaling is disabled:

> #run model on csr matrices without scaling
> model <- svm(matrixcsrTraining,classfactorTraining, scale = FALSE)
> predict(model, matrixcsrTest)
        1
0.944625
> #run model on dense matrices without scaling
> model <- svm(as.matrix(matrixcsrTraining),classfactorTraining, scale = FALSE)
> predict(model, as.matrix(matrixcsrTest))
        1
0.944625

Is scaling handled differently for the two formats? Or is no scaling applied to SparseM matrices at all?
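
To make the question concrete, here is a minimal base-R sketch of why scaling could change the result. It uses made-up toy data, not the attached files, and it assumes that scaling a dense matrix amounts to something like scale() (zero mean, unit variance per column); I have not confirmed that this is exactly what svm() does internally. After centering, the zero entries are no longer zero, so a model trained on scaled input sees different values than one trained on the raw 0/1 data.

```r
# Toy 0/1 matrix standing in for the real data (hypothetical values)
x <- matrix(c(1, 0, 0,
              0, 1, 0,
              1, 1, 0,
              0, 0, 1), nrow = 4, byrow = TRUE)

# Standardize each column to zero mean and unit variance
xs <- scale(x)
centers <- attr(xs, "scaled:center")
sds     <- attr(xs, "scaled:scale")

# Count exact zeros before and after scaling
n_zero_raw    <- sum(x == 0)
n_zero_scaled <- sum(xs == 0)

# A test row has to be transformed with the training centers and
# standard deviations before it is comparable to the scaled training data
test_row    <- matrix(c(0, 0, 1), nrow = 1)
test_scaled <- sweep(sweep(test_row, 2, centers, "-"), 2, sds, "/")

print(n_zero_raw)
print(n_zero_scaled)
print(test_scaled)
```

If one representation is scaled and the other is not, the two models are trained on different numbers even though the underlying data are the same, which would be consistent with the differing predictions above.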

Thank you very much for your help,

Eva
CS bachelor student

---------------------------------------------------------------------------
---------------------------------------------------------------------------
Details:

The code is below; the data files are attached.

#load packages and read in data
library(SparseM)
library(e1071)
coordinates <- read.csv("vector.data", header = TRUE)
ia <- as.integer(coordinates$i)  #row indices of the non-zero entries
ja <- as.integer(coordinates$j)  #column indices of the non-zero entries
classes <- read.csv("classes.data", header = TRUE)
classfactorTraining <- classes[1:max(ia), ]

#build matrixcoo first, then matrixcsr for training
#(named dims so that the base function dim() is not shadowed)
dims <- as.integer(c(max(ia), max(ja)))
matrixcoo <- new("matrix.coo", ra = rep(1, length(ja)), ja = ja, ia = ia,
                 dimension = dims)
matrixcsrTraining <- as.matrix.csr(matrixcoo)

#build a simple matrix for testing: one row with a single 1 in column 13
matrixcoo <- new("matrix.coo", ra = 1, ja = 13L, ia = 1L,
                 dimension = as.integer(c(1, max(ja))))
matrixcsrTest <- as.matrix.csr(matrixcoo)

#run model on csr matrices
model <- svm(matrixcsrTraining, classfactorTraining, scale = FALSE)
predict(model, matrixcsrTest)
#run model on dense matrices
model <- svm(as.matrix(matrixcsrTraining), classfactorTraining, scale = FALSE)
predict(model, as.matrix(matrixcsrTest))
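
As a sanity check on the construction above, the same (i, j) coordinate pairs can be turned into a dense matrix in base R without SparseM, and then compared against as.matrix(matrixcsrTraining). A minimal sketch with made-up coordinates (the vectors ia and ja below are hypothetical, not the contents of vector.data):

```r
# Hypothetical row/column indices of the non-zero entries (1-based)
ia <- as.integer(c(1, 1, 2, 3, 3))
ja <- as.integer(c(2, 4, 1, 3, 4))

# Dense matrix of zeros with the dimensions implied by the largest indices
dense <- matrix(0, nrow = max(ia), ncol = max(ja))

# Indexing with a two-column (row, col) matrix sets all non-zero cells at once
dense[cbind(ia, ja)] <- 1

print(dense)
```

With the real data, all.equal(dense, as.matrix(matrixcsrTraining)) would be expected to return TRUE if both representations hold the same values, which would rule out the conversion itself as the source of the difference.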

------------------------------------------------------------------------------
Masked Methods:

	The following object(s) are masked from package:stats :

	 model.response


	The following object(s) are masked from package:base :

	 backsolve,
	 chol


-----------------------------------------------------------------
Session info:

> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets  base

other attached packages:
[1] SparseM_0.80 e1071_1.5-19 class_7.2-47

