[R] linear classifiers with sparse matrices

Jeff Hansen dscheffy at gmail.com
Thu Oct 6 23:31:02 CEST 2011


I've been trying to get some linear classifiers (LiblineaR, kernlab,
e1071) to work with a sparse matrix of feature data.  In the case of
LiblineaR and kernlab, it seems I have to coerce my data into a dense
matrix in order to train a model.  I've done a number of searches,
read through the manuals and vignettes, but I can't seem to see how to
use either of these packages with sparse matrices.  I've tried using
both csr from SparseM and sparseMatrix from the Matrix library.  You
can see a simple example recreating my results below.

Does anybody know if there's a trick to get this to work without
coercing the data into a dense matrix?

I'm currently playing with the KDDCUP 2010 datasets.  I've written a
simple script to create hash kernel feature vectors for each of the
rows of training data.  Right now I haven't added many features into
the hash vectors.  For simplicity, I'm just creating a string token
for each feature, then hashing it and taking that hash mod 10007 and
10009 (so two buckets for each feature with a low likelihood of two
features colliding on both buckets).  10009 columns may seem like
overkill, but I figured if it was a sparse matrix the number of
columns really wouldn't matter that much.  Right now I'm also only
playing with 99999 rows of input.  Whenever I make the mistake of
doing something which unintentionally coerces the sparse matrix into a
dense one, I end up eating up all my RAM, going to swap, and spending
the next 5 minutes trying to kill my session...  So I'm looking for
something that scales relatively well without taking up too large a
memory footprint to run.
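
To make the hashing step concrete, here is a minimal sketch of the idea
described above.  It is not the actual script: hash_token, hash_rows and
the token strings are hypothetical stand-ins, and the hash is a simple
polynomial one chosen only so the example needs nothing beyond the
Matrix package.  The last lines estimate what the same data would cost
as a dense matrix of doubles.

```r
library(Matrix)

# Hypothetical string hash (an assumption, not the hash used in the post):
# polynomial rolling hash kept below 2^31 so double arithmetic stays exact.
hash_token <- function(token) {
  h <- 0
  for (code in utf8ToInt(token)) h <- (h * 31 + code) %% 2147483647
  h
}

# rows: a list with one character vector of feature tokens per training row.
# Each token lands in two buckets, hash mod 10007 and hash mod 10009.
hash_rows <- function(rows, mods = c(10007, 10009)) {
  per_row <- lapply(rows, function(tokens) {
    h <- vapply(unique(tokens), hash_token, numeric(1))
    unique(c(h %% mods[1], h %% mods[2]) + 1)   # 1-based column indices
  })
  sparseMatrix(i = rep(seq_along(rows), lengths(per_row)),
               j = unlist(per_row), x = 1,
               dims = c(length(rows), max(mods)))
}

# Made-up tokens, for illustration only.
rows <- list(c("student=42", "problem=7", "step=3"),
             c("student=42", "problem=9"))
X <- hash_rows(rows)
dim(X)

# Dense storage for the full problem: 8 bytes per double.
dense_bytes <- 99999 * 10009 * 8
dense_bytes / 2^30   # size in GiB
```

The dense estimate comes to roughly 8 GB, which is why an accidental
as.matrix() on the full data sends the session into swap, while the
sparse form only stores the handful of non-zero buckets per row.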

Thanks!
Jeff

See below for an example that recreates my basic attempts at using
sparse matrices.


> L1=rep(0:1,5)
> M1=sparseMatrix(i=c(1:5*2,1:5*2),j=c(rep(1,5),rep(10,5)),x=1)
> L1=rep(0:1,5)
> SM1=sparseMatrix(i=c(1:5*2,1:5*2),j=c(rep(1,5),rep(10,5)),x=1)
> DM=as.matrix(SM1)
> SM2=as.matrix.csr(DM)
> as.matrix(SM2)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    0    0    0    0     0
[2,]    1    0    0    0    0    0    0    0    0     1
[3,]    0    0    0    0    0    0    0    0    0     0
[4,]    1    0    0    0    0    0    0    0    0     1
[5,]    0    0    0    0    0    0    0    0    0     0
[6,]    1    0    0    0    0    0    0    0    0     1
[7,]    0    0    0    0    0    0    0    0    0     0
[8,]    1    0    0    0    0    0    0    0    0     1
[9,]    0    0    0    0    0    0    0    0    0     0
[10,]    1    0    0    0    0    0    0    0    0     1
> L1
[1] 0 1 0 1 0 1 0 1 0 1
> model = LiblineaR(DM,L1)
> predict(model,DM)
$predictions
 [1] 0 1 0 1 0 1 0 1 0 1

> model = LiblineaR(SM1,L1)
Error in t.default(data) : argument is not a matrix
> model = LiblineaR(SM1,L1)
Error in t.default(data) : argument is not a matrix
 Setting default kernel parameters
> predict(model,DM)
      [,1]
 [1,]  0.1
 [2,]  0.9
 [3,]  0.1
 [4,]  0.9
 [5,]  0.1
 [6,]  0.9
 [7,]  0.1
 [8,]  0.9
 [9,]  0.1
[10,]  0.9
> model = ksvm(SM1,L1,scale=FALSE,kernel="vanilladot")
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "ksvm", for
signature "dgCMatrix"
> model = ksvm(SM2,L1,scale=FALSE,kernel="vanilladot")
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "ksvm", for
signature "matrix.csr"
>
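
For contrast with the failures above, here is a sketch of the one route
that should avoid the dense coercion: the e1071 documentation lists
matrix.csr (from SparseM) as an accepted type for the x argument of
svm().  The helper dgc_to_csr is hypothetical; it builds the matrix.csr
object directly from the slots of the transposed dgCMatrix, relying on
the fact that the column-compressed form of t(X) is the row-compressed
form of X.  This has not been run against the KDDCUP data, only on the
toy example.

```r
library(Matrix)
library(SparseM)
library(e1071)

# Hypothetical helper: dgCMatrix -> matrix.csr without a dense intermediate.
# The CSC slots of t(X) are exactly the CSR arrays of X (0-based in Matrix,
# 1-based in SparseM, hence the + 1L).
dgc_to_csr <- function(X) {
  tX <- t(X)
  new("matrix.csr",
      ra = tX@x,
      ja = tX@i + 1L,
      ia = tX@p + 1L,
      dimension = dim(X))
}

# Same toy data as in the transcript.
L1  <- rep(0:1, 5)
SM1 <- sparseMatrix(i = c(1:5 * 2, 1:5 * 2),
                    j = c(rep(1, 5), rep(10, 5)), x = 1)
SM2 <- dgc_to_csr(SM1)

# Linear-kernel SVM trained and applied on the sparse representation.
model <- svm(SM2, factor(L1), kernel = "linear", scale = FALSE)
pred  <- predict(model, SM2)
table(pred, L1)
```

Note that this only covers e1071; it does not make LiblineaR or ksvm
accept sparse input.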


