[R] sparse vectors

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 8 16:42:09 CEST 2009


>>>>> "Robin" == Robin Hankin <rksh1 at cam.ac.uk>
>>>>>     on Tue, 08 Sep 2009 14:58:49 +0100 writes:

    Robin> Hi guys
    Robin> thanks for this, it works fine, but I'm not sure the Matrix package does 
    Robin> what I want:

    >> a = sparseMatrix(i=c(20, 30, 1000000000), j=rep(1, 3), x=c(2.2, 3.3, 4.4))
    Robin> Error in asMethod(object) :
    Robin> Cholmod error 'out of memory' at file:../Core/cholmod_memory.c, line 148

    Robin> Surely an efficient storage mechanism would need only six pieces of 
    Robin> information?

sure.
sparseMatrix() is designed to produce "column-compressed"
sparse matrices ("CsparseMatrix"), as these are optimal in
some sense for further matrix operations, notably in the CHOLMOD
C library to which the Matrix package is interfaced.
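
As a small illustration of what "column-compressed" means, here is a
sketch with toy dimensions (not the failing call quoted above). It
assumes only that the Matrix package is installed. The slots hold the
0-based row indices (@i), the column pointers (@p) and the nonzero
values (@x):

```r
library(Matrix)

# Toy example: three nonzeros in the single column of a 100 x 1 matrix.
m <- sparseMatrix(i = c(20, 30, 100), j = rep(1, 3), x = c(2.2, 3.3, 4.4))

class(m)  # a subclass of "CsparseMatrix"
m@i       # 0-based row indices of the nonzero entries
m@p       # column pointers: one entry per column, plus one
m@x       # the nonzero values themselves
```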

Indeed, you have triggered a problem in that CHOLMOD code,
which needs an inordinate amount of memory when it really should
not.

Instead, for your case, I'd recommend using the
(older, slightly less flexible) constructor

spMatrix()  [which produces a triplet ("Tsparse..") sparse
	     matrix representation]
which works without that funny memory glitch.
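
A minimal sketch of that call, assuming a version of Matrix that still
provides spMatrix(); the variable names 'n' and 'tm' are only for
illustration. The triplet form keeps just one (i, j, x) entry per
nonzero, plus the dimensions:

```r
library(Matrix)

n <- 1000000000  # the large row index from the failing example

# Triplet ("Tsparse") representation: no column compression involved.
tm <- spMatrix(nrow = n, ncol = 1,
               i = c(20, 30, n), j = rep(1, 3), x = c(2.2, 3.3, 4.4))

class(tm)  # a subclass of "TsparseMatrix"
tm@i       # row indices, stored 0-based
tm@x       # the three nonzero values
dim(tm)    # printing 'tm' itself is best avoided at this size
```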

Note BTW that 'i=1000000000' is already within a factor of about
two of .Machine$integer.max, and we currently require the indices
to be "integer" (in the sense of R, i.e., 32-bit).
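
The headroom can be checked in base R alone; this is only a sketch, and
the names 'imax' and 'too_big' are made up for the example:

```r
imax <- .Machine$integer.max
imax                # 2^31 - 1 on current platforms
1000000000 <= imax  # the index from the example still fits
imax / 1000000000   # remaining headroom factor, a bit over 2

# A value beyond the integer range cannot be coerced: as.integer()
# returns NA (with a warning, suppressed here).
too_big <- suppressWarnings(as.integer(3e9))
is.na(too_big)
```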

Alternatively, I introduced the "sparseVector" class into
the Matrix package a while ago, which *does* allow "numeric"
indices ...
.. but at the moment it does not have many methods defined,
notably no arithmetic.
{ The reason I introduced the class was actually to allow 
  "reshaping" sparse matrices, i.e., to use
     dim(<sparseMatrix>) <- c(n1, n2)
}.
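
A sketch of constructing such a vector with sparseVector(), assuming the
Matrix package is installed; the index 1e10 and length 1e11 are chosen
here only to show values beyond the 32-bit integer range:

```r
library(Matrix)

# Indices and length are stored as "numeric", so they may exceed
# .Machine$integer.max.
sv <- sparseVector(x = c(2.2, 3.3, 4.4), i = c(20, 30, 1e10), length = 1e11)

class(sv)   # a subclass of "sparseVector"
sv@i        # 1-based positions of the nonzero entries
sv@x        # the nonzero values
sv@length   # the nominal length of the vector
```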


    Robin> I've been pondering the solution that Henrique suggested, that uses
    Robin> merge().  This seems to be fine, although it might be possible
    Robin> to squeeze some efficiency gains by using the fact that
    Robin> the index vector is always sorted, which might save some
    Robin> searching time.

the sparseMatrix and sparseVector classes in Matrix do always
keep the indices sorted,
and actually your use case  would motivate me quite a bit to add
more (arithmetic) capabilities to the "sparseVector" classes.
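
In the meantime, the list representation from the original question can
be added in base R. The following is only a sketch: add_sparse() is a
hypothetical helper, not part of Matrix, and it assumes that each index
vector contains no duplicates. It does not exploit the sortedness of the
inputs beyond returning a sorted result.

```r
# Hypothetical helper: add two sparse vectors stored as
# list(index = <positions>, val = <values>), with no duplicated
# positions within either input.
add_sparse <- function(a, b) {
  index <- sort(unique(c(a$index, b$index)))  # union of positions, sorted
  val <- numeric(length(index))               # start from all zeros
  ia <- match(a$index, index)                 # where a's entries land
  val[ia] <- val[ia] + a$val
  ib <- match(b$index, index)                 # where b's entries land
  val[ib] <- val[ib] + b$val
  list(index = index, val = val)
}

a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))

AplusB <- add_sparse(a, b)
AplusB$index  # positions 3, 20, 30 and 100000000
AplusB$val    # the entry for index 30 is 3.3 + 0.1
```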

Martin


    Robin> Any thoughts anyone?



    Robin> best wishes
    Robin> Robin




    Robin> Benilton Carvalho wrote:
    >> library(Matrix)
    >> a = sparseMatrix(i=c(20, 30, 100000000), j=rep(1, 3), x=c(2.2, 3.3, 4.4))
    >> b = sparseMatrix(i=c(3, 30), j=rep(1, 2), x=c(0.1, 0.1), dims=dim(a))
    >> theSum = a+b
    >> summary(theSum)
    >> 
    >> 
    >> hth,
    >> b
    >> 
    >> On Sep 8, 2009, at 10:19 AM, Henrique Dallazuanna wrote:
    >> 
    >>> Try this:
    >>> 
    >>> abMerge <- merge(a, b, by = 'index', all = TRUE)
    >>> list(index = abMerge$index, val = rowSums(abMerge[,2:3], na.rm = TRUE))
    >>> 
    >>> On Tue, Sep 8, 2009 at 10:06 AM, Robin Hankin <rksh1 at cam.ac.uk> wrote:
    >>> 
    >>>> Hi
    >>>> 
    >>>> I deal with long vectors almost all of whose elements are zero.
    >>>> Typically, the length will be ~5e7 with ~100 nonzero elements.
    >>>> 
    >>>> I want to deal with these objects using a sort of sparse
    >>>> vector.
    >>>> 
    >>>> The problem is that I want to be able to 'add' two such
    >>>> vectors.
    >>>> Toy problem follows.  Suppose I have two such objects, 'a' and 'b':
    >>>> 
    >>>> 
    >>>> 
    >>>>> a
    >>>> $index
    >>>> [1]    20   30 100000000
    >>>> 
    >>>> $val
    >>>> [1] 2.2 3.3 4.4
    >>>> 
    >>>> 
    >>>> 
    >>>>> b
    >>>> $index
    >>>> [1]   3  30
    >>>> 
    >>>> $val
    >>>> [1] 0.1 0.1
    >>>> 
    >>>>> 
    >>>> 
    >>>> 
    >>>> What I want is the "sum" of these:
    >>>> 
    >>>>> AplusB
    >>>> $index
    >>>> [1]    3   20   30 100000000
    >>>> 
    >>>> $val
    >>>> [1]  0.1 2.2 3.4 4.4
    >>>> 
    >>>>> 
    >>>> 
    >>>> 
    >>>> See how the value for index=30 (being common to both) is 3.4
    >>>> (=3.3+0.1).   What's the best R idiom to achieve this?
    >>>> 
    >>>> 
    >>>> 
    >>>> -- 
    >>>> Robin K. S. Hankin
    >>>> Uncertainty Analyst
    >>>> University of Cambridge
    >>>> 19 Silver Street
    >>>> Cambridge CB3 9EP
    >>>> 01223-764877
    >>>> 
    >>>> ______________________________________________
    >>>> R-help at r-project.org mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>>> PLEASE do read the posting guide
    >>>> http://www.R-project.org/posting-guide.html
    >>>> and provide commented, minimal, self-contained, reproducible code.
    >>>> 
    >>> 
    >>> 
    >>> 
    >>> -- 
    >>> Henrique Dallazuanna
    >>> Curitiba-Paraná-Brasil
    >>> 25° 25' 40" S 49° 16' 22" O
    >>> 
    >> 






