[R] sparse vectors
Martin Maechler
maechler at stat.math.ethz.ch
Tue Sep 8 16:42:09 CEST 2009
>>>>> "Robin" == Robin Hankin <rksh1 at cam.ac.uk>
>>>>> on Tue, 08 Sep 2009 14:58:49 +0100 writes:
Robin> Hi guys
Robin> thanks for this, it works fine, but I'm not sure the Matrix package does
Robin> what I want:
Robin> > a = sparseMatrix(i=c(20, 30, 1000000000), j=rep(1, 3), x=c(2.2, 3.3, 4.4))
Robin> Error in asMethod(object) :
Robin> Cholmod error 'out of memory' at file:../Core/cholmod_memory.c, line 148
Robin> Surely an efficient storage mechanism would need only six pieces of
Robin> information?
sure.
sparseMatrix() is designed to produce "column-compressed"
sparse matrices ("CsparseMatrix"), as these are optimal in
some sense for further matrix operations notably in the CHOLMOD
C library to which the Matrix package is interfaced.
Indeed, you have triggered a problem in that CHOLMOD code,
which needs an inordinate amount of memory when it really should
not.
For your case, I'd recommend using the
(older, slightly less flexible) constructor
spMatrix() [which produces a triplet ("Tsparse..") sparse
matrix representation]
which works without that funny memory glitch.
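
A minimal sketch of the spMatrix() route, assuming the Matrix package is
installed. The object name 'a' follows Robin's example; nrow = 1e9 is an
assumed dimension, chosen only so that it covers the largest index:

```r
library(Matrix)

# Triplet ("Tsparse") representation: one (i, j, x) entry per non-zero value.
# nrow = 1e9 is an assumption, picked to cover the largest row index.
a <- spMatrix(nrow = 1e9, ncol = 1,
              i = c(20L, 30L, 1000000000L),
              j = rep(1L, 3),
              x = c(2.2, 3.3, 4.4))

class(a)     # a subclass of "TsparseMatrix"
length(a@x)  # number of stored values
```

Only the three (i, j, x) triplets are stored here, apart from the
dimensions.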
Note, BTW, that 'i=1000000000' is pretty close to .Machine$integer.max (2147483647)
and we currently require the indices to be "integer" (in the
sense of R, i.e., 32-bit).
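
To make the 32-bit limit concrete, here is a small base-R check. The
helper name fits_integer_index() is hypothetical, introduced only for
illustration:

```r
# R integers are 32-bit; this is the largest value an integer index can take.
.Machine$integer.max

# Hypothetical helper: TRUE where a numeric index is a whole number
# that can be stored as an R integer.
fits_integer_index <- function(i) {
  i >= 1 & i <= .Machine$integer.max & i == floor(i)
}

fits_integer_index(c(20, 30, 1e9, 3e9))  # TRUE TRUE TRUE FALSE
```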
Alternatively, I had introduced the "sparseVector" class into
the Matrix package a while ago, which *does* allow "numeric"
indices ...
.. but at the moment does not have too many methods defined,
notably not arithmetic.
{ The reason I introduced the class was actually to allow
"reshaping" sparse matrices, i.e., to use
dim(<sparseMatrix>) <- c(n1, n2)
}.
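
A short sketch of both points, assuming the Matrix package is installed.
The index 2.5e9 and the length 3e9 are illustrative values chosen to
exceed .Machine$integer.max, and the small matrix M is made up for the
reshaping step:

```r
library(Matrix)

# Indices and length are stored as "numeric", so they may exceed
# .Machine$integer.max; 2.5e9 and 3e9 are illustrative values.
v <- sparseVector(x = c(2.2, 3.3, 4.4), i = c(20, 30, 2.5e9), length = 3e9)
v@i       # the indices of the non-zero entries
v@length  # the (numeric) length

# Reshaping a small sparse matrix from 4 x 3 to 6 x 2.
M <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(1, 2), dims = c(4, 3))
dim(M) <- c(6, 2)
dim(M)
```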
Robin> I've been pondering the solution that Henrique suggested, that uses
Robin> merge(). This seems to be fine, although it might be possible
Robin> to squeeze some efficiency gains by using the fact that
Robin> the index vector is always sorted, which might save some
Robin> searching time.
The sparseMatrix and sparseVector classes in Matrix do always
keep the indices sorted,
and actually your use case would motivate me quite a bit to add
more (arithmetic) capabilities to the "sparseVector" classes.
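
In the meantime, the addition can be sketched in base R on the
list(index, val) representation from the original question. This is not
a Matrix method: sparse_add() is a hypothetical helper, and it assumes
each index vector is sorted and has no duplicates. It uses the sorted
order through findInterval(), which does a binary search, rather than
through a full linear merge:

```r
# Hypothetical helper: add two sparse vectors stored as list(index =, val =).
# Assumes each index vector is sorted and free of duplicates.
sparse_add <- function(a, b) {
  idx <- sort(unique(c(a$index, b$index)))  # sorted union of the indices
  val <- numeric(length(idx))
  # findInterval() searches the sorted 'idx' by bisection; every element of
  # a$index and b$index occurs in 'idx', so it returns exact positions.
  pa <- findInterval(a$index, idx)
  pb <- findInterval(b$index, idx)
  val[pa] <- a$val
  val[pb] <- val[pb] + b$val
  list(index = idx, val = val)
}

a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))
AplusB <- sparse_add(a, b)
AplusB$index  # 3, 20, 30, 100000000
AplusB$val    # 0.1, 2.2, 3.4, 4.4
```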
Martin
Robin> Any thoughts anyone?
Robin> best wishes
Robin> Robin
Robin> Benilton Carvalho wrote:
>> library(Matrix)
>> a = sparseMatrix(i=c(20, 30, 100000000), j=rep(1, 3), x=c(2.2, 3.3, 4.4))
>> b = sparseMatrix(i=c(3, 30), j=rep(1, 2), x=c(0.1, 0.1), dims=dim(a))
>> theSum = a+b
>> summary(theSum)
>>
>>
>> hth,
>> b
>>
>> On Sep 8, 2009, at 10:19 AM, Henrique Dallazuanna wrote:
>>
>>> Try this:
>>>
>>> abMerge <- merge(a, b, by = 'index', all = TRUE)
>>> list(index = abMerge$index, val = rowSums(abMerge[,2:3], na.rm = TRUE))
>>>
>>> On Tue, Sep 8, 2009 at 10:06 AM, Robin Hankin <rksh1 at cam.ac.uk> wrote:
>>>
>>>> Hi
>>>>
>>>> I deal with long vectors almost all of whose elements are zero.
>>>> Typically, the length will be ~5e7 with ~100 nonzero elements.
>>>>
>>>> I want to deal with these objects using a sort of sparse
>>>> vector.
>>>>
>>>> The problem is that I want to be able to 'add' two such
>>>> vectors.
>>>> Toy problem follows. Suppose I have two such objects, 'a' and 'b':
>>>>
>>>>
>>>>
>>>>> a
>>>> $index
>>>> [1] 20 30 100000000
>>>>
>>>> $val
>>>> [1] 2.2 3.3 4.4
>>>>
>>>>
>>>>
>>>>> b
>>>> $index
>>>> [1] 3 30
>>>>
>>>> $val
>>>> [1] 0.1 0.1
>>>>
>>>>>
>>>>
>>>>
>>>> What I want is the "sum" of these:
>>>>
>>>>> AplusB
>>>> $index
>>>> [1] 3 20 30 100000000
>>>>
>>>> $val
>>>> [1] 0.1 2.2 3.4 4.4
>>>>
>>>>>
>>>>
>>>>
>>>> See how the value for index=30 (being common to both) is 3.4
>>>> (=3.3+0.1). What's the best R idiom to achieve this?
>>>>
>>>>
>>>>
>>>> --
>>>> Robin K. S. Hankin
>>>> Uncertainty Analyst
>>>> University of Cambridge
>>>> 19 Silver Street
>>>> Cambridge CB3 9EP
>>>> 01223-764877
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Henrique Dallazuanna
>>> Curitiba-Paraná-Brasil
>>> 25° 25' 40" S 49° 16' 22" O
>>>
>>