[R] using apply with sparse matrix from package Matrix
Jennifer Lyon
jennifer.s.lyon at gmail.com
Wed Sep 5 01:57:02 CEST 2012
On Tue, Sep 4, 2012 at 10:58 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Jennifer Lyon <jennifer.s.lyon at gmail.com>
>>>>>> on Fri, 31 Aug 2012 17:22:57 -0600 writes:
>
> > Hi:
> > I was trying to use apply on a sparse matrix from package Matrix,
> > and I get the error:
>
> > Error in asMethod(object) :
> > Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106
>
> > Is there a way to apply a function to all the rows without bumping
> > into this problem?
>
> > Here is a simplified example:
>
> >> dim(sm)
> > [1] 72913 43052
>
> >> class(sm)
> > [1] "dgCMatrix"
> > attr(,"package")
> > [1] "Matrix"
>
> >> str(sm)
> > Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> > ..@ i : int [1:6590004] 789 801 802 1231 1236 11739 17817
> > 17943 18148 18676 ...
> > ..@ p : int [1:43053] 0 147 303 450 596 751 908 1053 1188 1347 ...
> > ..@ Dim : int [1:2] 72913 43052
> > ..@ Dimnames:List of 2
> > .. ..$ : NULL
> > .. ..$ : NULL
> > ..@ x : num [1:6590004] 0.601 0.527 0.562 0.641 0.684 ...
> > ..@ factors : list()
>
> >> my.sum<-apply(sm, 1, sum)
> > Error in asMethod(object) :
> > Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106
>
> So, actually it would have worked (though not efficiently) if
> your sm matrix would have been much smaller.
>
> However, we provide rowSums(), rowMeans(), colSums(), colMeans()
> for all of our matrices, including the sparse ones.
>
> So your present problem can be solved using
>
> my.sum <- rowSums(sm)
>
> Best regards,
> Martin Maechler, ETH Zurich
Thank you for letting me know about rowSums(). Two points. First,
sadly, I was unclear in my posting: "sum" was just an example. In the
real case I am applying my own function to each row. I guess the
answer for this problem is that iteration is my friend. Good to know.
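One way to iterate without converting the whole matrix to dense form is to densify only a block of rows at a time. The sketch below assumes a hypothetical helper, apply_rows_chunked(), which is not part of the Matrix package, and uses a tiny stand-in matrix rather than the real sm. The chunk size is an arbitrary choice to be tuned to available memory. For sm, a block of 1000 rows by 43052 columns would hold roughly 43 million doubles.

```r
library(Matrix)

# Hypothetical helper (not part of Matrix): apply `f` to every row of a
# sparse matrix, converting only `chunk_size` rows to dense form at a time.
apply_rows_chunked <- function(m, f, chunk_size = 1000L) {
  n <- nrow(m)
  out <- vector("list", n)
  if (n == 0L) return(out)
  for (start in seq(1L, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, n)
    # Only this block of rows is turned into an ordinary dense matrix.
    block <- as.matrix(m[idx, , drop = FALSE])
    for (k in seq_along(idx)) {
      out[[idx[k]]] <- f(block[k, ])
    }
  }
  out
}

# Tiny stand-in for `sm` (5 x 4, four non-zero entries).
small <- sparseMatrix(i = c(1, 3, 3, 5), j = c(2, 1, 4, 3),
                      x = c(0.5, 1.5, 2, 4), dims = c(5, 4))

# `sum` is used only so the result can be checked against rowSums().
res <- unlist(apply_rows_chunked(small, sum, chunk_size = 2L))
all.equal(res, rowSums(small))
```

The function passed as f sees a full dense row, zeros included, so it behaves like the function given to apply(sm, 1, f). If the function only needs the non-zero entries, working directly on the slots of the transposed matrix would likely be cheaper, but that is a different design.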
Second, I'm embarrassed to say I hadn't remembered rowSums(), so
whenever I needed the row sums I had just been postmultiplying by a
vector of 1's. Just FYI, I thought I should try rowSums(), so I ran a
small timing trial, and postmultiplying appears to be faster than
rowSums() (about 6 s versus 28 s elapsed for 100 replications). The
run is as follows:
> str(sm)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:6590004] 721 926 1275 1791 2370 2755 3393 4638
5363 5566 ...
..@ p : int [1:43053] 0 147 303 450 596 751 908 1053 1188 1347 ...
..@ Dim : int [1:2] 72913 43052
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ x : num [1:6590004] 0.0735 0.3206 0.1861 0.1604 0.197 ...
..@ factors : list()
> library(rbenchmark)
#Just checking how expensive building a vector of 1's is - not very
#at least for matrix of the size I'm interested in
> benchmark(i1<-rep(1, ncol(sm)))
                    test replications elapsed relative user.self sys.self user.child sys.child
1 i1 <- rep(1, ncol(sm))          100   0.119        1      0.12        0          0         0
#Postmultiplying by 1's timing
> benchmark(la<-sm %*% i1)
             test replications elapsed relative user.self sys.self user.child sys.child
1 la <- sm %*% i1          100   5.993        1     5.993        0          0         0
#rowSums timing
> benchmark(la1<-rowSums(sm))
                test replications elapsed relative user.self sys.self user.child sys.child
1 la1 <- rowSums(sm)          100  28.117        1    28.114    0.004          0         0
#Make sure the results are the same
> all(la==la1)
[1] TRUE
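The exact == comparison happened to return TRUE here, but two different summation routes can in general differ by floating-point rounding. A more tolerant check, sketched on a tiny stand-in matrix (the names m, ones, by_product and by_rowsums are illustrative only), converts the one-column product to a plain vector and compares with all.equal():

```r
library(Matrix)

# Tiny stand-in for `sm`.
m <- sparseMatrix(i = c(1, 3, 3, 5), j = c(2, 1, 4, 3),
                  x = c(0.5, 1.5, 2, 4), dims = c(5, 4))

ones <- rep(1, ncol(m))

# The product is a one-column Matrix object; as.vector() turns it into an
# ordinary numeric vector so both results have the same shape.
by_product <- as.vector(m %*% ones)
by_rowsums <- rowSums(m)

# all.equal() compares within a small numerical tolerance.
same <- isTRUE(all.equal(by_product, by_rowsums))
same
```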
The Matrix package is awesome, and I appreciate you taking the
time to answer my questions.
Jen
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rbenchmark_0.3.1 Matrix_1.0-6 lattice_0.20-6
loaded via a namespace (and not attached):
[1] grid_2.15.1