[R] Sparse matrix performance question

Douglas Bates bates at stat.wisc.edu
Tue Dec 7 15:21:32 CET 2010


On Mon, Dec 6, 2010 at 1:11 PM, scott white <distributedintel at gmail.com> wrote:
> Btw, forgot to mention I am using the standard Matrix package and I am
> running version 2.10.1 of R.
>
> On Mon, Dec 6, 2010 at 11:04 AM, scott white <distributedintel at gmail.com> wrote:
>
>> I have a very sparse square matrix which is < 20K rows & columns and I am
>> trying to row standardize the matrix for the rows that have non-missing
>> value as follows:
>>
>> row_sums <- rowSums(M,na.rm=TRUE)
>> nonzero_idxs <- which(row_sums>0)
>> nonzero_M <- M[nonzero_idxs,]/row_sums[nonzero_idxs]
>> M[nonzero_idxs,] <- nonzero_M

Assignment of submatrices in a sparse matrix can be slow because of all
the checking that has to be done.  It is probably easier to do the
calculation directly on the data component of the matrix (the x slot)
and generate a new matrix from it.  The tricky bit to remember is that
the row indices stored in the sparse matrix representation (the i slot)
are 0-based, so you need to add 1 when using them as indices in R.

I enclose a transcript.
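
As a smaller illustration of the same idea, here is a minimal sketch on a
3 x 3 matrix.  It assumes a dgCMatrix from the Matrix package and, unlike
the transcript, replaces the x slot of a copy instead of calling
sparseMatrix() again; the names M, rs and scaled are only for illustration.

```r
library(Matrix)

# A small 3 x 3 sparse matrix; row 2 is deliberately left empty.
M <- sparseMatrix(i = c(1, 1, 3), j = c(1, 2, 3),
                  x = c(2, 6, 5), dims = c(3, 3))

M@i                 # stored row indices are 0-based: 0 0 2
rs <- rowSums(M)    # 8 0 5

# Divide each stored value by the sum of its row.
# M@i + 1L converts the 0-based row indices to R's 1-based indexing.
scaled <- M
scaled@x <- M@x / rs[M@i + 1L]

rowSums(scaled)     # 1 0 1: non-empty rows sum to 1, the empty row stays 0
```

Only the stored entries are touched, so the empty row never enters the
division.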

>>
>> Each line completes well under a second except the last line which takes
>> well over 10 seconds which is simply assigning the sub-matrix of rows that
>> have non-missing values to the complete matrix. I am curious to know why it
>> is so slow and how to speed it up. Should I be doing this differently or try
>> a different sparse matrix library?
>>
>> Any feedback is appreciated.
>>
>> thanks,
>> Scott
>>
>>
>
-------------- next part --------------

R version 2.12.0 (2010-10-15)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)


> library(Matrix)
Loading required package: lattice

Attaching package: 'Matrix'

The following object(s) are masked from 'package:base':

    det

> set.seed(1234)
> M <- sparseMatrix(i=sample(5000, 1000, replace=TRUE),
+                   j=sample(5000, 1000, replace=TRUE),
+                   x=rnorm(1000), dims=c(5000, 5000))
> str(M)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:1000] 2014 549 1098 3137 130 1523 2198 3921 4323 931 ...
  ..@ p       : int [1:5001] 0 0 0 0 0 0 0 0 0 0 ...
  ..@ Dim     : int [1:2] 5000 5000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:1000] -0.4236 -0.5322 0.0675 -0.4105 -2.3708 ...
  ..@ factors : list()
> range(M@i)
[1]    1 4996
> str(rs <- rowSums(M, na.rm=TRUE))
 num [1:5000] 0 0.501 0 0.598 -0.957 ...
> res <- sparseMatrix(i=M@i, p=M@p, dims=M@Dim,
+                     x=M@x/rs[M@i + 1L], index1=FALSE)
> str(res)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:1000] 2014 549 1098 3137 130 1523 2198 3921 4323 931 ...
  ..@ p       : int [1:5001] 0 0 0 0 0 0 0 0 0 0 ...
  ..@ Dim     : int [1:2] 5000 5000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:1000] 1 1 1 -0.655 1 ...
  ..@ factors : list()
> table(rowSums(res))

   0    1 
4082  918 
> 
> proc.time()
   user  system elapsed 
  3.010   0.120   3.612 
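
The transcript rescales every row that has stored entries, including rows
whose sum is negative.  The original question only rescaled rows with a
positive sum, so the following is a hedged sketch of that variant.  It
assumes rows with a sum <= 0 should be left unchanged, and it uses two
equivalent routes: left-multiplication by a diagonal matrix built with
Diagonal(), and the x-slot approach.  The names row_scale, res2 and res3
are illustrative and not part of the transcript.

```r
library(Matrix)

set.seed(1234)
M <- sparseMatrix(i = sample(5000, 1000, replace = TRUE),
                  j = sample(5000, 1000, replace = TRUE),
                  x = rnorm(1000), dims = c(5000, 5000))
rs <- rowSums(M, na.rm = TRUE)

# Per-row factor: 1/row sum where the sum is positive, 1 (no change) otherwise.
# This also avoids dividing by a zero row sum.
row_scale <- ifelse(rs > 0, 1 / rs, 1)

# Route 1: left-multiplying by a diagonal matrix rescales the rows
# without any sub-assignment into M.
res2 <- Diagonal(x = row_scale) %*% M

# Route 2: work on the x slot directly, using the 0-based row indices in M@i.
res3 <- M
res3@x <- M@x * row_scale[M@i + 1L]

# Both routes should agree; rows with a positive sum now sum to 1.
all.equal(rowSums(res2), rowSums(res3))
```

Neither route assigns into a submatrix of M, which is the step that was
slow in the original code.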

