[R] working with sparse matrix

Wed Jun 22 10:34:18 CEST 2011

>>>>> David Winsemius <dwinsemius at comcast.net>
>>>>>     on Tue, 21 Jun 2011 12:07:47 -0400 writes:

    > On Jun 21, 2011, at 11:44 AM, Patrick Breheny wrote:

    >> On 06/21/2011 09:48 AM, davefrederick wrote:
    >>> Hi, I have a 500 x 53380 sparse matrix and I am trying to
    >>> dichotomize it. In the sna package I found event2dichot,
    >>> but it doesn't recognize sparse matrices and requires an
    >>> adjacency matrix or array. I tried to run a simple loop:
    >>> 
    >>> for (i in 1:500) for (j in 1:53380) if (matrix[i,j]>0)
    >>> matrix[i,j]=1
    >> 
    >> The code you are running does not require a loop:
    >> 
    >> > A <- cbind(c(1,0),c(0,2))
    >> > A
    >>      [,1] [,2]
    >> [1,]    1    0
    >> [2,]    0    2
    >> > A[A>0] <- 1
    >> > A
    >>      [,1] [,2]
    >> [1,]    1    0
    >> [2,]    0    1
    >>
    >> However, for large sparse matrices, this and other operations
    >> will be faster if the matrix is explicitly stored as a sparse
    >> matrix, as implemented in the 'Matrix' package.

    > require(Matrix)
    > M <- Matrix(0, 10,10)  # empty sparse matrix
    > M[1:10, 1] <- 1
    > M[1:10, 2] <- 2
    > M[1:10, 3] <- -3

    > M[M > 0] <- 1

    >> M
    > 10 x 10 sparse Matrix of class "dgCMatrix"

    >  [1,] 1 1 -3 . . . . . . .
    >  [2,] 1 1 -3 . . . . . . .
    >  [3,] 1 1 -3 . . . . . . .
    >  [4,] 1 1 -3 . . . . . . .
    >  [5,] 1 1 -3 . . . . . . .
    >  [6,] 1 1 -3 . . . . . . .
    >  [7,] 1 1 -3 . . . . . . .
    >  [8,] 1 1 -3 . . . . . . .
    >  [9,] 1 1 -3 . . . . . . .
    > [10,] 1 1 -3 . . . . . . .

    >> M2 <- as.matrix(M)

    >> M2
    >       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    >  [1,]    1    1   -3    0    0    0    0    0    0     0
    >  [2,]    1    1   -3    0    0    0    0    0    0     0
    >  [3,]    1    1   -3    0    0    0    0    0    0     0
    >  [4,]    1    1   -3    0    0    0    0    0    0     0
    >  [5,]    1    1   -3    0    0    0    0    0    0     0
    >  [6,]    1    1   -3    0    0    0    0    0    0     0
    >  [7,]    1    1   -3    0    0    0    0    0    0     0
    >  [8,]    1    1   -3    0    0    0    0    0    0     0
    >  [9,]    1    1   -3    0    0    0    0    0    0     0
    > [10,]    1    1   -3    0    0    0    0    0    0     0

    > There might have been a one- or two-second pause while the last
    > command was executing for this test:

    >> M <- Matrix(0, 500, 53380 )
    >> M[1:10, 1] <- 1
    >> M[1:10, 2] <- 2
    >> M[1:10, 3] <- -3
    >> M[M > 0] <- 1
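A possibly cheaper variant of the last command, sketched under the assumption that M is stored as a "dgCMatrix" (column-compressed, as printed above): in that class the stored non-zero values live in the slot M@x, so the dichotomization only has to compare and overwrite those few values, and the sparsity pattern stays as it is. Relying on the @x slot is an assumption about the storage class rather than a general Matrix interface, hence the class check.

```r
library(Matrix)

# Rebuild the 500 x 53380 test matrix from (row, column, value) triplets
M <- sparseMatrix(i = rep(1:10, times = 3),
                  j = rep(1:3, each = 10),
                  x = rep(c(1, 2, -3), each = 10),
                  dims = c(500, 53380))

# Assumption: M is a "dgCMatrix", whose stored non-zero values are in slot x
stopifnot(is(M, "dgCMatrix"))

# Only the 30 stored values are examined; the implicit zeros are never touched
M@x[M@x > 0] <- 1
```

Negative entries (column 3 here) are left unchanged, exactly as with M[M > 0] <- 1.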

Yes, that's much better, thank you, David.

Just a short note: the above technique of
     M <- Matrix(0, n, m)
     M[i,j] <- v1
     M[k,l] <- v2
     ...
may be natural for producing a sparse matrix, and OK for such very
small examples, but it is typically *MUCH MUCH* less efficient than
directly using the
      sparseMatrix()
or    spMatrix()
functions which the Matrix package provides.
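For the example in this thread, a minimal sketch of building the whole matrix in one call from (row, column, value) triplets, instead of element-wise assignment. The vectors i, j, x are illustrative names chosen here; note that spMatrix() takes the dimensions first, while sparseMatrix() takes them via dims.

```r
library(Matrix)

n <- 500; m <- 53380
i <- rep(1:10, times = 3)          # row indices
j <- rep(1:3, each = 10)           # column indices
x <- rep(c(1, 2, -3), each = 10)   # values at (i[k], j[k])

# Column-compressed result in a single call
A <- sparseMatrix(i = i, j = j, x = x, dims = c(n, m))

# Triplet-form result; same entries, different storage
B <- spMatrix(n, m, i = i, j = j, x = x)

# Both hold the same 30 non-zero values
stopifnot(nnzero(A - B) == 0)
```

Building from triplets lets the package sort and compress the entries once, rather than restructuring the matrix on every assignment.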

    >> M2 <- as.matrix(M)

Why would you have to make your sparse matrix dense?
(It won't work for much larger matrices anyway: they can't exist
 in your RAM as dense matrices.)
Yes, I see you work with further code that does not accept
sparse matrices.
The glmnet package (lasso, etc.) handles sparse matrices very nicely
{and if you use package 'MatrixModels', with model.Matrix()
 you can even use formulas to construct sparse (model) matrices
 as input for glmnet!}.
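To make the memory argument concrete, a rough sketch comparing the size of the sparse 500 x 53380 matrix with what a dense double matrix of the same shape would need (8 bytes per cell, i.e. 500 * 53380 * 8 = 213,520,000 bytes, about 204 MiB). The dense size is computed rather than allocated; the exact object.size() value depends on the Matrix version.

```r
library(Matrix)

M <- sparseMatrix(i = rep(1:10, times = 3),
                  j = rep(1:3, each = 10),
                  x = rep(c(1, 1, -3), each = 10),
                  dims = c(500, 53380))

# Sparse storage: roughly one integer per column pointer plus the non-zeros
sparse_bytes <- as.numeric(object.size(M))

# Dense storage: 8 bytes per cell (computed only, never allocated)
dense_bytes <- nrow(M) * ncol(M) * 8

cat(format(sparse_bytes, big.mark = ","), "bytes sparse vs",
    format(dense_bytes, big.mark = ","), "bytes dense\n")
```

Dense storage grows with nrow * ncol, while a "dgCMatrix" grows roughly with the number of non-zeros plus one pointer per column, which is why as.matrix() stops being an option as the dimensions grow.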

After my imminent vacation {three weeks to Colombia and Ecuador!},
I'll gladly help package authors (e.g., of sna) to change their
code so that it works with sparse matrices.

With regards,
Martin Maechler, ETH Zurich

    > -- 
    > David.

    >> 
    >> -- 
    >> Patrick Breheny Assistant Professor Department of Biostatistics
    >> Department of Statistics University of Kentucky

    > David Winsemius, MD
    > West Hartford, CT