[R] Scaling rows of a large Matrix::sparseMatrix()

tomdharray at gmail.com
Wed Jan 13 02:50:54 CET 2016


Hello R-Users,

I'm looking for a way to scale the rows of a sparse matrix M with about
57,000 rows, 14,000 columns, and 238,000 non-zero matrix elements; see
example code below.

Usually I'd use the base::scale() function (see the example code below), but
it freezes my computer. The same happens when I run a for loop over the
matrix rows.

Converting M with as.matrix() yields a 5.8 GB object, which appears to be
too large for scale().
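The 5.8 GB figure follows directly from the dimensions: a dense double matrix stores 8 bytes for every cell, while the sparse M stores only its non-zeros. A quick back-of-the-envelope check, using the N_ROW, N_COL and SIZE values from the example code below (the lower-case variable names are just for illustration):

```r
n_row <- 56743
n_col <- 13648
nnz   <- 238283   # non-zero count of the real matrix, as quoted in the post

## dense storage: one 8-byte double per cell (1 GiB = 1024^3 bytes)
dense_bytes <- n_row * n_col * 8
dense_gib   <- dense_bytes / 1024^3
cat(sprintf("dense copy: %.2f GiB\n", dense_gib))

## fraction of cells that are actually non-zero
density <- nnz / (n_row * n_col)
cat(sprintf("density: %.4f %%\n", 100 * density))
```

This gives about 5.77 GiB for one dense copy, with only about 0.03 % of the cells non-zero, so any approach that densifies M spends almost all of its memory on zeros.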


So my question is: how can the rows of a large sparse matrix be scaled
efficiently?
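One common approach (a sketch, not something taken from the post) is to write the row scaling as a left multiplication by a sparse diagonal matrix, diag(1/rowSums(M)) %*% M, which the Matrix package keeps sparse. It assumes that "scaling" means dividing each row by its row sum, as in the for loop of the example code, and that no row sum is zero. Centering (subtracting a row mean) would turn almost every zero into a non-zero, so it cannot stay sparse. A small toy matrix stands in for the real M:

```r
library(Matrix)

## small stand-in for the large M from the example code
M <- sparseMatrix(i = c(1, 1, 2, 3, 3, 3),
                  j = c(1, 4, 2, 1, 3, 4),
                  x = c(2, 6, 5, 1, 1, 2),
                  dims = c(3, 4))

rs <- Matrix::rowSums(M)   # row sums; assumed to be non-zero

## left-multiplying by Diagonal(x = 1/rs) divides row k by rs[k];
## the product of a diagonal and a sparse matrix is sparse again
M_scaled <- Diagonal(x = 1 / rs) %*% M

print(M_scaled)
print(Matrix::rowSums(M_scaled))   # every row now sums to 1
```

On the real M the memory needed should be on the order of a second copy of the non-zeros, rather than a dense 57,000 x 14,000 array.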

Thanks and regards,

Dirk


### Hardware/Session Info
Intel Core i7 with 12 GB RAM
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

### Example Code
library(Matrix)
set.seed(42)

## These values mimic my real "problem matrix"
N_ROW <- 56743
N_COL <- 13648
SIZE  <- 238283   # non-zeros in the real matrix (not used below)
## sampling probabilities for the matrix values 1..13
PROB <- c(0.050, 0.050, 0.099, 0.149, 0.198, 0.178, 0.119,
          0.079, 0.0297, 0.0198, 0.001, 0.001, 0.001)

## random (row, column) index pairs: 1 to 15 column draws per row
x <- do.call(
  what = rbind,
  args = lapply(X = 1:N_ROW,
                FUN = function(i)
                  expand.grid(i,
                    sample(x = 1:N_COL,
                      size = sample(1:15, 1),
                      replace = TRUE)
                  )
         )
)
## random values 1..13 for each pair, drawn with probabilities PROB
x[, 3] <- sample(x = 1:13, size = nrow(x),
                 replace = TRUE, prob = PROB)

## build the sparse matrix (values of duplicated (i, j) pairs are summed)
M <- Matrix::sparseMatrix(
       dims = c(N_ROW, N_COL),
       i = x[,1],
       j = x[,2],
       x = x[,3]
)
print(format(object.size(M), units = "auto"))

## *******************************************
## Scaling the rows of M

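## scale() makes my computer freeze: base::scale() first converts its
## input with as.matrix(), i.e. to the dense 5.8 GB object mentioned above
# M <- t(scale(t(M), center = FALSE, scale = Matrix::rowSums(M)))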

## this is not elegant at all and takes forever
# rwsms <- Matrix::rowSums(M)
# for (i in seq_len(nrow(M))) M[i, ] <- M[i, ] / rwsms[[i]]
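Another option, sketched here as an alternative to the two commented-out attempts, avoids both the dense conversion and the row-by-row assignment by rescaling the stored non-zeros in place. It assumes M is a column-compressed dgCMatrix (what sparseMatrix() returns for numeric values), whose slot M@x holds the non-zero values and M@i their 0-based row indices, and that every row sum is non-zero. scale_rows_csc() is a hypothetical helper name, not a Matrix function:

```r
library(Matrix)

## hypothetical helper: divide each row of a dgCMatrix by its row sum
scale_rows_csc <- function(M) {
  stopifnot(is(M, "dgCMatrix"))
  rs <- Matrix::rowSums(M)
  if (any(rs == 0)) stop("zero row sum: cannot divide")
  ## M@i is 0-based, so M@i + 1L is the row of each stored value
  M@x <- M@x / rs[M@i + 1L]
  M
}

## check on a small toy matrix
M <- sparseMatrix(i = c(1, 1, 2, 3, 3, 3),
                  j = c(1, 4, 2, 1, 3, 4),
                  x = c(2, 6, 5, 1, 1, 2),
                  dims = c(3, 4))
M_scaled <- scale_rows_csc(M)
print(Matrix::rowSums(M_scaled))   # every row now sums to 1
```

Only the x slot is rewritten, so the sparsity pattern (and the memory footprint) of M stays the same, and the work is a single vectorised pass over the non-zeros.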


