[R] disabling sparse Matrix index checking during assignment

Nathaniel Graham npgraham1 at gmail.com
Fri Dec 13 18:23:52 CET 2013


The project I'm working on requires producing a number of large
(250,000x250,000) sparse logical matrices.  I'm currently doing this
by updating the elements (turning FALSE to TRUE) of a matrix in
batches as they're identified like so:

x[idx.matrix] <- TRUE

where x is created via Matrix(nrow = n, ncol = n, data = FALSE) and
n is approximately 250,000.  The idx.matrix is a two column matrix of
indices to be assigned to.  This is done many times.  After profiling,
I've found that the lion's share of the work is taking place in the
internal calls that check for duplicates, etc. in the indices passed to
[<-.  For instance, anyDuplicated.default is one of the most
time-consuming portions of my code according to Rprof.
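
For reference, a minimal sketch of the batched assignment and the profiling
step described above. The sizes (n, batch.size, n.batches) and the random
indices are assumptions chosen so the example runs quickly; the real n is
about 250,000 and the real indices come from the project's data.

```r
library(Matrix)

set.seed(1)
n          <- 5000L  # assumption: small stand-in for the ~250,000 in the post
batch.size <- 2000L  # assumption: "thousands of index pairs" per call
n.batches  <- 50L    # assumption: number of batches

x <- Matrix(nrow = n, ncol = n, data = FALSE)

prof.file <- tempfile()
Rprof(prof.file)
for (k in seq_len(n.batches)) {
  # Two-column matrix of (row, column) indices, made up for illustration.
  idx.matrix <- cbind(sample.int(n, batch.size), sample.int(n, batch.size))
  x[idx.matrix] <- TRUE
}
Rprof(NULL)

# summaryRprof() stops with an error when no samples were recorded,
# which can happen if the loop finishes very quickly.
prof <- tryCatch(summaryRprof(prof.file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.total, 10))
```

The by.total table is where entries such as anyDuplicated.default would
show up if the index checks dominate the run time.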

This makes the whole process quite slow as there are frequently
thousands of index pairs in each call.  I'd like to disable as many of
these checks as possible; I can guarantee that there aren't any duplicates,
and even if there are, I don't especially care, since that would only
mean that a value is assigned TRUE twice instead of once.
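
A small sketch of why duplicates are harmless here: assigning TRUE through
an index matrix that repeats a row should give the same matrix as assigning
through the de-duplicated indices. The size and the index values below are
made up for illustration.

```r
library(Matrix)

n <- 50L  # assumption: tiny size so the results can be compared densely

idx.unique <- cbind(c(1L, 7L, 20L), c(3L, 7L, 45L))
# Same indices with the first pair repeated.
idx.dup <- rbind(idx.unique, idx.unique[1, , drop = FALSE])

x.unique <- Matrix(nrow = n, ncol = n, data = FALSE)
x.unique[idx.unique] <- TRUE

x.dup <- Matrix(nrow = n, ncol = n, data = FALSE)
x.dup[idx.dup] <- TRUE

# Compare as ordinary dense logical matrices.
same <- identical(as.matrix(x.unique), as.matrix(x.dup))
print(same)
```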

I've tried a number of other approaches, such as creating a data.table
of all the indices to be changed and doing the assignment once, but
the temporary memory usage becomes enormous (I run out of memory on a
32 GB machine).

I've also tried creating a temporary sparseMatrix and using '|' like so:

# a, b are numeric vectors of indices
x <- x | sparseMatrix(a, b, x = TRUE, dims = x@Dim, check = FALSE)

but this turns out to be slower than assignment; most of its time is
spent in the logical OR command.
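
To make the comparison concrete, here is a minimal sketch showing that the
two approaches are meant to produce the same matrix. The size n and the
index vectors a and b are assumptions for illustration, and dim(x) is used
in place of x@Dim (they return the same dimensions).

```r
library(Matrix)

set.seed(2)
n <- 200L                   # assumption: small size for a quick check
a <- sample.int(n, 100L)    # hypothetical row indices (no repeats)
b <- sample.int(n, 100L)    # hypothetical column indices (no repeats)

# Approach 1: assignment through a two-column index matrix.
x.assign <- Matrix(nrow = n, ncol = n, data = FALSE)
x.assign[cbind(a, b)] <- TRUE

# Approach 2: OR with a temporary sparse matrix built from the indices.
x.or <- Matrix(nrow = n, ncol = n, data = FALSE)
x.or <- x.or | sparseMatrix(i = a, j = b, x = TRUE,
                            dims = dim(x.or), check = FALSE)

# Compare as ordinary dense logical matrices.
same <- identical(as.matrix(x.assign), as.matrix(x.or))
print(same)
```

Wrapping either approach in system.time() is one way to compare their cost
on a given machine.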

Is there a way to speed this process up substantially?  Thanks in
advance for your help.
-------
Nathaniel Graham
npgraham1 at gmail.com
npgraham1 at uky.edu


