[R] memory-efficient column aggregation of a sparse matrix
Jon Stearley
jrstear at sandia.gov
Thu Feb 1 02:59:30 CET 2007
I need to sum the columns of a sparse matrix according to a factor,
i.e., given a sparse matrix X and a factor fac of length ncol(X), sum
within each row the elements whose columns share a factor level, and
return the sparse matrix Y of size nrow(X) by nlevels(fac). The
appended code does the job, but is unacceptably memory-bound because
tapply() returns a dense (non-sparse) array. Can anyone suggest a more
memory- and CPU-efficient approach? E.g., a sparse matrix tapply
method? Thanks.
--
+--------------------------------------------------------------+
| Jon Stearley (505) 845-7571 (FAX 844-9297) |
| Sandia National Laboratories Scalable Systems Integration |
+--------------------------------------------------------------+
# x and y are of SparseM class matrix.csr
"aggregate.csr" <-
function(x, fac) {
  # make a vector indicating the row of each nonzero;
  # diff(x@ia) is the nonzero count per row, so empty rows are handled
  rows <- rep(seq_len(nrow(x)), diff(x@ia))
  # keep every row as a level so empty rows survive tapply
  rows <- factor(rows, levels = seq_len(nrow(x)))
  # make a vector indicating the column factor of each nonzero
  f <- fac[x@ja]
  # aggregate by row,f (this is the dense, memory-hungry step)
  y <- tapply(x@ra, list(rows, f), sum)
  # sparsify it
  y[is.na(y)] <- 0 # change tapply NAs to as.matrix.csr 0s
  y <- as.matrix.csr(y)
  y
}
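
One possible way to avoid the dense tapply() result, offered as a sketch
rather than a tested solution: multiply X by a sparse indicator matrix G
of size ncol(X) by nlevels(fac), with G[j, k] = 1 when column j belongs
to level k. The product X %*% G is then the desired Y and stays in
matrix.csr form throughout. The function name aggregate_csr_sparse is
made up for illustration, and the sketch assumes fac contains no NAs.

```r
library(SparseM)

# Hypothetical helper (the name is an assumption, not part of SparseM):
# sums the columns of a matrix.csr `x` within the levels of `fac`.
aggregate_csr_sparse <- function(x, fac) {
  fac <- as.factor(fac)
  stopifnot(length(fac) == ncol(x), !anyNA(fac))  # assumption: no NA levels
  n <- length(fac)
  # Indicator matrix G: row j holds a single 1 in the column of fac[j],
  # so G stores exactly n nonzeros regardless of nlevels(fac).
  G <- new("matrix.csr",
           ra = rep(1, n),
           ja = as.integer(fac),
           ia = seq_len(n + 1L),
           dimension = c(n, nlevels(fac)))
  # Sparse product: Y[i, k] = sum of x[i, j] over columns j with fac[j] == k
  x %*% G
}

# Small usage example: columns 1 and 3 belong to level "a", column 2 to "b"
X <- as.matrix.csr(matrix(c(1, 0, 2,
                            0, 3, 4), nrow = 2, byrow = TRUE))
fac <- factor(c("a", "b", "a"))
Y <- aggregate_csr_sparse(X, fac)
as.matrix(Y)  # dense view, only for inspecting a small result
```

The memory cost is that of X, G (one nonzero per column of X) and the
nonzeros of Y, so no nrow(X) by nlevels(fac) dense array is ever formed.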