[R] memory-efficient column aggregation of a sparse matrix
Jon Stearley
jrstear at sandia.gov
Fri Feb 2 00:42:47 CET 2007
On Feb 1, 2007, at 6:22 AM, Douglas Bates wrote:
> It turns out that in the sparse matrix code used by the
> Matrix package the triplet representation allows for duplicate index
> positions with the convention that the resulting value at a position
> is the sum of the values of any triplets with that index pair.
Very handy! I suggest adding this nugget near the "(possibly
redundant) triplets" phrase in Matrix.pdf.
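
The summing convention described in the quote can be seen in a small case. This is only a sketch, assuming the Matrix package is installed; the matrix `m` and its values are invented for illustration.

```r
library(Matrix)

# Two of the three triplets share the position (row 0, column 1).
# The i and j slots hold 0-based indices.
m <- new("dgTMatrix",
         i = c(0L, 0L, 1L),
         j = c(1L, 1L, 0L),
         x = c(2, 3, 5),
         Dim = c(2L, 2L))

# Coercing to a dense matrix collapses the duplicates by summing them:
# entry [1, 2] becomes 2 + 3 = 5, and entry [2, 1] is 5.
as(m, "matrix")
```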
> If you decide to use this approach please be aware that the indices
> for the triplet representation in the Matrix package are 0-based (as
> in C code) not 1-based (as in R code). (I imagine that Martin is
> thinking "we really should change that" as he reads this part.)
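
A quick way to see the 0-based slots next to ordinary 1-based R indexing, again only as a sketch that assumes the Matrix package; the single entry used here is made up.

```r
library(Matrix)

# One nonzero entry at R position [2, 3] of a 3 x 4 matrix.
m <- as(sparseMatrix(i = 2, j = 3, x = 9, dims = c(3, 4)), "TsparseMatrix")

m@i      # row index as stored in the slot: 1 (0-based)
m@j      # column index as stored in the slot: 2 (0-based)
m[2, 3]  # ordinary R indexing is still 1-based: 9
```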
The value returned by the function appended below is equivalent to
that of my previous version, but it runs in one-tenth the time, uses
vastly less memory, and is fewer lines of code to boot! Sure it's
tricky, but it does the trick.
THANK YOU SO MUCH!
-jon
NEWaggregate.csr <- function(x,fac) {
  # cast into handy Matrix sparse Triplet form
  x.T <- as(as(x, "dgRMatrix"), "dgTMatrix")
  # factor column indexes (compensating for 0 vs 1 indexing)
  x.T@j <- as.integer(as.integer(fac[x.T@j+1])-1)
  # cast back, magically computing factor sums along the way :)
  y <- as(x.T, "matrix.csr")
  # and fix the dimension (doing this on x.T bus errors!)
  y@dimension <- as.integer(c(nrow(y),nlevels(fac)))
  y
}
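
For anyone who wants to try the same trick without SparseM, here is a rough sketch that uses only Matrix classes. It is not the function above: `aggregate_cols` is a hypothetical name, the input is assumed to be a numeric Matrix sparse matrix, and `fac` is assumed to have one non-missing level per column. Instead of editing the dimension slot afterwards, it builds a fresh triplet matrix with the reduced column count.

```r
library(Matrix)

# Hypothetical helper: sum the columns of a sparse matrix within levels of fac.
aggregate_cols <- function(x, fac) {
  stopifnot(length(fac) == ncol(x))
  fac <- as.factor(fac)
  # triplet form: one (i, j, x) entry per stored value
  x.T <- as(x, "TsparseMatrix")
  # slot j is 0-based: add 1 to index the factor codes, subtract 1 to store back
  new.j <- as.integer(fac)[x.T@j + 1L] - 1L
  # new triplet matrix with one column per factor level; columns that
  # map to the same level now share index pairs
  y.T <- new("dgTMatrix", i = x.T@i, j = new.j, x = x.T@x,
             Dim = c(nrow(x), nlevels(fac)))
  # converting to compressed form sums the duplicated index pairs
  as(y.T, "CsparseMatrix")
}

# Columns 1-2 belong to level "a", columns 3-4 to level "b".
x <- sparseMatrix(i = c(1, 1, 2, 2), j = c(1, 2, 3, 4),
                  x = c(1, 2, 3, 4), dims = c(2, 4))
fac <- factor(c("a", "a", "b", "b"))
y <- aggregate_cols(x, fac)
as.matrix(y)  # row 1: 3 0, row 2: 0 7
```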