[R] Quickly reading data into the Matrix package's sparse formats
Paul Bailey
pdbailey at umd.edu
Tue Jun 17 03:31:23 CEST 2008
I have a data set that I wish to solve with the Matrix package's sparse
matrix functionality. The speed improvements it has achieved are
amazing: the solves I have tried so far finish too quickly to time
meaningfully. However, before I can solve my full linear model, I need
to be able to read in all the data, and therein lies the rub. There
are two ways that I see to read it in:
(1) generate a dense X matrix and then convert it to a sparse matrix
using, e.g.,
R> require(Matrix)
R> Xsparse <- as(X,"dgCMatrix")
(2) make a new sparse X matrix and then populate it.
R> require(Matrix)
R> Xsparse <- Matrix(0,nrow=n,ncol=m,sparse=TRUE)
then for relevant cells:
R> Xsparse[i,j] <- v
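As a minimal sketch of the two approaches above on made-up data (the
dimensions, the number of nonzero cells, and the random values are
illustrative assumptions, not the real data set; the conversion targets
the virtual class "CsparseMatrix", which for a non-square numeric
matrix should give the same dgCMatrix result):

```r
library(Matrix)

set.seed(1)
n <- 200; m <- 50                # toy dimensions (assumed)
nnz <- 300                       # number of nonzero cells (assumed)
idx <- sample(n * m, nnz)        # distinct cell positions, column-major
i <- (idx - 1) %% n + 1          # row index of each nonzero cell
j <- (idx - 1) %/% n + 1         # column index of each nonzero cell
v <- rnorm(nnz)                  # made-up cell values

# Method 1: build the dense matrix first, then convert it
X <- matrix(0, nrow = n, ncol = m)
X[cbind(i, j)] <- v
t1 <- system.time(Xsparse1 <- as(X, "CsparseMatrix"))

# Method 2: start from an empty sparse matrix and fill it cell by cell
t2 <- system.time({
  Xsparse2 <- Matrix(0, nrow = n, ncol = m, sparse = TRUE)
  for (k in seq_len(nnz)) Xsparse2[i[k], j[k]] <- v[k]
})

print(t1)
print(t2)
```

Both routes should produce the same matrix; only the time taken to get
there differs, and the timings will depend on the machine and the
problem size.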
But both of these methods are painfully slow. Method 1 takes many
times as long as the actual solve and, what's worse, leaves the sparse
route only about half as time-consuming as the dense one, all told.
It also requires that a dense version of X approximately fit in
memory. Method 2 is significantly slower still, taking more than 10
times as long as the dense solver. For method 2 I tried both dgCMatrix
and dgTMatrix, with little difference. I've searched through the
documentation for the Matrix package, and found no mention of this
problem or a potential cure.
Is there some way that I can format the data that will allow for
rapid read in, or is there some other possible cure?
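One format that may be worth testing, assuming the data can be written
out as (row, column, value) triplets with one record per nonzero cell,
is to hand all the triplets to the constructor in a single vectorised
call instead of assigning cells one at a time. The sketch below uses
sparseMatrix() from current versions of the Matrix package; the data
frame `trip` and its contents are hypothetical, and no timing claim is
made here.

```r
library(Matrix)

# Hypothetical triplet data: one (row, column, value) record per nonzero cell
trip <- data.frame(i = c(1, 3, 2, 4),
                   j = c(1, 1, 2, 3),
                   v = c(2.5, -1, 4, 0.5))
n <- 4; m <- 3                   # full dimensions of X (assumed known)

# Build the sparse matrix directly from the triplets (1-based indices),
# without ever allocating a dense n-by-m matrix
Xsparse <- sparseMatrix(i = trip$i, j = trip$j, x = trip$v,
                        dims = c(n, m))
print(Xsparse)
```

In practice the triplets could come from a three-column text file read
with read.table(), so only the nonzero cells ever need to be held in
memory.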
Cheers,
Paul Bailey