[R] Slow indexing access for Matrix
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Mon Feb 23 23:13:51 CET 2009
Niels Richard Hansen wrote:
> Consider the following little "benchmark"
>
> > require(Matrix)
> > tmp <- Matrix(c(rep(1,1000),rep(0,9000)),ncol=1)
> > ind <- sample(1:10000,10000)
> > system.time(tmp[ind,])
> user system elapsed
> 0.004 0.001 0.005
>
> > ind <- sample(1:1000,10000,replace=TRUE)
> > system.time(tmp[ind,])
> user system elapsed
> 0.654 0.006 0.703
>
> > system.time(Matrix(as(tmp,"matrix")[ind,]))
> user system elapsed
> 0.005 0.000 0.006
>
> First I access all 10000 rows in a random order, which is fast,
> but when I access the first 1000 rows 10000 times there is a
> considerable slowdown. Last I convert back and forth
> between matrix and Matrix and get a serious speedup. Am I missing
> a point here? Should I not use indexing with "[" for the
> sparse matrices if I have repeated indices?
>
> I'm running Mac OS X, version 10.5.6, with Matrix package
> version 0.999375-21.
>
> I hope that somebody can enlighten me on this issue.
>
> Thanks, Niels
The sources have the answer, but I'm as reluctant to read them as you
are. ;-)
The repeated indices are certainly an important part of it. Notice also
that you'll have timings like
> ind <- sample(1:10000,10000,replace=TRUE)
> system.time(tmp[ind,])
user system elapsed
0.884 0.000 1.302
> ind <- sample(1:1000,10000,replace=TRUE)
> system.time(tmp[ind,])
user system elapsed
2.053 0.009 2.268
> ind <- sample(1:10000,10000,replace=FALSE)
> system.time(tmp[ind,])
user system elapsed
0.01 0.00 0.01
It is, however, apparently unrelated to the sparseness of the result
(sampling from 1001:2000 gives the same result).
Also
> ind <- sample(1:5000,5000,replace=FALSE)
> ind <- c(ind,ind)
> system.time(tmp[ind,])
user system elapsed
1.204 0.001 1.331
has a considerable part of the slowdown, as does
ind <- c(1:5000,1:5000)
Presumably the issue is that calculations on sparseness patterns are
harder when there are repeated indices.
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list