[BioC] Getting the length of every element from a large CompressedIRangesList is slow
Nicolas Delhomme
delhomme at embl.de
Mon Jul 2 19:02:45 CEST 2012
Hi!
I have a rather large CompressedIRangesList:
> print(object.size(aln.ranges),unit="Mb")
390.4 Mb
It has 2518 elements, some of which have up to 6M ranges, for a total of 51M ranges. The vast majority of elements are small, though: the median length is 2 and the 3rd quartile is 47, while the mean is ~20,000.
Retrieving the element length is slow:
> system.time(sizes <- sapply(aln.ranges,length))
   user  system elapsed
265.777 169.222 443.498
That is slow compared with the performance of the IRanges package in general, which surprised me. Is there a faster way to get this information than the sapply call I'm using? Note that the machine I'm using is not a limiting factor in terms of CPU, RAM, or load.
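One direction that may be worth checking, sketched here under assumptions rather than as a confirmed fix: a compressed list keeps all ranges in one concatenated object plus the end offset of each element, so the per-element lengths can in principle be read off those offsets without extracting each element in an R-level loop. The `ends` vector and the `toy` object below are hypothetical. The accessor name is also an assumption about the installed version: `elementLengths()` in releases contemporary with IRanges 1.15, `elementNROWS()` in later ones.

```r
# Base-R illustration: with concatenated storage, element lengths are
# just the differences between consecutive cumulative end offsets.
ends <- c(3L, 3L, 10L, 12L)            # hypothetical end offsets
sizes_from_ends <- diff(c(0L, ends))   # one length per element

# If IRanges is installed, the vectorized accessor avoids the R loop.
# Assumption: which accessor exists depends on the installed release.
if (requireNamespace("IRanges", quietly = TRUE)) {
  library(IRanges)
  toy <- IRangesList(a = IRanges(1:3, width = 5),   # 3 ranges
                     b = IRanges(),                 # empty element
                     c = IRanges(1:7, width = 2))   # 7 ranges
  sizes <- if (exists("elementNROWS")) {
    elementNROWS(toy)
  } else {
    elementLengths(toy)
  }
  print(sizes)
}
```

On the real object, the equivalent call would replace `sapply(aln.ranges,length)` with the accessor applied directly to `aln.ranges`. Whether it is faster on this data would need to be timed.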
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.15.15 BiocGenerics_0.3.0
loaded via a namespace (and not attached):
[1] stats4_2.15.1
Nico
P.S. If you need, I can send my aln.ranges object off-list.
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany