[Bioc-devel] Subsetting an RleList object
Hervé Pagès
hpages at fhcrc.org
Thu Nov 21 10:32:59 CET 2013
Hi Thomas,
In some particular situations seqselect<- was using some tricks
to be fast. In IRanges 1.20.6, I've ported these same tricks to [<-
so the performance regression you report below should be gone.
Let me know if you run into other issues with the subsetting code.
Thanks,
H.
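
For anyone who wants to check the timing on their own installation, here is a minimal sketch of the replacement discussed in this thread. It assumes an IRanges version in which [<- on a CompressedRleList accepts a plain list subscript (1.20.x or later, per the message above). The names runs, idx and timing are illustrative only, and the elapsed time will vary by system.

```r
library(IRanges)

# Build a compressed RleList of 8 all-FALSE runs (FALSE instead of the
# thread's TRUE, so that the effect of the replacement is visible).
runs <- lapply(1:8, function(i) Rle(values=FALSE, lengths=10000))
rles <- do.call(RleList, c(runs, list(compress=TRUE)))

# One integer vector of positions per list element, as in the report.
idx <- list(20:108, 41:131, 21:105, 1:1234,
            4:5, 1223:1243, 432:5234, 444:5555)

# Time the list-subscript replacement (assumed to be supported, see above).
timing <- system.time(rles[idx] <- TRUE)
print(timing[["elapsed"]])

# Sanity check: element 1 should now hold 108 - 20 + 1 = 89 TRUE values.
stopifnot(sum(rles[[1]]) == 89L)
```

Running the same script under two IRanges versions gives a like-for-like comparison of the subsetting code.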
On 11/11/2013 05:06 PM, Thomas Sandmann wrote:
> Hi Herve,
>
> thanks a lot for re-enabling the subsetting functionality for
> CompressedRleList with List-like objects.
> While things work now, I noticed a big difference in execution time for
> the following operations:
>
> with IRanges_1.18.2
>
> rles <- RleList(Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 compress=TRUE)
>
> system.time(seqselect( rles, unname(list(a=20:108, b=41:131, c=21:105,
> d=1:1234,
> e=4:5, f=1223:1243, g=432:5234, h=444:5555) )) <- TRUE)
>
> clocks ca. 0.040s on my system.
>
> R 3.0.2 with other attached packages:
> [1] Rsamtools_1.12.2 Biostrings_2.28.0 devtools_1.3
> [4] GenomicRanges_1.12.4 IRanges_1.18.2 BiocGenerics_0.6.0
> [7] Defaults_1.1-1 BiocInstaller_1.10.3 roxygen2_2.2.2
> [10] digest_0.6.3
>
> with IRanges_1.20.5, the same operation is much slower:
>
> system.time( rles[ unname( list(a=20:108, b=41:131, c=21:105, d=1:1234,
> e=4:5, f=1223:1243, g=432:5234, h=444:5555)) ] <-
> TRUE )
>
> takes about 0.45s, more than 10x longer.
>
> R 3.0.0 with other attached packages:
> [1] devtools_1.3 rtracklayer_1.22.0 Rsamtools_1.14.1
> [4] Biostrings_2.30.0 GenomicRanges_1.14.3 XVector_0.2.0
> [7] IRanges_1.20.5 BiocGenerics_0.8.0 Defaults_1.1-1
> [10] BiocInstaller_1.12.0 roxygen2_2.2.2 digest_0.6.3
>
> I noticed even larger slowdowns with real-life, longer datasets,
> so the execution time appears to grow non-linearly with input size.
>
> Can you reproduce this difference in performance ?
> If so, would it be possible to reinstate the old seqselect method for
> the sake of efficiency ?
>
> Thomas
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319