[Bioc-devel] Subsetting an RleList object

Hervé Pagès hpages at fhcrc.org
Thu Nov 21 10:32:59 CET 2013


Hi Thomas,

In some particular situations, seqselect<- was using a few tricks
to be fast. In IRanges 1.20.6, I've ported these same tricks to [<-,
so the performance regression you report below should be gone.
Let me know if you run into other issues with the subsetting code.

Thanks,
H.


On 11/11/2013 05:06 PM, Thomas Sandmann wrote:
> Hi Herve,
>
> thanks a lot for re-enabling the subsetting functionality for
> CompressedRleList with List-like objects.
> While things work now, I noticed a big difference in execution time for
> the following operations:
>
> with IRanges_1.18.2
>
> rles <- RleList(Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 Rle(values=TRUE, lengths=10000),
>                 compress=TRUE)
>
> system.time(seqselect(rles, unname(list(a=20:108, b=41:131, c=21:105,
>                     d=1:1234, e=4:5, f=1223:1243, g=432:5234,
>                     h=444:5555))) <- TRUE)
>
> clocks in at ca. *0.040s* on my system.
>
> R 3.0.2 with other attached packages:
>   [1] Rsamtools_1.12.2     Biostrings_2.28.0       devtools_1.3
>   [4] GenomicRanges_1.12.4 IRanges_1.18.2       BiocGenerics_0.6.0
>   [7] Defaults_1.1-1       BiocInstaller_1.10.3 roxygen2_2.2.2
> [10] digest_0.6.3
>
> with IRanges_1.20.5, the same operation is much slower:
>
> system.time(rles[unname(list(a=20:108, b=41:131, c=21:105, d=1:1234,
>                      e=4:5, f=1223:1243, g=432:5234, h=444:5555))] <-
>             TRUE)
>
> takes about *0.45s*, more than 10x longer.
>
> R 3.0.0 with other attached packages:
>   [1] devtools_1.3    rtracklayer_1.22.0   Rsamtools_1.14.1
>   [4] Biostrings_2.30.0    GenomicRanges_1.14.3 XVector_0.2.0
>   [7] IRanges_1.20.5       BiocGenerics_0.8.0   Defaults_1.1-1
> [10] BiocInstaller_1.12.0 roxygen2_2.2.2       digest_0.6.3
>
> I noticed an even larger slowdown with real-life, longer datasets,
> so the degradation appears to be non-linear.
>
> Can you reproduce this difference in performance?
> If so, would it be possible to reinstate the old seqselect method for
> the sake of efficiency?
>
> Thomas

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


