[Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

Ou, Jianhong Jianhong.Ou at umassmed.edu
Tue Aug 27 22:55:35 CEST 2013


Dear Valerie,

Great improvement. Thanks a lot for your work; I greatly appreciate it.

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605




On 8/27/13 4:49 PM, "Valerie Obenchain" <vobencha at fhcrc.org> wrote:

>Thanks Jianhong for reporting this.
>
>Changes implemented in IRanges 1.19.27:
>- RleList() constructor now has default 'compress=TRUE'.
>- The lapply() loop in the seqselect,Vector-method was replaced with a
>  direct subset.
>
>New timings:
>
>library(GenomicRanges)   # also attaches IRanges
>library(microbenchmark)
>
>## generic subset function
>fun0 <- function(x) x[500:1]
>
>## GRangesList with RleList as metadata col
>grll <- GRanges(seqnames="chr1",
>                 IRanges(start=1:500, width=2),
>                 someInfo=rep(RleList("*"), 500))
>grr <- split(grll, 1:500)
> > microbenchmark(fun0(grr), times=10)
>Unit: milliseconds
>       expr      min       lq   median      uq      max neval
>  fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10
>
>Median is now about 0.031 seconds, compared with the previous elapsed
>time of 1.635 seconds.
>
>>>               > system.time(grr<- grr[500:1])
>>>                  user  system elapsed
>>>                 1.622   0.013   1.635
>
>
>
>Valerie
>
>
>On 08/23/2013 11:17 AM, Michael Lawrence wrote:
>>
>>
>>
>> On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>>
>>     Hi Michael,
>>
>>     Martin and I have been discussing this. In addition to the fix you
>>     suggest, what do you think of changing the default to
>>     compress=TRUE for the RleList constructor? Rle is the only one of
>>     the AtomicLists with default FALSE. Was there a reason for this when
>>     it was first implemented?
>>
>>
>> I'm guessing Patrick did that because we always used Rles for coverage,
>> and RleList for per-chromosome coverage. Also, there might be some
>> overhead in that Rle runs in the unlistData can cross list elements.
>>
>> About my fix, the only downside would be if the range widths were much
>> larger than the size of the vector, e.g., a highly compressed Rle,
>> selected with chromosome-size ranges. Then the as.integer(ir) is big
>> compared to the data. Otherwise, it's way faster.
>>
>>
>>     Val
>>
>>
>>
>>
>>     On 08/22/2013 07:34 PM, Maintainer wrote:
>>
>>         Hi,
>>
>>         SimpleLists are slow in this situation, basically because the
>>         underlying seqselect is slow, due to this loop:
>>
>>             x <- do.call(c, lapply(seq_len(length(ir)), function(i)
>>                 window(x, start = start(ir)[i], width = width(ir)[i])))
>>
>>         Am I missing something or could this become a simple
>>         x[as.integer(ir)]?
>>
>>         In the meantime, using CompressedLists is the way to go. So for
>>         an RleList, you need to pass compress=TRUE to the constructor.
>>
>>
>>         On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong
>>         <Jianhong.Ou at umassmed.edu> wrote:
>>
>>              Hi,
>>
>>              When I work with a large GRangesList, I find that
>>              subsetting becomes very slow when the metadata contains
>>              an AtomicList, e.g.
>>
>>               > grll <- GRanges(seqnames="chr1",
>>                     ranges=IRanges(start=1:500, width=2),
>>                     someInfo=rep(RleList("*"), 500))
>>               > grr <- split(grll, 1:500)
>>               > grl <- as.list(grr)
>>               > system.time(grl<- grl[500:1])
>>                  user  system elapsed
>>                     0       0       0
>>               > system.time(grr<- grr[500:1])
>>                  user  system elapsed
>>                 1.622   0.013   1.635
>>               > grll <- GRanges(seqnames="chr1",
>>                     ranges=IRanges(start=1:500, width=2))
>>               > grr <- split(grll, 1:500)
>>               > grl <- as.list(grr)
>>               > system.time(grl<- grl[500:1])
>>                  user  system elapsed
>>                     0       0       0
>>               > system.time(grr<- grr[500:1])
>>                  user  system elapsed
>>                 0.029   0.001   0.030
>>               > sessionInfo()
>>              R Under development (unstable) (2013-07-23 r63392)
>>              Platform: x86_64-apple-darwin12.4.0 (64-bit)
>>
>>              locale:
>>              [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>>              attached base packages:
>>              [1] parallel  stats     graphics  grDevices utils     datasets
>>              [7] methods   base
>>
>>              other attached packages:
>>              [1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24
>>              [4] BiocGenerics_0.7.3
>>
>>              loaded via a namespace (and not attached):
>>              [1] stats4_3.1.0 tools_3.1.0
>>
>>              Is there any way to improve this?
>>
>>              Yours sincerely,
>>
>>              Jianhong Ou
>>
>>              LRB 670A
>>              Program in Gene Function and Expression
>>              364 Plantation Street Worcester,
>>              MA 01605
>>
>>                       [[alternative HTML version deleted]]
>>
>>              _______________________________________________
>>              Bioc-devel at r-project.org mailing list
>>              https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
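
The quoted discussion contrasts a per-range lapply() loop with a single
direct subset. A minimal base R sketch of that idea follows, assuming plain
vectors and integer start/width pairs instead of IRanges objects. The names
select_loop and select_direct are hypothetical and are not functions from
IRanges; this is an illustration, not the code that went into IRanges 1.19.27.

```r
# Base R only; select_loop() and select_direct() are hypothetical names.
x      <- letters               # data to subset
starts <- c(5L, 1L, 20L)        # start of each range
widths <- c(3L, 2L, 4L)         # width of each range

# Loop form: extract one window per range, then concatenate the pieces.
select_loop <- function(x, starts, widths) {
  pieces <- lapply(seq_along(starts), function(i) {
    x[seq.int(starts[i], length.out = widths[i])]
  })
  do.call(c, pieces)
}

# Direct form: expand all ranges to integer positions once, subset once.
select_direct <- function(x, starts, widths) {
  idx <- sequence(widths) + rep.int(starts - 1L, widths)
  x[idx]
}

# Both forms select the same elements in the same order.
identical(select_loop(x, starts, widths),
          select_direct(x, starts, widths))
```

In the direct form the index vector has length sum(widths). That is the
caveat raised in the thread: when the ranges are much wider in total than
the stored data, as with a highly compressed Rle, the expanded index can be
large compared to the data itself.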


