[Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

Valerie Obenchain vobencha at fhcrc.org
Fri Aug 23 17:41:30 CEST 2013


Hi Michael,

Martin and I have been discussing this. In addition to the fix you 
suggest, what do you think of changing the default to compress=TRUE 
for the RleList constructor? RleList is the only one of the AtomicLists 
whose constructor defaults to compress=FALSE. Was there a reason for 
this when it was first implemented?
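
For reference, a rough sketch of the workaround Michael describes below, 
applied to Jianhong's example. It assumes GenomicRanges and IRanges are 
installed; the variable names (info, gr, grl) are only illustrative, and 
timings will vary by version and machine, so none are quoted here.

```r
library(GenomicRanges)  # also attaches IRanges

# Same metadata column as in the original report, but built as a
# compressed list by passing compress = TRUE to the constructor.
info <- rep(RleList("*", compress = TRUE), 500)
class(info)  # should report a CompressedRleList rather than a SimpleRleList

gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = 1:500, width = 2),
              someInfo = info)
grl <- split(gr, 1:500)

# Time the reordering that was slow with the uncompressed column.
system.time(grl <- grl[500:1])
```

If the default were changed, the explicit compress = TRUE above would no 
longer be needed.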

Val



On 08/22/2013 07:34 PM, Maintainer wrote:
> Hi,
>
> SimpleLists are slow in this situation, basically because the underlying
> seqselect is slow, due to this loop:
>
>     x <- do.call(c, lapply(seq_len(length(ir)), function(i)
>         window(x, start = start(ir)[i], width = width(ir)[i])))
>
> Am I missing something or could this become a simple x[as.integer(ir)]?
>
> In the meantime, using CompressedLists is the way to go. So for an
> RleList, you need to pass compress=TRUE to the constructor.
>
>
> On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong
> <Jianhong.Ou at umassmed.edu> wrote:
>
>     Hi,
>
>     When I use a big GRangesList, I find that it becomes very slow
>     when the metadata contain an AtomicList, e.g.
>
>      > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500,
>     width=2), someInfo=rep(RleList("*"), 500))
>      > grr <- split(grll, 1:500)
>      > grl <- as.list(grr)
>      > system.time(grl<- grl[500:1])
>         user  system elapsed
>            0       0       0
>      > system.time(grr<- grr[500:1])
>         user  system elapsed
>        1.622   0.013   1.635
>      > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500,
>     width=2))
>      > grr <- split(grll, 1:500)
>      > grl <- as.list(grr)
>      > system.time(grl<- grl[500:1])
>         user  system elapsed
>            0       0       0
>      > system.time(grr<- grr[500:1])
>         user  system elapsed
>        0.029   0.001   0.030
>      > sessionInfo()
>     R Under development (unstable) (2013-07-23 r63392)
>     Platform: x86_64-apple-darwin12.4.0 (64-bit)
>
>     locale:
>     [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>     attached base packages:
>     [1] parallel  stats     graphics  grDevices utils     datasets
>       methods   base
>
>     other attached packages:
>     [1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24
>        BiocGenerics_0.7.3
>
>     loaded via a namespace (and not attached):
>     [1] stats4_3.1.0 tools_3.1.0
>
>     Is there any method to improve this?
>
>     Yours sincerely,
>
>     Jianhong Ou
>
>     LRB 670A
>     Program in Gene Function and Expression
>     364 Plantation Street Worcester,
>     MA 01605