[Bioc-devel] [devteam-bioc] Very slow when operate GRangesList
Valerie Obenchain
vobencha at fhcrc.org
Tue Aug 27 22:49:29 CEST 2013
Thanks Jianhong for reporting this.
Changes implemented in IRanges 1.19.27:
- RleList() constructor now has default 'compress=TRUE'.
- seqselect,Vector-method lapply() loop was replaced with direct subset.
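As a rough illustration of the second change, here is a base R sketch. It is not the IRanges source: a plain vector stands in for the Vector object, the ranges are given as start/width pairs, and the names `select_loop` and `select_direct` are made up for this example. It contrasts the per-range loop with a single subset by expanded integer positions:

```r
# Hypothetical stand-ins: a plain vector and ranges given as start/width.
x <- letters
starts <- c(3L, 10L, 20L)
widths <- c(2L, 4L, 1L)

# Loop approach: extract one window per range, then concatenate the pieces.
select_loop <- function(x, starts, widths) {
  pieces <- lapply(seq_along(starts), function(i) {
    x[seq.int(starts[i], length.out = widths[i])]
  })
  do.call(c, pieces)
}

# Direct approach: expand the ranges to integer positions and subset once.
select_direct <- function(x, starts, widths) {
  idx <- sequence(widths) + rep.int(starts - 1L, widths)
  x[idx]
}

# Both approaches select the same elements.
stopifnot(identical(select_loop(x, starts, widths),
                    select_direct(x, starts, widths)))
```

The direct version builds an index vector whose length is the sum of the range widths, so it can cost more when the ranges are very wide relative to the data.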
New timings:
library(GenomicRanges)
library(microbenchmark)
## generic subset function
fun0 <- function(x) x[500:1]
## GRangesList with RleList as metadata col
grll <- GRanges(seqnames="chr1",
                IRanges(start=1:500, width=2),
                someInfo=rep(RleList("*"), 500))
grr <- split(grll, 1:500)
> microbenchmark(fun0(grr), times=10)
Unit: milliseconds
      expr      min       lq   median      uq      max neval
 fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10
Median is now about 0.031 seconds, compared with the previous elapsed time of 1.635 seconds (roughly 50 times faster).
>> > system.time(grr<- grr[500:1])
>> user system elapsed
>> 1.622 0.013 1.635
Valerie
On 08/23/2013 11:17 AM, Michael Lawrence wrote:
>
>
>
> On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
> Hi Michael,
>
> Martin and I have been discussing this. In addition to the fix you
> suggest, what do you think of changing the default to
> compress=TRUE for the RleList constructor? RleList is the only one of
> the AtomicLists with default FALSE. Was there a reason for this when
> it was first implemented?
>
>
> I'm guessing Patrick did that because we always used Rles for coverage,
> and RleList for per-chromosome coverage. Also, there might be some
> overhead in that Rle runs in the unlistData can cross list elements.
>
> About my fix, the only downside would be if the range widths were much
> larger than the size of the vector, e.g., a highly compressed Rle,
> selected with chromosome-size ranges. Then the as.integer(ir) is big
> compared to the data. Otherwise, it's way faster.
>
>
> Val
>
>
>
>
> On 08/22/2013 07:34 PM, Maintainer wrote:
>
> Hi,
>
> SimpleLists are slow in this situation, basically because the underlying
> seqselect is slow, due to this loop:
>
> x <- do.call(c, lapply(seq_len(length(ir)),
>                        function(i)
>                          window(x,
>                                 start = start(ir)[i], width = width(ir)[i])))
>
> Am I missing something or could this become a simple
> x[as.integer(ir)]?
>
> In the meantime, using CompressedLists is the way to go. So for an
> RleList, you need to pass compress=TRUE to the constructor.
>
>
> On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong <Jianhong.Ou at umassmed.edu> wrote:
>
> Hi,
>
> When I work with a large GRangesList, I find that it becomes very
> slow when the metadata contains an AtomicList, e.g.
>
> > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500,
> width=2), someInfo=rep(RleList("*"), 500))
> > grr <- split(grll, 1:500)
> > grl <- as.list(grr)
> > system.time(grl<- grl[500:1])
> user system elapsed
> 0 0 0
> > system.time(grr<- grr[500:1])
> user system elapsed
> 1.622 0.013 1.635
> > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500,
> width=2))
> > grr <- split(grll, 1:500)
> > grl <- as.list(grr)
> > system.time(grl<- grl[500:1])
> user system elapsed
> 0 0 0
> > system.time(grr<- grr[500:1])
> user system elapsed
> 0.029 0.001 0.030
> > sessionInfo()
> R Under development (unstable) (2013-07-23 r63392)
> Platform: x86_64-apple-darwin12.4.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] GenomicRanges_1.13.36 XVector_0.1.0 IRanges_1.19.24
> BiocGenerics_0.7.3
>
> loaded via a namespace (and not attached):
> [1] stats4_3.1.0 tools_3.1.0
>
> Is there any method to improve this?
>
> Yours sincerely,
>
> Jianhong Ou
>
> LRB 670A
> Program in Gene Function and Expression
> 364 Plantation Street Worcester,
> MA 01605
>
>
> _________________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel