[Bioc-devel] [devteam-bioc] Very slow when operate GRangesList
Ou, Jianhong
Jianhong.Ou at umassmed.edu
Tue Aug 27 22:55:35 CEST 2013
Dear Valerie,
Great improvement. Thanks a lot for your work; I really appreciate it.
Yours sincerely,
Jianhong Ou
LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605
On 8/27/13 4:49 PM, "Valerie Obenchain" <vobencha at fhcrc.org> wrote:
>Thanks Jianhong for reporting this.
>
>Changes implemented in IRanges 1.19.27:
>- RleList() constructor now has default 'compress=TRUE'.
>- seqselect,Vector-method: the lapply() loop was replaced with a direct subset.
>
>New timings:
>
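>library(GenomicRanges)  # also attaches IRanges, which provides RleList
>library(microbenchmark)
>
>## generic subset function: reverse the order of the elements
>fun0 <- function(x) x[500:1]
>
>## GRangesList with an RleList as a metadata column
>grll <- GRanges(seqnames="chr1",
>                IRanges(start=1:500, width=2),
>                someInfo=rep(RleList("*"), 500))
>grr <- split(grll, 1:500)
> > microbenchmark(fun0(grr), times=10)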
>Unit: milliseconds
>      expr      min       lq   median      uq      max neval
> fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10
>
>The median is now 0.031 seconds, compared with the previous elapsed time of 1.635 seconds:
>
>>> > system.time(grr<- grr[500:1])
>>> user system elapsed
>>> 1.622 0.013 1.635
>
>
>
>Valerie
>
>
>On 08/23/2013 11:17 AM, Michael Lawrence wrote:
>>
>>
>>
>> On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>> Martin and I have been discussing this. In addition to the fix you
>> suggest, what do you think of changing the default to
>> compress=TRUE for the RleList constructor? RleList is the only
>> AtomicList constructor whose default is FALSE. Was there a reason for
>> this when it was first implemented?
>>
>>
>> My guess is that Patrick chose that because we have always used Rle for
>> coverage, and RleList for per-chromosome coverage. There might also be
>> some overhead in the compressed form, because Rle runs in the unlistData
>> can cross list-element boundaries.
>>
>> As for my fix, the only downside would be when the range widths are much
>> larger than the data actually stored, e.g., a highly compressed Rle
>> selected with chromosome-sized ranges. In that case as.integer(ir) is
>> large compared with the data. Otherwise, it is much faster.
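The caveat above about as.integer(ir) can be sketched in base R, using the `rle` class as a stand-in for an Rle. This is an assumption for illustration, not how IRanges implements window() or seqselect. The hypothetical helper `rle_window()` clips run boundaries, so its work scales with the number of runs, whereas an expanded index like as.integer(ir) needs one entry per selected position.

```r
# Hypothetical helper (not an IRanges function): return positions from..to
# of a base-R "rle" object by clipping runs, without expanding the vector.
rle_window <- function(r, from, to) {
  run_ends   <- cumsum(r$lengths)
  run_starts <- run_ends - r$lengths + 1
  keep <- run_ends >= from & run_starts <= to   # runs overlapping the range
  clipped <- pmin(run_ends[keep], to) - pmax(run_starts[keep], from) + 1
  structure(list(lengths = as.integer(clipped), values = r$values[keep]),
            class = "rle")
}

# A highly compressed vector: 3 runs covering 30 million positions, built
# directly so that the expanded vector is never allocated.
r <- structure(list(lengths = c(1e7L, 1e7L, 1e7L), values = c(0L, 5L, 0L)),
               class = "rle")

# A "chromosome-sized" selection
from <- 5e6
to   <- 2.5e7

w <- rle_window(r, from, to)
length(w$lengths)  # runs stored in the result
to - from + 1      # entries an expanded integer index would need
```

Here the result keeps 3 runs, while an expanded index would hold about 20 million integers. That is the situation in which the direct-subset fix loses its advantage.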
>>
>>
>> Val
>>
>>
>>
>>
>> On 08/22/2013 07:34 PM, Maintainer wrote:
>>
>> Hi,
>>
>> SimpleLists are slow in this situation, basically because the
>> underlying seqselect is slow, due to this loop:
>>
>>     x <- do.call(c, lapply(seq_len(length(ir)),
>>                            function(i)
>>                              window(x, start = start(ir)[i],
>>                                     width = width(ir)[i])))
>>
>> Am I missing something, or could this simply become
>> x[as.integer(ir)]?
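The two strategies in the question can be compared with a base-R sketch. It rests on several assumptions. A set of ranges is modelled as two hypothetical integer vectors, `starts` and `widths`, standing in for start(ir) and width(ir). A plain vector stands in for the list data. The per-range window() call is approximated by an ordinary subset, because base R's window() is meant for time series. The sketch only shows that expanding every range into one index and subsetting once gives the same result as the per-range loop. It is not the IRanges code.

```r
# Loop strategy: one subset per range, then concatenate
# (mirrors the lapply()/window() loop quoted above).
subset_by_loop <- function(x, starts, widths) {
  pieces <- lapply(seq_along(starts), function(i)
    x[starts[i] + seq_len(widths[i]) - 1L])
  do.call(c, pieces)
}

# Expand all ranges into a single integer index
# (a base-R analogue of as.integer(ir)).
ranges_to_index <- function(starts, widths) {
  unlist(Map(function(s, w) s + seq_len(w) - 1L, starts, widths),
         use.names = FALSE)
}

# Direct strategy: a single vectorised subset.
subset_direct <- function(x, starts, widths) {
  x[ranges_to_index(starts, widths)]
}

x <- letters
starts <- c(3L, 10L, 20L)
widths <- c(2L, 3L, 1L)
subset_by_loop(x, starts, widths)  # "c" "d" "j" "k" "l" "t"
subset_direct(x, starts, widths)   # same result
```

The direct version replaces one R-level function call per range with a single subset. That is presumably where the speedup reported earlier in this thread comes from.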
>>
>> In the meantime, using CompressedLists is the way to go, so for an
>> RleList you need to pass compress=TRUE to the constructor.
>>
>>
>> On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong <Jianhong.Ou at umassmed.edu> wrote:
>>
>> Hi,
>>
>> When I use a large GRangesList, I find that it becomes very slow when
>> the metadata columns contain an AtomicList, e.g.:
>>
>> > library(GenomicRanges)
>> > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2),
>> +                 someInfo=rep(RleList("*"), 500))
>> > grr <- split(grll, 1:500)
>> > grl <- as.list(grr)
>> > system.time(grl <- grl[500:1])
>>    user  system elapsed
>>       0       0       0
>> > system.time(grr <- grr[500:1])
>>    user  system elapsed
>>   1.622   0.013   1.635
>> > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2))
>> > grr <- split(grll, 1:500)
>> > grl <- as.list(grr)
>> > system.time(grl <- grl[500:1])
>>    user  system elapsed
>>       0       0       0
>> > system.time(grr <- grr[500:1])
>>    user  system elapsed
>>   0.029   0.001   0.030
>> > sessionInfo()
>> R Under development (unstable) (2013-07-23 r63392)
>> Platform: x86_64-apple-darwin12.4.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24
>> [4] BiocGenerics_0.7.3
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_3.1.0 tools_3.1.0
>>
>> Is there any way to improve this?
>>
>> Yours sincerely,
>>
>> Jianhong Ou
>>
>> LRB 670A
>> Program in Gene Function and Expression
>> 364 Plantation Street Worcester,
>> MA 01605
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>>
>>
>> ____________________________________________________________________________
>> devteam-bioc mailing list
>> To unsubscribe from this mailing list, send a blank email to
>> devteam-bioc-leave at lists.fhcrc.org
>> You can also unsubscribe or change your personal options at
>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>>
>>
>>
>