[Bioc-devel] GRanges Unique [actually -- `order`] Method

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jun 15 07:17:45 CEST 2011


I took another crack at my original attempt and reduced the runtime of my
GenomicRanges::order implementation from ~22 seconds to ~5.5 seconds on 1
million randomly placed ranges across hsapiens.

Still not super fast, but not as abysmal as before.

I'll put it here for review before checking in (or not):
https://gist.github.com/1026520

Thanks,
-steve

On Tue, Jun 14, 2011 at 8:06 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> (Digging up an old [related] thread, since I'm not sure what the status
> of the code that Michael referred to in this context is ...)
>
> I have a suboptimal-but-working implementation of `order` (and by
> extension, `sort`) for GenomicRanges objects; e.g., it calculates the
> `order`ing of a GRanges object of length 1 million (randomly spread
> across all Hsapiens chromosomes and strands) in ~22 seconds[*].
>
> The resulting ranges are sorted/grouped by seqnames, then strand,
> then ranges (the caller can specify the ordering of the seqnames;
> otherwise the ordering defined by seqlevels(your.granges.object) is
> used).
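The ordering described above (seqnames first, following the caller's level order or seqlevels(), then strand, then the ranges) can be expressed as a single multi-key call to base R's order(). This is a minimal sketch, not the implementation in the gist: it uses a plain data.frame as a hypothetical stand-in for a GRanges object and a hypothetical helper order_ranges(), breaks range ties by start and then end, and assumes strands sort in the "+", "-", "*" order of the strand factor levels. It also writes sort() as x[order(x)], matching the thread's remark that sort just calls order.

```r
# Hypothetical stand-in for a GRanges object: a plain data.frame with
# seqnames, strand, start and end columns (no GenomicRanges needed).
ranges_df <- data.frame(
  seqnames = c("chr2", "chr1", "chr1", "chr2", "chr1"),
  strand   = c("+", "-", "+", "+", "+"),
  start    = c(5L, 30L, 10L, 1L, 10L),
  end      = c(20L, 40L, 50L, 8L, 15L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order by seqnames (in a caller-supplied level order,
# as seqlevels() would provide), then strand, then start, then end.
order_ranges <- function(x, seqlevels = unique(x$seqnames)) {
  seq_key    <- match(x$seqnames, seqlevels)       # position in the level order
  strand_key <- match(x$strand, c("+", "-", "*"))  # assumed strand level order
  order(seq_key, strand_key, x$start, x$end)
}

o <- order_ranges(ranges_df, seqlevels = c("chr1", "chr2"))
sorted_df <- ranges_df[o, ]   # sort() expressed as x[order(x)]
```

Here `o` is `c(5, 3, 2, 4, 1)`: the two chr1/+ ranges starting at 10 are tied on start and resolved by end, the chr1/- range follows, and the chr2 ranges come last.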
>
> Also, it is only defined for a single GRanges object (I'm not sure what
> the appropriate result would be if multiple GRanges objects are passed in).
>
> I can check it into SVN, if that sounds good, so it can serve as a
> stop-gap until one of the *Ranges gurus can whip up a superior one.
>
> [*] By the by, the runtime is dominated by iterating over the seqnames
> and subselecting the appropriate ranges to work on, one seqname at a
> time. Maybe the speed can be increased by using `split` a few times, but
> then you have several copies of your GRanges object in memory, so I'm
> not sure what's best at the moment, or how useful it is to talk about
> code in the "abstract." We can continue the discussion if you reckon
> it's worthy to be checked in for now ...
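To make the loop-versus-vectorized trade-off concrete, here is a minimal base R sketch on hypothetical random data (seqnames and start positions only, not a real GRanges object). The illustrative order_by_loop() mimics the per-seqname subselection described above, making one full pass over the data per seqname. The illustrative order_by_keys() instead makes a single order() call on integer-coded keys. Because order() leaves unresolved ties in their original order, both return the same permutation.

```r
# Hypothetical data: random chromosome names and start positions only,
# standing in for the seqnames() and start() of a large GRanges object.
set.seed(1)
n <- 1e5
chroms <- paste0("chr", 1:5)
seqnames <- sample(chroms, n, replace = TRUE)
starts <- sample.int(1e6, n, replace = TRUE)

# Loop approach (illustrative): for each seqname, scan the whole vector to
# find its ranges, order them locally, and keep their global indices.
order_by_loop <- function(seqnames, starts, levels) {
  per_level <- lapply(levels, function(lv) {
    idx <- which(seqnames == lv)   # one full pass over the data per level
    idx[order(starts[idx])]        # local order mapped back to global indices
  })
  unlist(per_level, use.names = FALSE)
}

# Vectorized approach (illustrative): one order() call on integer-coded keys.
order_by_keys <- function(seqnames, starts, levels) {
  order(match(seqnames, levels), starts)
}

a <- order_by_loop(seqnames, starts, chroms)
b <- order_by_keys(seqnames, starts, chroms)
```

Wrapping each call in system.time() is one way to compare the two on your own data. If grouping by seqname is still needed, splitting only the integer indices, e.g. `split(seq_along(seqnames), factor(seqnames, levels = chroms))`, avoids copying the ranges themselves.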
>
> -steve
>
> On Wed, May 25, 2011 at 9:02 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> Someone has to write the methods...
>>
>> On Tue, May 24, 2011 at 11:00 PM, Dario Strbenac
>> <D.Strbenac at garvan.org.au> wrote:
>>
>>> >   Yes, the sort method just calls order.
>>>
>>> Something isn't quite working out for me.
>>>
>>> library(GenomicRanges) # 1.4.5
>>> gr <- GRanges("chr1", IRanges(c(1, 10), c(50, 60)), '+')
>>> sort(gr)
>>>
>>> --------------------------------------
>>> Dario Strbenac
>>> Research Assistant
>>> Cancer Epigenetics
>>> Garvan Institute of Medical Research
>>> Darlinghurst NSW 2010
>>> Australia
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>
>
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>





