[Bioc-devel] GRanges Unique [actually -- `order`] Method

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 15 14:33:27 CEST 2011


On 06/15/2011 04:26 AM, Michael Lawrence wrote:
> Thanks for looking into this Steve. Maybe I am missing something here, but
> why not just do something like:
>
> order(as.factor(seqnames(gr)), as.factor(strand(gr)), start(gr))
>
> I think we'd want an option for including strand or not.
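
A base-R sketch of the multi-key ordering suggested above, with the strand key made optional. It assumes plain vectors standing in for seqnames(gr), strand(gr) and start(gr); the helper name order_ranges is hypothetical, and the strand levels c("+", "-", "*") are an assumption meant to mirror GRanges strand.

```r
# Hypothetical helper: orders ranges described by plain vectors that stand
# in for seqnames(gr), strand(gr) and start(gr).
order_ranges <- function(seqnames, strand, start, ignore.strand = FALSE) {
  # as.factor() leaves an existing factor untouched, so the chromosome order
  # comes from its levels; a character vector would be sorted alphabetically
  seqnames <- as.factor(seqnames)
  if (ignore.strand) {
    order(seqnames, start)                    # keys: seqname, then start
  } else {
    order(seqnames, as.factor(strand), start) # keys: seqname, strand, start
  }
}

seqs    <- factor(c("chr2", "chr1", "chr1", "chr2", "chr1"),
                  levels = c("chr1", "chr2"))
strands <- factor(c("+", "-", "+", "+", "+"), levels = c("+", "-", "*"))
starts  <- c(100L, 5L, 50L, 10L, 20L)

order_ranges(seqs, strands, starts)                        # 5 3 2 4 1
order_ranges(seqs, strands, starts, ignore.strand = TRUE)  # 2 5 3 4 1
```

Because order() compares factors by their integer codes, the level order of each factor decides how seqnames and strands are grouped.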

Like nearest,GenomicRanges,GenomicRanges, which has ignore.strand=FALSE.
For a GRangesList, an easy approach may be to unlist it, order with the
element index as the first key, order(rep(seq_along(grl),
elementLengths(grl)), ...), and then re-list.
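
The unlist / order / re-list idea for a GRangesList can be sketched in base R under a simplifying assumption: a plain named list of start positions stands in for the GRangesList, and base R's lengths() plays the role of elementLengths(grl). This is an illustration of the technique, not GenomicRanges code.

```r
# A named list of integer starts stands in for a GRangesList (assumption)
grl_starts <- list(a = c(30L, 10L, 20L), b = c(5L, 1L))

# Element index of every flattened range, used as the first ordering key
group <- rep(seq_along(grl_starts), lengths(grl_starts))
flat  <- unlist(grl_starts, use.names = FALSE)

# Sorting by group first means no range ever moves to another element
o <- order(group, flat)

# Re-list: split the reordered values back into their original elements
sorted <- split(flat[o], factor(group[o], levels = seq_along(grl_starts)))
names(sorted) <- names(grl_starts)
sorted  # $a 10 20 30, $b 1 5
```

Fixing the levels to seq_along(grl_starts) keeps empty elements as empty vectors instead of dropping them.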

I'm also not a fan of allowing the user to specify seqnames.order; you
can't do this for factors, and it really sounds like the user wants
seqlevels(gr) <- ...
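
For a plain factor, the equivalent of choosing a seqname order is choosing the level order. The base-R sketch below uses made-up chromosome names to show that order() follows the levels, and that the reordering is done with factor(x, levels = ...); assigning `levels(x) <- ...` on a factor would relabel the values rather than reorder them.

```r
seqs <- c("chr2", "chr10", "chr1")

# as.factor() on a character vector uses the sorted unique values as
# levels, so "chr10" lands between "chr1" and "chr2"
f_default <- as.factor(seqs)
levels(f_default)   # "chr1" "chr10" "chr2"
order(f_default)    # 3 2 1

# Reordering the levels changes the sort order without touching the values
f_custom <- factor(seqs, levels = c("chr1", "chr2", "chr10"))
order(f_custom)     # 3 1 2
```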

Martin
>
> Thanks again,
> Michael
>
> On Tue, Jun 14, 2011 at 10:17 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>
>> I took another crack at my original attempt and reduced a call to my
>> GenomicRanges::order from ~22 seconds to ~5.5 seconds on 1 million
>> ranges picked at random across hsapiens.
>>
>> Still not super fast, but not as abysmal as before.
>>
>> I'll put it here for review before checking in (or not):
>> https://gist.github.com/1026520
>>
>> Thanks,
>> -steve
>>
>> On Tue, Jun 14, 2011 at 8:06 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>> Hi,
>>>
>>> (Digging up an old [related] thread since I'm not sure what the status
>>> of the code that Michael referred to in this context is ...)
>>>
>>> I have a suboptimal-but-working implementation of `order` (and by
>>> extension, `sort`) for GenomicRanges objects, e.g., it computes the
>>> `order`ing of a GRanges object of length 1 million (randomly spread
>>> across all Hsapiens chromosomes and strands) in ~22 seconds[*].
>>>
>>> The resulting (ordered) ranges are sorted/grouped by seqnames, strand,
>>> and ranges. The caller can specify the ordering of the seqnames;
>>> otherwise the ordering defined by seqlevels(your.granges.object) is
>>> used.
>>>
>>> Also, it is only defined for a single GRanges object (I'm not sure what
>>> the appropriate result would be if multiple GRanges objects were passed in).
>>>
>>> I can check it into SVN if that sounds good, so it can work as a
>>> stop-gap until one of the *Ranges gurus can whip up a superior one.
>>>
>>> [*] By the by, the runtime is dominated by iterating over the seqnames
>>> and subselecting the appropriate ranges to work on, one at a time ...
>>> maybe the speed can be increased by using `split` a few times, but
>>> then you have several copies of your GRanges object in memory, so ...
>>> I'm not sure what's best atm or how useful it is to talk about code in
>>> the "abstract," but we can continue the discussion if you reckon it's
>>> worth checking in for now ...
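
The footnote above describes iterating over the seqnames and ordering each subset separately. The base-R sketch below is a hypothetical reconstruction of that pattern on simulated data (it is not the code in the gist, which is not reproduced here) and compares it with a single order() call on the same keys. Both yield the same permutation; the single call simply avoids one subset per seqname. No timings are claimed.

```r
set.seed(1)
n      <- 1000L
lv     <- c("chr1", "chr2", "chr3")
seqs   <- factor(sample(lv, n, replace = TRUE), levels = lv)
starts <- sample.int(1e6L, n)   # simulated start positions

# Loop pattern: one subset and one order() per seqname, then concatenate
loop_order <- unlist(lapply(levels(seqs), function(l) {
  idx <- which(seqs == l)
  idx[order(starts[idx])]
}))

# Vectorized alternative: a single order() over (seqname, start)
vec_order <- order(seqs, starts)

identical(loop_order, vec_order)   # TRUE
```

Since order() leaves ties in their original order, the two approaches agree even when some starts are equal.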
>>>
>>> -steve
>>>
>>> On Wed, May 25, 2011 at 9:02 AM, Michael Lawrence
>>> <lawrence.michael at gene.com> wrote:
>>>> Someone has to write the methods...
>>>>
>>>> On Tue, May 24, 2011 at 11:00 PM, Dario Strbenac
>>>> <D.Strbenac at garvan.org.au> wrote:
>>>>
>>>>>>    Yes, the sort method just calls order.
>>>>>
>>>>> Something isn't quite working out for me.
>>>>>
>>>>> library(GenomicRanges) # 1.4.5
>>>>> gr <- GRanges("chr1", IRanges(c(1, 10), c(50, 60)), '+')
>>>>> sort(gr)
>>>>>
>>>>> --------------------------------------
>>>>> Dario Strbenac
>>>>> Research Assistant
>>>>> Cancer Epigenetics
>>>>> Garvan Institute of Medical Research
>>>>> Darlinghurst NSW 2010
>>>>> Australia
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>   | Memorial Sloan-Kettering Cancer Center
>>>   | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>
>>
>>
>>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


