[Bioc-devel] RFC: IntervalTrees for GRanges objects

Laurent Gautier lgautier at gmail.com
Wed Apr 3 21:17:45 CEST 2013


On 2013-04-03 19:28, Kasper Daniel Hansen wrote:
> Making IntervalTree chromosome-aware would also be a great addition for organisms
> with many sequences, like bee (due to an incomplete genome; 10,000s of
> sequences).  It does not matter for humans, but findOverlaps is
> excruciatingly slow for bees.  I have a couple of posts on this in the
> archive.
>
> I am predicting this will be the case in the future for most non-model
> organisms; finishing a genome is expensive and time consuming.

I can only confirm: many people do not close genomes, because for a
number of applications it is just not worth the trouble... for now.
Given the number of cases I have seen, this now applies to the
large majority of genomes.

L.
>
> Kasper
>
>
> On Wed, Apr 3, 2013 at 12:45 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> Some ideas:
>>
>> - Turn the IntervalTree into a list/array of nodes that can be
>> subset/reordered with shallow copying (just copy the pointers to the
>> nodes), and the index would be secondary. The index in the array could be
>> stored in each node, for lookup during overlap queries. Right now, as far
>> as I can tell, GIntervalTree will get confused if the user reorders e.g.
>> via [.
>>
>> - Make IntervalTree aware of the sequence/chromosome, e.g., have a hash of
>> trees, which is trivial since seqnames is already a factor.
>>
>> Michael
>>
>>
>>
>> On Wed, Apr 3, 2013 at 9:29 AM, Hector Corrada Bravo
>> <hcorrada at gmail.com> wrote:
>>> Yep, I didn't comment on that, but I agree that abstracting how
>>> GRanges stores ranges would make this more elegant. Right now
>>> ranges(GRanges) is specified to be of IRanges class instead of the
>>> abstract Ranges class.
>>>
>>> If it were the latter then GIntervalTree can be a subclass of
>>> GenomicRanges, in a similar way that IntervalTree is a subclass of
>>> Ranges.
>>>
>>> On Wed, Apr 3, 2013 at 12:23 PM, Michael Lawrence
>>> <lawrence.michael at gene.com> wrote:
>>>> Hi Hector,
>>>>
>>>> That's interesting, thanks for passing this along. I'm still wishing that
>>>> somehow GRanges itself could abstract the way it stores ranges. I know that
>>>> Herve/Patrick had some reasons for depending specifically on GRanges. One
>>>> reason was probably convenience at the C level, but it wouldn't be hard to
>>>> create a Ranges abstraction at the C level, as well.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> On Tue, Apr 2, 2013 at 5:40 PM, Hector Corrada Bravo
>>>> <hcorrada at gmail.com> wrote:
>>>>> Hello bioc-develers,
>>>>>
>>>>> I'm writing an application where lots of findOverlaps calls are made on
>>>>> static GRanges objects. For IRanges we can create persistent
>>>>> IntervalTree objects that would serve the multiple overlap query
>>>>> use-case. There is no equivalent for GenomicRanges objects, so I'm
>>>>> proposing an implementation for this.
>>>>>
>>>>> Please check
>>>>> http://github.com/hcorrada/GenomicIntervalTree
>>>>>
>>>>> There's a first cut implementation there you can test by installing
>>>>> this skeleton package. E.g.,
>>>>>
>>>>>> library(devtools)
>>>>>> install_github("GenomicIntervalTree", username="hcorrada", subdir="pkg")
>>>>>> library(GenomicIntervalTree)
>>>>> Let me know what you think.
>>>>>
>>>>> Cheers,
>>>>> Hector
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
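
The two ideas discussed in this thread (build the index once and reuse it for
many overlap queries, and keep one index per sequence keyed by seqname) can be
illustrated with a small, language-neutral sketch. It is written in Python
rather than R so that it stays self-contained, and it is not the
GenomicIntervalTree, IRanges or GenomicRanges implementation. The names
SeqIndex, GenomicIndex and find_overlaps are hypothetical. For brevity each
per-sequence index is a sorted array with running maximum ends, standing in
for a real interval tree. Intervals are closed, and hits are reported as
0-based positions in the original input order.

```python
from bisect import bisect_right
from collections import defaultdict


class SeqIndex:
    """Overlap index for the ranges on one sequence (closed intervals).

    Hypothetical stand-in for an interval tree: a start-sorted array plus
    the running maximum of the ends.
    """

    def __init__(self, ranges):
        # ranges: list of (start, end, original_position) tuples
        self.ranges = sorted(ranges)
        self.starts = [r[0] for r in self.ranges]
        # prefix_max_end[i] = largest end among self.ranges[0..i]
        self.prefix_max_end = []
        running = float("-inf")
        for _, end, _ in self.ranges:
            running = max(running, end)
            self.prefix_max_end.append(running)

    def overlaps(self, qstart, qend):
        hits = []
        # Only ranges that start at or before qend can overlap the query.
        i = bisect_right(self.starts, qend) - 1
        # Walk left; stop once nothing further left can reach qstart.
        while i >= 0 and self.prefix_max_end[i] >= qstart:
            _, end, pos = self.ranges[i]
            if end >= qstart:
                hits.append(pos)
            i -= 1
        return sorted(hits)


class GenomicIndex:
    """A dictionary of per-sequence indexes, built once and queried many times."""

    def __init__(self, seqnames, starts, ends):
        by_seq = defaultdict(list)
        for pos, (seq, s, e) in enumerate(zip(seqnames, starts, ends)):
            by_seq[seq].append((s, e, pos))
        self.indexes = {seq: SeqIndex(rs) for seq, rs in by_seq.items()}

    def find_overlaps(self, seqname, qstart, qend):
        # Only the index of the queried sequence is consulted.
        index = self.indexes.get(seqname)
        return index.overlaps(qstart, qend) if index else []


# Build once ...
subject = GenomicIndex(
    seqnames=["scaffold1", "scaffold1", "scaffold2", "scaffold1"],
    starts=[1, 50, 10, 90],
    ends=[20, 100, 30, 95],
)
# ... then query repeatedly.
hits = subject.find_overlaps("scaffold1", 15, 60)
```

Because each query touches only the index of its own sequence, the cost of a
lookup does not grow with the number of scaffolds, which is the case raised
above for genomes with 10,000s of sequences.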
