[Bioc-devel] Fast check of GenomicRanges equality to speed up cbind, SummarizedExperiment

Peter Hickey peter.hickey at gmail.com
Wed Aug 31 03:15:59 CEST 2016


Wonderful. Thanks, Hervé!

On 30 August 2016 at 20:45, Hervé Pagès <hpages at fredhutch.org> wrote:
> Hi Pete,
>
> Thanks for suggesting this fast method. I've formalized this a little
> bit by using a generic (identicalVals) + methods. I also tweaked it
> in order to avoid false negatives that can occur when 'x' and 'y' have
> different names or different seqlevels. So no more fallback to
> 'all(x == y)'.
>
> Committed in SummarizedExperiment 1.3.82.
>
> BTW please note that 'x == y' and 'identicalVals(x, y)' both ignore
> circularity of the underlying sequences. For example, ranges [1, 10]
> and [101, 110] represent the same position on a circular sequence of
> length 100, so they should be considered equal. However, neither
> 'x == y' nor 'identicalVals(x, y)' treats them as equal. Something we
> should address at some point...
>
> Cheers,
> H.
>
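The circularity point above can be checked with plain arithmetic. This is only a minimal sketch in base R, not anything from GenomicRanges or SummarizedExperiment: the helper name `wrap_start` and the start/width representation are assumptions made for illustration.

```r
# Hypothetical helper: map a 1-based start position onto a circular
# sequence of length 'seqlength' (positions wrap back to 1 after the end).
wrap_start <- function(start, seqlength) {
  ((start - 1L) %% seqlength) + 1L
}

seqlength <- 100L
a <- c(start = 1L,   width = 10L)  # range [1, 10]
b <- c(start = 101L, width = 10L)  # range [101, 110]

# Two ranges coincide on the circle when their wrapped starts and
# their widths agree.
same_on_circle <-
  wrap_start(a[["start"]], seqlength) == wrap_start(b[["start"]], seqlength) &&
  a[["width"]] == b[["width"]]
same_on_circle
```

A circularity-aware comparison would presumably normalize starts in this way before comparing, using the sequence lengths and circularity flags recorded for each sequence.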
>
> On 08/30/2016 05:57 AM, Peter Hickey wrote:
>>
>> The cbind,SummarizedExperiment-method checks that the rowRanges slots
>> are equal by calling `all(x == x1)`, where x and x1 are GenomicRanges
>> objects. This can be slow and creates a large temporary logical
>> vector when length(x) is large.
>>
>> I wrote a fast method to check equality of two GenomicRanges objects,
>> see https://gist.github.com/PeteHaitch/13787125a165928e652dcfea2a8d166a.
>> It cuts the check from 13.7 seconds to 0.004 seconds for a GenomicRanges
>> object with 100M elements on my machine. It uses identical() on key
>> slots of the GenomicRanges objects. I'm not sure whether this could
>> return false negatives, so I fall back to all(x == x1) if the fast
>> method returns FALSE.
>>
>> Could cbind,SummarizedExperiment-method be updated to use something like
>> this?
>>
>> Cheers,
>> Pete
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
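For readers without the gist to hand, the idea discussed in this thread can be sketched roughly as follows. This is an illustrative assumption, not the code from the gist or the identicalVals methods committed to SummarizedExperiment: the helper name `fast_ranges_equal` is hypothetical, and the sketch compares values rather than whole slots so that differing names or seqlevels do not produce a false negative.

```r
library(GenomicRanges)

# Hypothetical helper (illustrative only). Returns TRUE when x and y hold
# the same ranges element by element, ignoring names, metadata columns
# and unused seqlevels.
fast_ranges_equal <- function(x, y) {
  same_rle <- function(a, b) {
    # Compare run lengths and run values without expanding the Rle
    # to full length.
    identical(runLength(a), runLength(b)) &&
      identical(as.character(runValue(a)), as.character(runValue(b)))
  }
  length(x) == length(y) &&
    same_rle(seqnames(x), seqnames(y)) &&
    same_rle(strand(x), strand(y)) &&
    identical(start(x), start(y)) &&
    identical(width(x), width(y))
}

x <- GRanges("chr1", IRanges(start = c(1, 20), width = 10), strand = "+")
y <- x
names(y) <- c("a", "b")            # same ranges, different names
seqlevels(y) <- c("chr1", "chr2")  # same ranges, extra seqlevel
z <- shift(x, 1)                   # genuinely different ranges

fast_ranges_equal(x, y)
fast_ranges_equal(x, z)
```

Because identical() stops at the first difference and returns a single logical value, a check along these lines avoids building the length(x) logical vector that `all(x == x1)` needs.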
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319


