[Bioc-devel] Fast check of GenomicRanges equality to speed up cbind, SummarizedExperiment

Hervé Pagès hpages at fredhutch.org
Wed Aug 31 02:45:24 CEST 2016


Hi Pete,

Thanks for suggesting this fast method. I've formalized it a bit by
introducing a generic (identicalVals) with methods. I also tweaked it
to avoid the false negatives that can occur when 'x' and 'y' have
different names or different seqlevels, so the fallback to
'all(x == y)' is no longer needed.
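
To illustrate the kind of false negative meant here, below is a minimal
base-R sketch. It is not the identicalVals code committed to
SummarizedExperiment: the ranges are modeled as a plain list rather than a
real GenomicRanges object, and make_ranges() and same_values() are
hypothetical names made up for this example.

```r
# Toy stand-in for a GenomicRanges object: a plain list, NOT the real S4 class.
make_ranges <- function(seqnames, start, end, strand, names = NULL) {
  list(seqnames = seqnames, start = start, end = end,
       strand = strand, names = names)
}

# Hypothetical value-only comparison: ignores element names and the
# order/extent of factor levels, and compares only what each element
# represents (seqname, start, end, strand).
same_values <- function(x, y) {
  length(x$start) == length(y$start) &&
    identical(as.character(x$seqnames), as.character(y$seqnames)) &&
    identical(x$start, y$start) &&
    identical(x$end, y$end) &&
    identical(as.character(x$strand), as.character(y$strand))
}

x <- make_ranges(factor(c("chr1", "chr2"), levels = c("chr1", "chr2")),
                 c(1L, 5L), c(10L, 20L), factor(c("+", "-")),
                 names = c("a", "b"))
y <- make_ranges(factor(c("chr1", "chr2"), levels = c("chr2", "chr1", "chrM")),
                 c(1L, 5L), c(10L, 20L), factor(c("+", "-")))

identical(x, y)    # FALSE: names and factor levels differ
same_values(x, y)  # TRUE: same seqname, start, end and strand per element
```

Converting factors with as.character() keeps the sketch short but is
probably too slow for very long objects; a real implementation would more
likely remap the integer codes onto a common set of levels.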

Committed in SummarizedExperiment 1.3.82.

BTW please note that 'x == y' and 'identicalVals(x, y)' both ignore
circularity of the underlying sequences. For example, ranges [1, 10] and
[101, 110] represent the same position on a circular sequence of
length 100, so they should be considered equal. However, neither
'x == y' nor 'identicalVals(x, y)' treats them as equal. Something we
should address at some point...
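
One possible way to account for circularity, sketched in base R under the
assumption that ranges on a circular sequence can be compared by their
start position modulo the sequence length plus their width. The helper
normalize_start() is hypothetical and does not exist in GenomicRanges or
SummarizedExperiment.

```r
# Hypothetical helper: map start positions onto [1, seqlength] for a
# circular sequence; positions on linear sequences are returned unchanged.
normalize_start <- function(start, seqlength, circular = TRUE) {
  if (!circular) return(start)
  (start - 1L) %% seqlength + 1L
}

# Circular sequence of length 100: ranges [1, 10] and [101, 110].
start <- c(1L, 101L)
width <- c(10L, 10L)

norm <- normalize_start(start, seqlength = 100L)

# Both ranges normalize to the same start and have the same width,
# so under this scheme they would compare as equal.
norm[1] == norm[2] && width[1] == width[2]
```

Comparing normalized start and width, rather than start and end, sidesteps
the case where a range wraps around the origin of the circular sequence.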

Cheers,
H.

On 08/30/2016 05:57 AM, Peter Hickey wrote:
> The cbind,SummarizedExperiment-method checks that the rowRanges slots
> are equal by calling `all(x == x1)`, where x and x1 are GenomicRanges
> objects. This can be kind of slow and makes a large, temporary vector
> when length(x) is large.
>
> I wrote a fast method to check equality of two GenomicRanges objects,
> see https://gist.github.com/PeteHaitch/13787125a165928e652dcfea2a8d166a.
> It cuts the check from 13.7 seconds to 0.004 seconds for a GenomicRanges
> object with 100M elements on my machine. It uses identical() on key
> slots of the GenomicRanges objects, and I'm not sure if this could
> return false negatives, so I fall back to all(x == x1) if the fast
> method returns FALSE.
>
> Could cbind,SummarizedExperiment-method be updated to use something like this?
>
> Cheers,
> Pete
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


