[Bioc-devel] is.unsorted method for GRanges objects

Hervé Pagès hpages at fredhutch.org
Tue Nov 3 03:39:09 CET 2015



2a- I would just compare pairs of adjacent elements, taking
advantage of the fact that <= is vectorized and cheap. So something

   setMethod("is.unsorted", "Vector",
     function(x, na.rm=FALSE, strictly=FALSE)
         if (length(x) <= 1L)
         x1 <- head(x, n=-1)
         x2 <- tail(x, n=-1)
         if (strictly)
             return(any(x1 >= x2))
         any(x1 > x2)

Since this will work on any Vector derivative for which <= and
subsetting are defined, it's a good candidate for being the default
"is.unsorted" method for Vector objects. I'll add it to S4Vectors.

2b- The semantic of is.unsorted() on a GRangesList object or any
List object in general should be sapply(x, is.unsorted), for
consistency with order(), sort(), etc:

   > sort(IntegerList(4:3, 1:-2))
   IntegerList of length 2
   [[1]] 3 4
   [[2]] -2 -1 0 1

I'll add this too.

2c - That won't be needed. The default method for Vector objects will
work on RangedSummarizedExperiment objects (<= and 1D subsetting are
defined and along the same dimension).


See ?`GenomicRanges-comparison` for how the order of genomic ranges
is defined.


Calling base::.gt() in a loop sounds indeed very inefficient. What
about having base::is.unsorted() do the above on "objects" instead?
base::.gt() seems to also require that the object is subsettable so
the requirements are the same.

Then we wouldn't need the default "is.unsorted" method for Vector
objects, only a default "anyNA" method for Vector objects that always
returns FALSE (plus some specific ones for Rle and other Vector
derivatives that support NAs).


On 11/02/2015 05:35 PM, Michael Lawrence wrote:
> The notion of sortedness is already formally defined, which is why we have
> an order method, etc.
> The base is.unsorted implementation for "objects" ends up calling
> base::.gt() for each adjacent pair of elements, which is likely too slow to
> be practical, so we probably should add a custom method.
> This does bring up the tangential question of whether GenomicRanges should
> have an anyNA method that returns FALSE (and similarly an is.na() method),
> although we have never defined the concept of a "missing range".
> Michael
> On Mon, Nov 2, 2015 at 4:55 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>> Pete,
>> What does sorted mean for granges? If the starts  are sorted but the ends
>> aren't does that count? What if only the ends are but the ranges are on the
>> negative strand?
>> Do we consider seqlevels to be ordinal in the order the levels are returned
>> from seqlevels ()? That usually makes sense, but does it always?
>> In essence I'm asking if sortedness is a well enough defined term for an
>> is.sorted method to make sense.
>> Best,
>> ~G
>> On Nov 2, 2015 4:27 PM, "Peter Hickey" <peter.hickey at gmail.com> wrote:
>>> Hi all,
>>> I sometimes want to test whether a GRanges object (or some object with
>>> a GRanges slot, e.g., a SummarizedExperiment object) is (un)sorted.
>>> There is no is.unsorted,GRanges-method or, rather, it defers to
>>> is.unsorted,ANY-method. I'm unsure that deferring to the
>>> is.unsorted,ANY-method is what is really desired when a user calls
>>> is.unsorted on a GRanges object, and it will certainly return a
>>> (possibly unrelated) warning - "In is.na(x) : is.na() applied to
>>> non-(list or vector) of type 'S4'".
>>> For this reason, I tend to use is.unsorted(order(x)) when x is a
>>> GRanges object. This workaround is also used, for example, by minfi
>>> (https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R#L121
>> ).
>>> However, this is slow because it essentially sorts the object to test
>>> whether it is already sorted.
>>> So, to my questions:
>>> 1. Have I overlooked a fast way to test whether a GRanges object is
>> sorted?
>>> 2a. Could a is.unsorted,GenomicRanges-method be added to the
>>> GenomicRanges package? Side note, I'm unsure at which level to define
>>> this method, e.g., GRanges vs. GenomicRanges.
>>> 2b. Is it possible to have a sensible definition and implementation
>>> for is.unsorted,GRangesList-method?
>>> 2c. Could a is.unsorted,RangedSummarizedExperiment-method be added to
>>> the SummarizedExperiment package?
>>> I started working on a patch for 2a/2c, but wanted to ensure I hadn't
>>> overlooked something obvious. Also, I'm sure 2a/2b/2c could be written
>>> much more efficiently at the C-level but I'm afraid this might be a
>>> bit beyond my abilities to integrate nicely with the existing code.
>>> Thanks,
>>> Pete
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>          [[alternative HTML version deleted]]
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 	[[alternative HTML version deleted]]
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

More information about the Bioc-devel mailing list