[Bioc-devel] is.unsorted method for GRanges objects

Hervé Pagès hpages at fredhutch.org
Tue Nov 3 03:39:09 CET 2015


Hi,

@Pete:

2a- I would just compare pairs of adjacent elements, taking
advantage of the fact that <= is vectorized and cheap. So something
like:

   setMethod("is.unsorted", "Vector",
     function(x, na.rm=FALSE, strictly=FALSE)
     {
         if (length(x) <= 1L)
             return(FALSE)
         x1 <- head(x, n=-1)
         x2 <- tail(x, n=-1)
         if (strictly)
             return(any(x1 >= x2))
         any(x1 > x2)
     }
   )

Since this will work on any Vector derivative for which <= and
subsetting are defined, it's a good candidate for being the default
"is.unsorted" method for Vector objects. I'll add it to S4Vectors.

2b- The semantic of is.unsorted() on a GRangesList object or any
List object in general should be sapply(x, is.unsorted), for
consistency with order(), sort(), etc:

   > sort(IntegerList(4:3, 1:-2))
   IntegerList of length 2
   [[1]] 3 4
   [[2]] -2 -1 0 1

I'll add this too.

2c - That won't be needed. The default method for Vector objects will
work on RangedSummarizedExperiment objects (<= and 1D subsetting are
defined and along the same dimension).

@Gabe:

See ?`GenomicRanges-comparison` for how the order of genomic ranges
is defined.

@Michael:

Calling base::.gt() in a loop sounds indeed very inefficient. What
about having base::is.unsorted() do the above on "objects" instead?
base::.gt() seems to also require that the object is subsettable so
the requirements are the same.

Then we wouldn't need the default "is.unsorted" method for Vector
objects, only a default "anyNA" method for Vector objects that always
returns FALSE (plus some specific ones for Rle and other Vector
derivatives that support NAs).

Thanks,
H.

On 11/02/2015 05:35 PM, Michael Lawrence wrote:
> The notion of sortedness is already formally defined, which is why we have
> an order method, etc.
>
> The base is.unsorted implementation for "objects" ends up calling
> base::.gt() for each adjacent pair of elements, which is likely too slow to
> be practical, so we probably should add a custom method.
>
> This does bring up the tangential question of whether GenomicRanges should
> have an anyNA method that returns FALSE (and similarly an is.na() method),
> although we have never defined the concept of a "missing range".
>
> Michael
>
> On Mon, Nov 2, 2015 at 4:55 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>
>> Pete,
>>
>> What does sorted mean for granges? If the starts  are sorted but the ends
>> aren't does that count? What if only the ends are but the ranges are on the
>> negative strand?
>>
>> Do we consider seqlevels to be ordinal in the order the levels are returned
>> from seqlevels ()? That usually makes sense, but does it always?
>>
>> In essence I'm asking if sortedness is a well enough defined term for an
>> is.sorted method to make sense.
>>
>> Best,
>> ~G
>> On Nov 2, 2015 4:27 PM, "Peter Hickey" <peter.hickey at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I sometimes want to test whether a GRanges object (or some object with
>>> a GRanges slot, e.g., a SummarizedExperiment object) is (un)sorted.
>>> There is no is.unsorted,GRanges-method or, rather, it defers to
>>> is.unsorted,ANY-method. I'm unsure that deferring to the
>>> is.unsorted,ANY-method is what is really desired when a user calls
>>> is.unsorted on a GRanges object, and it will certainly return a
>>> (possibly unrelated) warning - "In is.na(x) : is.na() applied to
>>> non-(list or vector) of type 'S4'".
>>>
>>>
>>> For this reason, I tend to use is.unsorted(order(x)) when x is a
>>> GRanges object. This workaround is also used, for example, by minfi
>>> (https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R#L121
>> ).
>>> However, this is slow because it essentially sorts the object to test
>>> whether it is already sorted.
>>>
>>>
>>> So, to my questions:
>>>
>>> 1. Have I overlooked a fast way to test whether a GRanges object is
>> sorted?
>>> 2a. Could a is.unsorted,GenomicRanges-method be added to the
>>> GenomicRanges package? Side note, I'm unsure at which level to define
>>> this method, e.g., GRanges vs. GenomicRanges.
>>> 2b. Is it possible to have a sensible definition and implementation
>>> for is.unsorted,GRangesList-method?
>>> 2c. Could a is.unsorted,RangedSummarizedExperiment-method be added to
>>> the SummarizedExperiment package?
>>>
>>> I started working on a patch for 2a/2c, but wanted to ensure I hadn't
>>> overlooked something obvious. Also, I'm sure 2a/2b/2c could be written
>>> much more efficiently at the C-level but I'm afraid this might be a
>>> bit beyond my abilities to integrate nicely with the existing code.
>>>
>>> Thanks,
>>> Pete
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>          [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list