[Bioc-devel] is.unsorted method for GRanges objects

Michael Lawrence lawrence.michael at gene.com
Tue Nov 3 04:06:16 CET 2015


On Mon, Nov 2, 2015 at 6:39 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

> Hi,
>
> @Pete:
>
> 2a- I would just compare pairs of adjacent elements, taking
> advantage of the fact that <= is vectorized and cheap. So something
> like:
>
>   setMethod("is.unsorted", "Vector",
>     function(x, na.rm=FALSE, strictly=FALSE)
>     {
>         if (length(x) <= 1L)
>             return(FALSE)
>         x1 <- head(x, n=-1)
>         x2 <- tail(x, n=-1)
>         if (strictly)
>             return(any(x1 >= x2))
>         any(x1 > x2)
>     }
>   )
>
> Since this will work on any Vector derivative for which <= and
> subsetting are defined, it's a good candidate for being the default
> "is.unsorted" method for Vector objects. I'll add it to S4Vectors.
>
> 2b- The semantic of is.unsorted() on a GRangesList object or any
> List object in general should be sapply(x, is.unsorted), for
> consistency with order(), sort(), etc:
>
>   > sort(IntegerList(4:3, 1:-2))
>   IntegerList of length 2
>   [[1]] 3 4
>   [[2]] -2 -1 0 1
>
> I'll add this too.
>
> 2c - That won't be needed. The default method for Vector objects will
> work on RangedSummarizedExperiment objects (<= and 1D subsetting are
> defined and along the same dimension).
>
> @Gabe:
>
> See ?`GenomicRanges-comparison` for how the order of genomic ranges
> is defined.
>
> @Michael:
>
> Calling base::.gt() in a loop sounds indeed very inefficient. What
> about having base::is.unsorted() do the above on "objects" instead?
> base::.gt() seems to also require that the object is subsettable so
> the requirements are the same.
>
> Then we wouldn't need the default "is.unsorted" method for Vector
> objects, only a default "anyNA" method for Vector objects that always
> returns FALSE (plus some specific ones for Rle and other Vector
> derivatives that support NAs).
>
>
Yea, I assumed it did what you suggested before looking at it. It would be
the right place to fix this.


> Thanks,
> H.
>
>
> On 11/02/2015 05:35 PM, Michael Lawrence wrote:
>
>> The notion of sortedness is already formally defined, which is why we have
>> an order method, etc.
>>
>> The base is.unsorted implementation for "objects" ends up calling
>> base::.gt() for each adjacent pair of elements, which is likely too slow
>> to
>> be practical, so we probably should add a custom method.
>>
>> This does bring up the tangential question of whether GenomicRanges should
>> have an anyNA method that returns FALSE (and similarly an is.na()
>> method),
>> although we have never defined the concept of a "missing range".
>>
>> Michael
>>
>> On Mon, Nov 2, 2015 at 4:55 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>>
>> Pete,
>>>
>>> What does sorted mean for granges? If the starts  are sorted but the ends
>>> aren't does that count? What if only the ends are but the ranges are on
>>> the
>>> negative strand?
>>>
>>> Do we consider seqlevels to be ordinal in the order the levels are
>>> returned
>>> from seqlevels ()? That usually makes sense, but does it always?
>>>
>>> In essence I'm asking if sortedness is a well enough defined term for an
>>> is.sorted method to make sense.
>>>
>>> Best,
>>> ~G
>>> On Nov 2, 2015 4:27 PM, "Peter Hickey" <peter.hickey at gmail.com> wrote:
>>>
>>> Hi all,
>>>>
>>>> I sometimes want to test whether a GRanges object (or some object with
>>>> a GRanges slot, e.g., a SummarizedExperiment object) is (un)sorted.
>>>> There is no is.unsorted,GRanges-method or, rather, it defers to
>>>> is.unsorted,ANY-method. I'm unsure that deferring to the
>>>> is.unsorted,ANY-method is what is really desired when a user calls
>>>> is.unsorted on a GRanges object, and it will certainly return a
>>>> (possibly unrelated) warning - "In is.na(x) : is.na() applied to
>>>> non-(list or vector) of type 'S4'".
>>>>
>>>>
>>>> For this reason, I tend to use is.unsorted(order(x)) when x is a
>>>> GRanges object. This workaround is also used, for example, by minfi
>>>> (
>>>> https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R#L121
>>>>
>>> ).
>>>
>>>> However, this is slow because it essentially sorts the object to test
>>>> whether it is already sorted.
>>>>
>>>>
>>>> So, to my questions:
>>>>
>>>> 1. Have I overlooked a fast way to test whether a GRanges object is
>>>>
>>> sorted?
>>>
>>>> 2a. Could a is.unsorted,GenomicRanges-method be added to the
>>>> GenomicRanges package? Side note, I'm unsure at which level to define
>>>> this method, e.g., GRanges vs. GenomicRanges.
>>>> 2b. Is it possible to have a sensible definition and implementation
>>>> for is.unsorted,GRangesList-method?
>>>> 2c. Could a is.unsorted,RangedSummarizedExperiment-method be added to
>>>> the SummarizedExperiment package?
>>>>
>>>> I started working on a patch for 2a/2c, but wanted to ensure I hadn't
>>>> overlooked something obvious. Also, I'm sure 2a/2b/2c could be written
>>>> much more efficiently at the C-level but I'm afraid this might be a
>>>> bit beyond my abilities to integrate nicely with the existing code.
>>>>
>>>> Thanks,
>>>> Pete
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list