[Rd] multiple issues with is.unsorted()

Wed Apr 24 08:09:21 CEST 2013

Hi,

In the man page for is.unsorted():

   Value:

      A length-one logical value.  All objects of length 0 or 1 are
      sorted: the result will be ‘NA’ for objects of length 2 or more
      except for atomic vectors and objects with a class (where the ‘>=’
      or ‘>’ method is used to compare ‘x[i]’ with ‘x[i-1]’ for ‘i’ in
      ‘2:length(x)’).

This contains many incorrect statements:

      > length(NA)
      [1] 1
      > is.unsorted(NA)
      [1] NA
      > length(list(NA))
      [1] 1
      > is.unsorted(list(NA))
      [1] NA

=> Contradicts "all objects of length 0 or 1 are sorted".

      > is.unsorted(raw(2))
      Error in is.unsorted(raw(2)) : unimplemented type 'raw' in 'isUnsorted'

=> Doesn't agree with the doc (unless "except for atomic vectors"
    means "it might fail for atomic vectors").

      > setClass("A", representation(aa="integer"))
      > a <- new("A", aa=4:1)
      > length(a)
      [1] 1

      > is.unsorted(a)
      [1] FALSE
      Warning message:
      In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'

=> Ok, but it's debatable whether the warning is useful/justified from
    a user point of view. The warning *seems* to suggest that defining
    an "is.na" method for my objects is required for is.unsorted() to
    work properly, but the doc doesn't make this clear.

Anyway, let's define one, so the warning goes away:

      > setMethod("is.na", "A", function(x) is.na(x@aa))
      [1] "is.na"

Let's define a "length" method:

      > setMethod("length", "A", function(x) length(x@aa))
      [1] "length"
      > length(a)
      [1] 4

      > is.unsorted(a)
      [1] FALSE

=> Is this correct? Hard to know. The doc is not clear about what
    should happen for objects of length 2 or more and with a class
    but with no ">=" or ">" methods.

Let's define "[", ">=", and ">":

      > setMethod("[", "A", function(x, i, j, ..., drop=TRUE) new("A", aa=x@aa[i]))
      [1] "["
      > rev(a)
      An object of class "A"
      Slot "aa":
      [1] 1 2 3 4

      > setMethod(">=", c("A", "A"), function(e1, e2) {e1@aa >= e2@aa})
      [1] ">="
      > a >= a[3]
      [1]  TRUE  TRUE  TRUE FALSE

      > setMethod(">", c("A", "A"), function(e1, e2) {e1@aa > e2@aa})
      [1] ">"
      > a > a[3]
      [1]  TRUE  TRUE FALSE FALSE

      > is.unsorted(a)
      [1] FALSE

      > is.unsorted(rev(a))
      [1] FALSE

Still not working as expected. So what's required exactly for making
is.unsorted() work on an object "with a class"?
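
For comparison, here is a minimal sketch of what I would expect from the
man page wording (the ">" method used to compare x[i] with x[i-1]). The
helper name is_unsorted_by_gt() is hypothetical and is not how
is.unsorted() is implemented; it only assumes that "length", "[" and ">"
methods exist for the class:

```r
library(methods)

# Same toy class and methods as in the transcript above.
setClass("A", representation(aa = "integer"))
setMethod("length", "A", function(x) length(x@aa))
setMethod("[", "A", function(x, i, j, ..., drop = TRUE) new("A", aa = x@aa[i]))
setMethod(">", c("A", "A"), function(e1, e2) e1@aa > e2@aa)

# Hypothetical helper (not part of base R): reports TRUE if any element
# is greater than its successor, using only the class's own methods.
is_unsorted_by_gt <- function(x) {
    n <- length(x)
    if (n < 2L) return(FALSE)
    # Compare x[1..n-1] with x[2..n] through the ">" method.
    any(x[-n] > x[-1L])
}

is_unsorted_by_gt(new("A", aa = 4:1))  # decreasing values
is_unsorted_by_gt(new("A", aa = 1:4))  # increasing values
```

With this definition the first call reports the object as unsorted and
the second one as sorted, which is what I was expecting from
is.unsorted(a) and is.unsorted(rev(a)).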

BTW, is.unsorted() would be *much* faster, at least on atomic vectors,
without those calls to is.na(). The C code could check for NAs itself,
instead of requiring a first pass over the full vector as the current
implementation does. If the vector is unsorted, the C code is typically
able to bail out early, so the speed-up will typically be 10000x or more
if the vector has millions of elements.
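
To make the idea concrete, here is a rough sketch of the single-pass
logic, written in R only to show the control flow (the actual gain would
come from doing this in the C code, not from an R loop). The function
name is_unsorted_one_pass() is hypothetical. Note one assumption on my
part: if an unsorted pair is found before any NA is seen, this sketch
returns TRUE, whereas the current implementation returns NA whenever the
vector contains an NA anywhere.

```r
# Hypothetical helper (not part of base R): checks for NAs and for an
# out-of-order pair in the same pass, and returns as soon as either is found.
is_unsorted_one_pass <- function(x, strictly = FALSE) {
    n <- length(x)
    if (n == 0L) return(FALSE)
    if (is.na(x[1L])) return(NA)
    for (i in seq_len(n - 1L)) {
        cur <- x[i + 1L]
        # An NA seen before any out-of-order pair makes the answer unknown.
        if (is.na(cur)) return(NA)
        prev <- x[i]
        out_of_order <- if (strictly) prev >= cur else prev > cur
        # Bail out early: no need to look at the rest of the vector.
        if (out_of_order) return(TRUE)
    }
    FALSE
}

is_unsorted_one_pass(c(2, 1, 3))    # stops at the first pair
is_unsorted_one_pass(c(1, NA, 0))   # stops at the NA
```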

Thanks,
H.

 > sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.0

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319