[Rd] multiple issues with is.unsorted()
Martin Maechler
maechler at stat.math.ethz.ch
Wed Apr 24 11:29:39 CEST 2013
Dear Herve,
>>>>> Hervé Pagès <hpages at fhcrc.org>
>>>>> on Tue, 23 Apr 2013 23:09:21 -0700 writes:
> Hi,
> In the man page for is.unsorted():
> Value:
> A length-one logical value. All objects of length 0
> or 1 are sorted: the result will be ‘NA’ for objects of
> length 2 or more except for atomic vectors and objects
> with a class (where the ‘>=’ or ‘>’ method is used to
> compare ‘x[i]’ with ‘x[i-1]’ for ‘i’ in ‘2:length(x)’).
> This contains many incorrect statements:
>> length(NA)
> [1] 1
>> is.unsorted(NA)
> [1] NA
>> length(list(NA))
> [1] 1
>> is.unsorted(list(NA))
> [1] NA
> => Contradicts "all objects of length 0 or 1 are sorted".
>> is.unsorted(raw(2))
> Error in is.unsorted(raw(2)) : unimplemented type
> 'raw' in 'isUnsorted'
> => Doesn't agree with the doc (unless "except for atomic
> vectors" means "it might fail for atomic vectors").
>> setClass("A", representation(aa="integer"))
>> a <- new("A", aa=4:1)
>> length(a)
> [1] 1
>> is.unsorted(a)
> [1] FALSE
> Warning message:
> In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
> => Ok, but it's arguable whether the warning is useful/justified
> from a user point of view. The warning *seems* to suggest
> that defining an "is.na" method for my objects is required
> for is.unsorted() to work properly but the doc doesn't
> make this clear.
> Anyway, let's define one, so the warning goes away:
>> setMethod("is.na", "A", function(x) is.na(x@aa))
> [1] "is.na"
> Let's define a "length" method:
>> setMethod("length", "A", function(x) length(x@aa))
> [1] "length"
>> length(a)
> [1] 4
>> is.unsorted(a)
> [1] FALSE
> => Is this correct? Hard to know. The doc is not clear
> about what should happen for objects of length 2 or more
> and with a class but with no ">=" or ">" methods.
> Let's define "[", ">=", and ">":
>> setMethod("[", "A", function(x, i, j, ..., drop=TRUE)
>>     new("A", aa=x@aa[i]))
> [1] "["
>> rev(a)
> An object of class "A"
> Slot "aa":
> [1] 1 2 3 4
>> setMethod(">=", c("A", "A"), function(e1, e2) {e1@aa >= e2@aa})
> [1] ">="
>> a >= a[3]
> [1] TRUE TRUE TRUE FALSE
>> setMethod(">", c("A", "A"), function(e1, e2) {e1@aa > e2@aa})
> [1] ">"
>> a > a[3]
> [1] TRUE TRUE FALSE FALSE
>> is.unsorted(a)
> [1] FALSE
>> is.unsorted(rev(a))
> [1] FALSE
> Still not working as expected. So what's required exactly
> for making is.unsorted() work on an object "with a class"?
well, read the source code. :-) ;-)
More seriously: On another hidden help page, you find
\code{.gt} and \code{.gtn} are callbacks from \code{\link{rank}} and
\code{\link{is.unsorted}} used for classed objects.
In other words, you'd need to define a method for
.gtn for S4 objects in this case.
.... yes, indeed I don't know why this is not at all documented.
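To illustrate the kind of comparison such a callback has to perform, here is
a minimal sketch. The name gtn_sketch is made up for this example, and this
is not claimed to be the definition used in base; it only compares each
element with its predecessor through the class's own length(), "[", ">" and
">=" methods, using the class "A" from the session quoted above:

```r
library(methods)

# Class and methods as defined in the quoted session.
setClass("A", representation(aa = "integer"))
setMethod("length", "A", function(x) length(x@aa))
setMethod("[", "A", function(x, i, j, ..., drop = TRUE) new("A", aa = x@aa[i]))
setMethod(">=", c("A", "A"), function(e1, e2) e1@aa >= e2@aa)
setMethod(">",  c("A", "A"), function(e1, e2) e1@aa >  e2@aa)

# Hypothetical helper (illustration only, not the base R source):
# TRUE if some element is smaller than (or, with strictly = TRUE,
# not larger than) its predecessor.
gtn_sketch <- function(x, strictly = FALSE) {
    n <- length(x)
    if (n < 2L) return(FALSE)
    if (strictly) !all(x[-1L] > x[-n]) else !all(x[-1L] >= x[-n])
}

gtn_sketch(new("A", aa = 4:1))   # TRUE: decreasing, hence unsorted
gtn_sketch(new("A", aa = 1:4))   # FALSE: increasing, hence sorted
```

The sketch relies only on methods the class author already controls, which
is presumably why a callback of this shape suits classed objects.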
> BTW, is.unsorted() would be *much* faster, at least on
> atomic vectors, without those calls to is.na().
Well, in all R versions, apart from R-devel as of yesterday,
the source of is.unsorted() has been
    is.unsorted <- function(x, na.rm = FALSE, strictly = FALSE)
    {
        if(is.null(x)) return(FALSE)
        if(!na.rm && any(is.na(x))) ## "FIXME" is.na(<large>) is "too slow"
            return(NA)
        ## else
        if(na.rm && any(ii <- is.na(x)))
            x <- x[!ii]
        .Internal(is.unsorted(x, strictly))
    }
so you see the "FIXME".
In R-devel (and probably R-patched in the near future),
that line is
    if(!na.rm && anyMissing(x))
so there's no slow code anymore, at least not for the default
case of na.rm = FALSE.
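For illustration, this is how the two NA branches of the wrapper show up for
plain numeric vectors. It is a small usage sketch, not new behaviour:

```r
# na.rm = FALSE (the default): any NA makes the answer NA.
is.unsorted(c(1, NA, 3))                    # NA

# na.rm = TRUE: NAs are dropped first, then the rest is checked.
is.unsorted(c(1, NA, 3), na.rm = TRUE)      # FALSE
is.unsorted(c(3, NA, 1), na.rm = TRUE)      # TRUE

# strictly = TRUE also treats ties as "unsorted".
is.unsorted(c(1, 2, 2))                     # FALSE
is.unsorted(c(1, 2, 2), strictly = TRUE)    # TRUE
```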
> The C code
> could check for NAs, without having to do this as a first
> pass on the full vector like it is the case with the
> current implementation. If the vector is unsorted, the C
> code is typically able to bail out early so the speed-up
> will typically be 10000x or more if the vector has millions
> of elements.
You are right (but again: the most important case, na.rm=FALSE,
has been "solved" already I'd say),
but you know well that we do gratefully accept good patches to
the R sources.
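As a rough idea of what such a single-pass check could look like, here is a
sketch written in R for readability; a real patch would do this in C. The
function name is_unsorted_onepass is hypothetical. One assumption matters:
this version returns as soon as it meets either an NA or an out-of-order
pair, so it answers TRUE where the current is.unsorted() answers NA when an
out-of-order pair comes before the first NA. Whether that change in
semantics is acceptable would have to be decided separately.

```r
# Hypothetical one-pass check for atomic vectors (illustration only).
# Stops at the first NA (result NA) or the first out-of-order pair
# (result TRUE), whichever comes first.
is_unsorted_onepass <- function(x, strictly = FALSE) {
    n <- length(x)
    if (n == 0L) return(FALSE)
    if (is.na(x[1L])) return(NA)
    if (n < 2L) return(FALSE)
    for (i in 2:n) {
        if (is.na(x[i])) return(NA)      # NA met before any out-of-order pair
        bad <- if (strictly) x[i] <= x[i - 1L] else x[i] < x[i - 1L]
        if (bad) return(TRUE)            # bail out early
    }
    FALSE
}

is_unsorted_onepass(c(1, 2, 3))    # FALSE
is_unsorted_onepass(c(1, NA, 0))   # NA
is_unsorted_onepass(c(3, 1, NA))   # TRUE (current is.unsorted() gives NA here)
```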
> Thanks, H.
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] tools_3.0.0
> --
> Hervé Pagès
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319