[Rd] Couldn't (and shouldn't) is.unsorted() be faster?
Herve Pages
hpages at fhcrc.org
Fri Apr 18 01:35:17 CEST 2008
Hi,
Thanks for your answers!
No need to change anything. In my case, 'x' is guaranteed to be an integer
vector with no NAs so I can call .Internal(is.unsorted(x)) directly.
BTW, why not make is.unsorted() a little bit more prepared to silly user
input:
> is.unsorted(c(2:5, NA), na.rm=NA)
Error in if (!is.atomic(x) || (!na.rm && any(is.na(x)))) return(NA) :
missing value where TRUE/FALSE needed
(or at least silently coerce those silly values to TRUE or FALSE like
max()/min() do, following some obscure logic though).
Also it's arguable that a length-1 vector cannot be considered sorted:
> is.unsorted(NA)
[1] NA
Cheers,
H.
Prof Brian Ripley wrote:
> I wouldn't say 'easy', and so I think we need a business case for this
> change. (One of the issues is that the internals are used elsewhere and
> optimized for inputs without NAs. So we would need to write separate
> code if we pass NAs down to C level. As I recall, is.unsorted was a
> cheap R interface to existing C code.)
>
> What real-world problems are being affected by this, and what would be
> proportional speedup of the whole analysis be from this change?
>
> On Thu, 17 Apr 2008, Bill Dunlap wrote:
>
>> On Thu, 17 Apr 2008, Herve Pages wrote:
>>
>>> Couldn't is.unsorted() bail out immediately here (after comparing
>>> the first 2 elements):
>>>
>>> > x <- 20000000:1
>>> > system.time(is.unsorted(x), gcFirst=TRUE)
>>> user system elapsed
>>> 0.084 0.040 0.124
>>>
>>> > x <- 200000000:1
>>> > system.time(is.unsorted(x), gcFirst=TRUE)
>>> user system elapsed
>>> 0.772 0.440 1.214
>>
>> The C code does bail out upon seeing the first out- of-order pair, but
>> before calling the C code, the S code does any(is.na(x)), forcing a
>> scan of the entire data. If you remove the is.na calls from
>> is.unsorted's S code you will see the timings improve in your example.
>> (It looks easy to do the NA checks in the C code.)
>>
>> is.unsorted.no.nacheck <- function (x, na.rm = FALSE) {
>> if (is.null(x))
>> return(FALSE)
>> if (!is.atomic(x))
>> return(NA)
>> .Internal(is.unsorted(x))
>> }
>> > x <- 20000000:1
>> > system.time(is.unsorted(x), gcFirst=TRUE)
>> user system elapsed
>> 0.356 0.157 0.514
>> > system.time(is.unsorted.no.nacheck(x), gcFirst=TRUE)
>> user system elapsed
>> 0 0 0
>> > revx <- rev(x)
>> > system.time(is.unsorted(revx), gcFirst=TRUE)
>> user system elapsed
>> 0.500 0.170 0.672
>> > system.time(is.unsorted.no.nacheck(revx),gcFirst=TRUE)
>> user system elapsed
>> 0.131 0.000 0.132
>>
>> ----------------------------------------------------------------------------
>>
>> Bill Dunlap
>> Insightful Corporation
>> bill at insightful dot com
>> 360-428-8146
>>
>> "All statements in this message represent the opinions of the author
>> and do
>> not necessarily reflect Insightful Corporation policy or position."
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
More information about the R-devel
mailing list