[Rd] Problem with order() and I()

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 9 18:35:38 CEST 2014


>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>>     on Tue, 9 Sep 2014 16:36:19 +0200 writes:

    > It's actually a little more complicated. I wrote a note, but it seems to be stuck in the outbox on my home machine (I probably forgot to click Send...). 
    > One important aspect is that

    >> "x" < "\265g"
    > [1] NA

    > which makes me wonder if the bug really is in the case that "works". It seems that it is possible to rank() character vectors that contain incomparable elements.

    > -pd

Yes, you are right that it is even more complicated.
In both cases, our Scollate() is involved
(Scollate() is the function we discussed making part of the C-level
R API, which would help package authors.)

After

  ch <- c('x','\265g')
  foo <- I(ch)

Of the four expressions,

  order(ch)
  order(foo)
  ch [1] < ch [2]
  foo[1] < foo[2]

only the first one "works"; the others give NA, or an error caused
by NA. The first one is also the only one of the four that does not
use do_relop_dflt().
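Until xtfrm.AsIs() is changed, a user-level workaround that follows from this analysis is to drop the "AsIs" class before ordering, so that order() sees exactly the same input as order(ch). This is a minimal sketch. The ASCII values are placeholders (not Don's data), chosen so the result does not depend on the locale. With Don's c('x','\265g'), the same call should give whatever order(ch) gives in that session, because the two calls then receive identical input.

```r
ch  <- c("b", "a", "c")   # placeholder data: plain ASCII, so collation is locale-independent
foo <- I(ch)              # same values, with class "AsIs"

## unclass() removes the "AsIs" class, so order() is called on a plain
## character vector, exactly as in order(ch)
o <- order(unclass(foo))
o
```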

It's not even clear what we'd want (as I think pd also alluded to).
Ideally all of these should behave consistently. Since "<"(., .)
returns NA in both cases, that would mean order(ch) should also give
an error, just as order(foo) does
    {{ an error whose message we should improve in any case!! }}.
Big Q: can we afford order(ch) giving an error in such cases?
There is a pretty high chance that this would "break" a lot of user
(and probably even package) code out there.

Still, the other solution, namely order(foo) behaving as
order(ch) does now, would correspond to ">" giving FALSE
instead of NA, so this solution is not OK in my view.
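The "consistent" option, where order() signals an error whenever some pair of elements compares as NA, can be sketched with a hypothetical helper strict_order(), which is not part of base R. It checks every pairwise "<" comparison of the non-missing elements. That check is O(n^2), so the sketch only makes the proposed semantics concrete and is not meant as an implementation.

```r
## Hypothetical helper (not in base R): refuse to order when some pair of
## non-missing elements cannot be compared, i.e. "<" returns NA for it.
strict_order <- function(x, ...) {
  y  <- x[!is.na(x)]           # genuine NAs are handled by order() itself
  lt <- outer(y, y, "<")       # all pairwise "<" comparisons, O(n^2)
  if (anyNA(lt))
    stop("some elements cannot be compared (\"<\" returned NA)")
  order(x, ...)
}

strict_order(c("b", NA, "a"))
```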

Martin


    > On 09 Sep 2014, at 16:19 , Martin Maechler <maechler at stat.math.ethz.ch> wrote:

    >>>>>>> MacQueen, Don <macqueen1 at llnl.gov>
    >>>>>>> on Mon, 8 Sep 2014 16:06:21 +0000 writes:
    >> 
    >>> I have found that order() fails in a rather arcane circumstance, as in
    >>> this example:
    >> 
    >>>> foo <- I( c('x','\265g') )
    >>>> order(foo)
    >>> Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed
    >> 
    >>>> foo <-c('x','\265g')
    >>>> order(foo)
    >>> [1] 1 2
    >> 
    >> yes, this is not desirable.
    >> order() in such cases calls xtfrm()  {as documented}
    >> and that ends up calling rank() and then the internal  .gt()
    >> where the bug happens because
    >> 
    >>> I("x") > I("\xb5g")
    >> [1] NA
    >> 
    >> but really I think the change should happen in xtfrm.AsIs(.)
    >> which I think should drop the class also in this case.
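The change proposed here for xtfrm.AsIs() can be sketched as a standalone function. The name xtfrm_AsIs_sketch() is hypothetical, and this is not the actual base R code. The point is only that the "AsIs" class is removed even when it is the only class, so that xtfrm() then dispatches to the default method on a plain character vector.

```r
## Hypothetical sketch (not the base R source): an xtfrm() variant for
## I() objects that drops "AsIs" even when it is the only class.
xtfrm_AsIs_sketch <- function(x) {
  cl <- oldClass(x)
  cl <- cl[cl != "AsIs"]                       # keep any other classes
  oldClass(x) <- if (length(cl)) cl else NULL  # no class left: plain vector
  xtfrm(x)                                     # dispatch on what remains
}

foo <- I(c("b", "a", "c"))                     # placeholder ASCII data
order(xtfrm_AsIs_sketch(foo))
```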
    >> 
    >> More on this, once we have fixed it.
    >> 
    >> Thank you, Don, very much!
    >> 
    >> Martin Maechler,
    >> ETH Zurich
    >> 
    >>>> sessionInfo()
    >>> R version 3.1.1 (2014-07-10)
    >>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
    >> 
    >>> locale:
    >>> [1] C
    >> 
    >>> attached base packages:
    >>> [1] stats     graphics  grDevices utils     datasets  methods   base
    >> 
    >>> Thanks
    >>> -Don
    >> 
    >>> p.s.
    >>> Just a little background, irrelevant unless one wonders why I'm using I()
    >>> and \265:
    >> 
    >>> If I were writing new code I wouldn't be using I(), since there are better
    >>> ways now to achieve the same end (preventing the creation of factors in
    >>> data frames), but the scripts that use it are quite old,  originally
    >>> developed in 2001.
    >> 
    >>> In at least some but perhaps limited contexts, '\265' produces the Greek
    >>> letter mu, and that's why I'm using it. And if I remember correctly, 2001
    >>> is prior to the current R support for locales and extended character sets.
    >>> Using \265 is what I could find at that time to get a mu into my output.
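For readers wondering about the escape: in an R string literal, "\265" is an octal escape for the single byte 0xB5 (decimal 181). This is the same byte that is written "\xb5" elsewhere in this thread. In Latin-1 that byte is the micro sign (U+00B5), which renders like the Greek letter mu, but on its own it is not valid UTF-8. As a minimal sketch, assuming a UTF-8 capable session, a Unicode escape is the modern way to get a properly encoded micro sign.

```r
## "\265" (octal) and "\xb5" (hex) denote the same single byte, 0xB5
identical(charToRaw("\265"), as.raw(0xb5))
strtoi("265", base = 8L)   # 181

## A Unicode escape yields a properly encoded micro sign (U+00B5)
mu <- "\u00b5"
utf8ToInt(mu)              # 181
paste0(mu, "g")
```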
    >> 
    >>> I came across this while checking some things; it's not actually breaking
    >>> my scripts, so I doubt it's due to any recent change.
    >> 
    >> 
    >>> -- 
    >>> Don MacQueen
    >> 
    >>> Lawrence Livermore National Laboratory
    >>> 7000 East Ave., L-627
    >>> Livermore, CA 94550
    >>> 925-423-1062
    >> 
    >>> ______________________________________________
    >>> R-devel at r-project.org mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel

    > -- 
    > Peter Dalgaard, Professor,
    > Center for Statistics, Copenhagen Business School
    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501
    > Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


