[Rd] Problem with order() and I()

Wed Sep 10 17:19:21 CEST 2014

Early on I had been wondering if deprecating I() and the AsIs class would
be a way to get the problem to go away. I imagine (based on no data at
all!) that they are rarely used. If I were writing the same code today, I
would use options(stringsAsFactors=FALSE) instead of sprinkling I() here
and there throughout my scripts.
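
A minimal sketch of the difference, for illustration only. It uses the
per-call stringsAsFactors argument of data.frame() rather than the global
option, and the names ch, df_asis and df_plain are made up for the example:

```r
# Illustrative character vector (hypothetical data).
ch <- c("x", "y", "z")

# Old style: protect the column with I(), which tags it with class "AsIs".
df_asis <- data.frame(id = 1:3, label = I(ch))

# Alternative: ask data.frame() not to convert strings to factors.
df_plain <- data.frame(id = 1:3, label = ch, stringsAsFactors = FALSE)

class(df_asis$label)   # carries the "AsIs" class
class(df_plain$label)  # a plain character vector
```

With the second form the column stays a plain character vector, so
order() on it should not go through the AsIs code path discussed in this
thread.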

But I see from the discussions that there’s something deeper going on.

Thanks for continuing to cc me; I find it interesting.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 9/9/14, 9:35 AM, "Martin Maechler" <maechler at stat.math.ethz.ch> wrote:

>>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>>>     on Tue, 9 Sep 2014 16:36:19 +0200 writes:
>
>    > It's actually a little more complicated. I wrote a note, but it
>    > seems to be stuck in the outbox on my home machine (I probably
>    > forgot to click Send...).
>    > One important aspect is that
>
>    >> "x" < "\265g"
>    > [1] NA
>
>    > which makes me wonder if the bug really is in the case that
>    > "works". It seems that it is possible to rank() character vectors
>    > that contain incomparable elements.
>
>    > -pd
>
>yes you are right that it is even more complicated.
>In both cases, our Scollate() is involved,
>(Scollate: the one where we had a discussion about making it part of the C
> level R API, which would help package authors ..)
>
>After
>
>  ch <- c('x','\265g')
>  foo <- I(ch)
>
>Of the four expressions,
>
>  order(ch)
>  order(foo)
>  ch [1] < ch [2]
>  foo[1] < foo[2]
>
>only the first one "works"; the others give NA or an error because of NA,
>and the first one is the only one of the four that does not use
>do_relop_dflt().
>
>It's not even clear what we'd want (as I think  pd also alluded to):
>Ideally all of these should work consistently, which because of
> "<(.,.)" returning NA in both cases,
>would mean that order(ch) should also give an error, as order(foo) does
>    {an error whose message we should improve in any case!}.
>Big Q:  Can we afford  order(ch)  giving an error in such cases?
>Pretty high chance that this will "break" much user (and probably
>even package) code out there.
>
>Still, the other solution, namely  order(foo) behaving as
>order(ch) now does would correspond to the ">" giving FALSE
>instead of NA, so this solution is not ok in my view.
>
>Martin
>
>
>    > On 09 Sep 2014, at 16:19 , Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
>    >>>>>>> MacQueen, Don <macqueen1 at llnl.gov>
>    >>>>>>> on Mon, 8 Sep 2014 16:06:21 +0000 writes:
>    >> 
>    >>> I have found that order() fails in a rather arcane circumstance,
>    >>> as in this example:
>    >> 
>    >>>> foo <- I( c('x','\265g') )
>    >>>> order(foo)
>    >>> Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed
>    >> 
>    >>>> foo <-c('x','\265g')
>    >>>> order(foo)
>    >>> [1] 1 2
>    >> 
>    >> yes, this is not desirable.
>    >> order() in such cases calls xtfrm()  {as documented}
>    >> and that ends up calling rank() and then the internal  .gt()
>    >> where the bug happens because
>    >> 
>    >>> I("x") > I("\xb5g")
>    >> [1] NA
>    >> 
>    >> but really I think the change should happen in xtfrm.AsIs(.)
>    >> which I think should drop the class also in this case.
>    >> 
>    >> More on this, once we have fixed it.
>    >> 
>    >> Thank you, Don, very much!
>    >> 
>    >> Martin Maechler,
>    >> ETH Zurich
>    >> 
>    >>>> sessionInfo()
>    >>> R version 3.1.1 (2014-07-10)
>    >>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>    >> 
>    >>> locale:
>    >>> [1] C
>    >> 
>    >>> attached base packages:
>    >>> [1] stats     graphics  grDevices utils     datasets  methods   base
>    >> 
>    >>> Thanks
>    >>> -Don
>    >> 
>    >>> p.s.
>    >>> Just a little background, irrelevant unless one wonders why I'm
>    >>> using I() and \265:
>    >> 
>    >>> If I were writing new code I wouldn't be using I(), since there
>    >>> are better ways now to achieve the same end (preventing the
>    >>> creation of factors in data frames), but the scripts that use it
>    >>> are quite old, originally developed in 2001.
>    >> 
>    >>> In at least some but perhaps limited contexts, '\265' produces
>    >>> the Greek letter mu, and that's why I'm using it. And if I
>    >>> remember correctly, 2001 is prior to the current R support for
>    >>> locales and extended character sets. Using \265 is what I could
>    >>> find at that time to get a mu into my output.
>    >> 
>    >>> I came across this while checking some things; it's not actually
>    >>> breaking my scripts, so I doubt it's due to any recent change.
>    >> 
>    >> 
>    >>> -- 
>    >>> Don MacQueen
>    >> 
>    >>> Lawrence Livermore National Laboratory
>    >>> 7000 East Ave., L-627
>    >>> Livermore, CA 94550
>    >>> 925-423-1062
>    >> 
>    >>> ______________________________________________
>    >>> R-devel at r-project.org mailing list
>    >>> https://stat.ethz.ch/mailman/listinfo/r-devel
>    >> 
>
>    > -- 
>    > Peter Dalgaard, Professor,
>    > Center for Statistics, Copenhagen Business School
>    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>    > Phone: (+45)38153501
>    > Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>