[Rd] Problem with order() and I()
MacQueen, Don
macqueen1 at llnl.gov
Wed Sep 10 17:19:21 CEST 2014
Early on I had been wondering if deprecating I() and the AsIs class would
be a way to get the problem to go away. I imagine (based on no data at
all!) that they are rarely used. If I were writing the same code today, I
would use options(stringsAsFactors=FALSE) instead of sprinkling I() here
and there throughout my scripts.
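Don contrasts two idioms for keeping strings out of factors. The minimal sketch below shows how they differ; the column name `v` and the sample values are illustrative only. Wrapping a vector in I() keeps it character but gives it class "AsIs". That extra class is what sends order() through xtfrm() and its AsIs method. Passing stringsAsFactors = FALSE instead leaves a plain, unclassed character column. This has been the data.frame() default since R 4.0.0.

```r
ch <- c("x", "y")   # illustrative values only

# Idiom 1: protect the column with I(); it stays character but gains class "AsIs".
df1 <- data.frame(v = I(ch))
class(df1$v)          # "AsIs"
is.character(df1$v)   # TRUE

# Idiom 2: ask data.frame() not to convert strings (the default since R 4.0.0).
df2 <- data.frame(v = ch, stringsAsFactors = FALSE)
class(df2$v)          # "character"

# Only the AsIs column is an "object", so only it makes order() go through xtfrm().
is.object(df1$v)      # TRUE
is.object(df2$v)      # FALSE
```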
But I see from the discussions that there’s something deeper going on.
Thanks for continuing to cc me; I find it interesting.
-Don
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 9/9/14, 9:35 AM, "Martin Maechler" <maechler at stat.math.ethz.ch> wrote:
>>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>>> on Tue, 9 Sep 2014 16:36:19 +0200 writes:
>
> > It's actually a little more complicated. I wrote a note, but it
> > seems to be stuck in the outbox on my home machine (I probably forgot to
> > click Send...).
> > One important aspect is that
>
> >> "x" < "\265g"
> > [1] NA
>
> > which makes me wonder if the bug really is in the case that
> > "works". It seems that it is possible to rank() character vectors that
> > contain incomparable elements.
>
> > -pd
>
>yes you are right that it is even more complicated.
>In both cases, our Scollate() is involved
>(Scollate: the one where we had a discussion about making it part of the
>C-level R API, which would help package authors ..)
>
>After
>
> ch <- c('x','\265g')
> foo <- I(ch)
>
>Of the four expressions,
>
> order(ch)
> order(foo)
> ch [1] < ch [2]
> foo[1] < foo[2]
>
>only the first one "works"; the others give NA, or an error because of NA,
>and the first one is the only one of the 4 that does not use
>do_relop_dflt().
>
>It's not even clear what we'd want (as I think pd also alluded to):
>Ideally all of these should work consistently, which, because
>"<"(.,.) returns NA in both cases, would mean that order(ch) should
>also give an error, as order(foo) does
> {{ an error whose message we should improve in any case!! }}.
>Big Q: Can we afford order(ch) giving an error in such cases?
>There is a pretty high chance that this would "break" much user (and
>probably even package) code out there.
>
>Still, the other solution, namely order(foo) behaving as
>order(ch) now does, would correspond to ">" giving FALSE
>instead of NA, so this solution is not ok in my view.
>
>Martin
>
>
> > On 09 Sep 2014, at 16:19, Martin Maechler
> > <maechler at stat.math.ethz.ch> wrote:
>
> >>>>>>> MacQueen, Don <macqueen1 at llnl.gov>
> >>>>>>> on Mon, 8 Sep 2014 16:06:21 +0000 writes:
> >>
> >>> I have found that order() fails in a rather arcane circumstance, as in
> >>> this example:
> >>
> >>>> foo <- I( c('x','\265g') )
> >>>> order(foo)
> >>> Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed
> >>
> >>>> foo <-c('x','\265g')
> >>>> order(foo)
> >>> [1] 1 2
> >>
> >> yes, this is not desirable.
> >> order() in such cases calls xtfrm() {as documented}
> >> and that ends up calling rank() and then the internal .gt()
> >> where the bug happens because
> >>
> >>> I("x") > I("\xb5g")
> >> [1] NA
> >>
> >> but really I think the change should happen in xtfrm.AsIs(.),
> >> which I think should drop the class also in this case.
> >>
> >> More on this, once we have fixed it.
> >>
> >> Thank you, Don, very much!
> >>
> >> Martin Maechler,
> >> ETH Zurich
> >>
> >>>> sessionInfo()
> >>> R version 3.1.1 (2014-07-10)
> >>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
> >>
> >>> locale:
> >>> [1] C
> >>
> >>> attached base packages:
> >>> [1] stats graphics grDevices utils datasets methods base
> >>
> >>> Thanks
> >>> -Don
> >>
> >>> p.s.
> >>> Just a little background, irrelevant unless one wonders why I'm using I()
> >>> and \265:
> >>
> >>> If I were writing new code I wouldn't be using I(), since there are better
> >>> ways now to achieve the same end (preventing the creation of factors in
> >>> data frames), but the scripts that use it are quite old, originally
> >>> developed in 2001.
> >>
> >>> In at least some but perhaps limited contexts, '\265' produces the Greek
> >>> letter mu, and that's why I'm using it. And if I remember correctly, 2001
> >>> is prior to the current R support for locales and extended character sets.
> >>> Using \265 is what I could find at that time to get a mu into my output.
> >>
> >>> I came across this while checking some things; it's not actually breaking
> >>> my scripts, so I doubt it's due to any recent change.
> >>
> >>> ______________________________________________
> >>> R-devel at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>
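Martin suggested that xtfrm.AsIs() should drop the "AsIs" class even when it is the only class. The code below is a minimal sketch of that idea, not the change that was eventually made in R. The function name xtfrm_asis_sketch is hypothetical. The sketch assumes "AsIs" is the first class, which is where I() puts it, and it is an ordinary function rather than a replacement for base::xtfrm.AsIs. Once the class is gone, xtfrm() ranks the vector exactly as it would rank a plain character vector, so for ordinary strings the result agrees with order(ch). The example deliberately uses plain ASCII strings, because how '\265g' compares is exactly what is in dispute in this thread.

```r
# Hypothetical sketch of "xtfrm.AsIs() should drop the class also in this case".
# Not base R's implementation; assumes "AsIs" is the first class, as I() sets it.
xtfrm_asis_sketch <- function(x) {
  cl <- oldClass(x)
  if (length(cl) > 1L) {
    # Other classes remain: drop only "AsIs" and let their xtfrm() methods apply.
    oldClass(x) <- cl[-1L]
  } else {
    # "AsIs" was the only class: drop it too, leaving a plain vector.
    x <- unclass(x)
  }
  xtfrm(x)
}

ch  <- c("b", "a", "c")   # ordinary, comparable strings (illustrative)
foo <- I(ch)

order(xtfrm_asis_sketch(foo))   # same ordering as order(ch): 2 1 3
order(ch)
```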