[Rd] Expected behaviour of is.unsorted?

Duncan Murdoch murdoch.duncan at gmail.com
Thu May 24 17:25:33 CEST 2012


On 24/05/2012 11:10 AM, Matthew Dowle wrote:
> >  On 24/05/2012 9:15 AM, Matthew Dowle wrote:
> >>  Duncan Murdoch <murdoch.duncan at gmail.com> writes:
> >>  >
> >>  >   On 12-05-24 7:39 AM, Matthew Dowle wrote:
> >>  >   >   Duncan Murdoch <murdoch.duncan at gmail.com> writes:
> >>  >   >>
> >>  >   >>   On 12-05-23 4:37 AM, Matthew Dowle wrote:
> >>  >   >   Since it seems to have a bug anyway (and if so, can't be correct
> >>  >   >   in anyone's use of it), could either is.unsorted on a data.frame
> >>  >   >   return the error that's in the C code already: "only atomic
> >>  >   >   vectors can be tested to be sorted", for safety and to lessen
> >>  >   >   confusion, or be changed to return the natural expectation
> >>  >   >   proposed above? The easiest quick fix would be to negate the
> >>  >   >   result of the .gtn call of course, but then you could never go
> >>  >   >   back.
> >>  >
> >>  >   I don't follow the last sentence.  If the .gtn call needs to be
> >>  >   negated, why would you want to go back?
> >>
> >>  Because then is.unsorted(DF) would work, but go by row, which you
> >>  guessed above wasn't intended and isn't sensible. But once it worked in
> >>  that way, users might start to depend on it; e.g., by writing
> >>  is.unsorted(t(DF)). If I came along in future and suggested that was
> >>  inefficient and wouldn't it be more natural and efficient if
> >>  is.unsorted(DF) went by column, returning the same as
> >>  with(DF,is.unsorted(order(a,b))) but implemented efficiently, you would
> >>  fear that user code now depended on it going by row and say it was too
> >>  late. I'd persist and highlight that it didn't seem in keeping with the
> >>  spirit of is.unsorted()'s speed since it short circuits on the first
> >>  unsorted item, which is why we love it. You'd reply that's not
> >>  documented. Which it isn't. And that would be the end of that.
> >
> >  Okay, I'm going to fix the handling of .gtn results, and document the
> >  unsuitability of this function for dataframes and arrays.
>
> But that leaves the door open to confusion later, whilst closing the door
> to a better solution: making is.unsorted() work by column for data.frame;
> i.e., making is.unsorted _suitable_ for data.frame. If you just do the
> quick fix for .gtn result you can never go back. If making is.unsorted(DF)
> work by column is too hard for now, then leaving the door open would be
> better by returning the error message already in the C code: "only atomic
> vectors can be tested to be sorted". That would be a better quick fix
> since it leaves options for the future.
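
For reference, the by-column behaviour proposed in the quoted text can be
sketched in plain R roughly as follows. DF, its columns a and b, and the
helper is_unsorted_by_col are made up for illustration. The sketch assumes
columns that compare sensibly with < and > and contain no NAs, and it is
not meant as the actual implementation.

```r
# Hypothetical example data: sorted by a, then by b within ties of a.
DF <- data.frame(a = c(1, 1, 2), b = c(1, 2, 1))

# The formulation from the quoted message: compute the ordering, then
# check whether that ordering is already 1, 2, ..., n.
with(DF, is.unsorted(order(a, b)))

# Hypothetical short-circuiting version (assumes no NAs, no factor columns).
is_unsorted_by_col <- function(df) {
  n <- nrow(df)
  if (n < 2L) return(FALSE)          # 0 or 1 rows are trivially sorted
  for (i in seq_len(n - 1L)) {
    for (j in seq_along(df)) {
      lo <- df[[j]][i]
      hi <- df[[j]][i + 1L]
      if (lo < hi) break             # row i sorts before row i + 1
      if (lo > hi) return(TRUE)      # first out-of-order pair: stop here
      # equal in this column: fall through to the next column
    }
  }
  FALSE
}

is_unsorted_by_col(DF)
is_unsorted_by_col(DF[c(2, 1, 3), ])  # rows 1 and 2 swapped
```

The point of the second version is only that it can stop at the first
out-of-order pair of rows instead of computing a full ordering first.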

I don't see why saying this function is unsuitable for dataframes 
implies that it will never be made suitable for dataframes.

The fix handles the case is.unsorted was designed for:  it checks 
whether x[1] < x[2] < x[3] etc., which it doesn't currently do properly 
for non-atomic objects.
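
As a rough R-level illustration of that adjacent-pair check for atomic
vectors (is_unsorted_sketch is a made-up name, this is not the C code
behind is.unsorted, and the NA handling is an assumption chosen to keep
the example short):

```r
# Hypothetical helper for illustration; not the base R implementation.
is_unsorted_sketch <- function(x, strictly = FALSE) {
  # Reuse the wording of the C-level error message quoted in this thread.
  if (!is.atomic(x)) stop("only atomic vectors can be tested to be sorted")
  # Assumption: any NA makes the answer unknown.
  if (any(is.na(x))) return(NA)
  n <- length(x)
  # Zero or one element is trivially sorted.
  if (n < 2L) return(FALSE)
  for (i in seq_len(n - 1L)) {
    bad <- if (strictly) x[i] >= x[i + 1L] else x[i] > x[i + 1L]
    # Short-circuit at the first out-of-order pair.
    if (bad) return(TRUE)
  }
  FALSE
}

is_unsorted_sketch(c(1, 2, 2, 3))                   # FALSE
is_unsorted_sketch(c(1, 2, 2, 3), strictly = TRUE)  # TRUE (tie at 2, 2)
is_unsorted_sketch(c(2, 1, 3))                      # TRUE
```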

Duncan Murdoch
>
> >  Duncan Murdoch
> >
> >>
> >>  >   Duncan Murdoch
> >>  >
> >>  >   >
> >>  >   >   Matthew
> >>  >   >
> >>  >   >>   Duncan Murdoch
> >>  >   >>
> >>  >   >>>
> >>  >   >>>   I understand why the first two are FALSE (1 item of anything
> >>  >   >>>   must be sorted). I don't understand the 3rd and 4th cases
> >>  >   >>>   where length is 2: do_isunsorted seems to call
> >>  >   >>>   lang3(install(".gtn"), x, CADR(args))). Does that fall back
> >>  >   >>>   to TRUE for some reason?
> >>  >   >>>
> >>  >   >>>   Matthew
> >>  >   >>>
> >>  >   >>>>   sessionInfo()
> >>  >   >>>   R version 2.15.0 (2012-03-30)
> >>  >   >>>   Platform: x86_64-pc-mingw32/x64 (64-bit)
> >>  >   >>>
> >>  >   >>>   locale:
> >>  >   >>>   [1] LC_COLLATE=English_United Kingdom.1252
> >>  >   >>>   [2] LC_CTYPE=English_United Kingdom.1252
> >>  >   >>>   [3] LC_MONETARY=English_United Kingdom.1252
> >>  >   >>>   [4] LC_NUMERIC=C
> >>  >   >>>   [5] LC_TIME=English_United Kingdom.1252
> >>  >   >>>
> >>  >   >>>   attached base packages:
> >>  >   >>>   [1] stats  graphics  grDevices  utils  datasets  methods  base
> >>  >   >>>
> >>  >   >>>   other attached packages:
> >>  >   >>>   [1] data.table_1.8.0
> >>  >   >>>
> >>  >   >>>   loaded via a namespace (and not attached):
> >>  >   >>>   [1] tools_2.15.0
> >>  >   >>>
> >>  >   >>>   ______________________________________________
> >>  >   >>>   R-devel at r-project.org mailing list
> >>  >   >>>   https://stat.ethz.ch/mailman/listinfo/r-devel
> >>  >   >>
> >>  >   >>
> >>  >   >
> >>  >
> >>  >
> >>
> >
> >
>
>


