[R] Sorting strings

(Ted Harding) Ted.Harding at wlandres.net
Mon Feb 20 18:16:52 CET 2012


On 20-Feb-2012 Petr Savicky wrote:
> On Mon, Feb 20, 2012 at 05:55:30AM -0800, statquant2 wrote:
>> I did, but this does not give the answer to my question...
>> Does anybody know how to tweak the behaviour of sort, or how to do this?
> 
> Hi.
> Try this
> 
>   Sys.setlocale("LC_COLLATE", "C") 
> 
> This comes from ?locales, which reads
> 
>   Sys.setlocale("LC_COLLATE", "C")   # turn off locale-specific sorting,
>                                      #  usually
> 
> See also ?sort
> 
>      The sort order for character vectors will depend on the
>      collating sequence of the locale in use: see 'Comparison'.
> 
> ?Comparison
> 
>      Comparison of strings in character vectors is lexicographic
>      within the strings using the collating sequence of the locale
>      in use: see 'locales'. The collating sequence of locales such
>      as 'en_US' is normally different from 'C' (which should use
>      ASCII) and can be surprising. Beware of making _any_ assumptions
>      about the collation order: ...
> 
> Hope this helps.
> Petr Savicky.

I've been following this thread with interest. I had begun composing
a reply on similar lines to Petr's above, but put it on one side
while waiting to see how the thread would evolve.

In view of the tangle of mixed experiences reported by different
users, I now wonder whether we should have something like "lc_collate"
as a specific parameter for sort(), e.g. so that one can set, for a
particular sorting operation,

   sort(c("X.","X0B"),lc_collate="C")

without affecting the system "LC_COLLATE" setting (i.e. the change
takes effect only within the execution of that sort() command).
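
In the meantime, the effect can be approximated with a small wrapper
that saves the current setting, switches it, and restores it on exit.
The following is only a sketch: sort_with_collate() is a made-up name,
not an existing R function, and it assumes the requested locale is
available on the system ("C" always should be).

```r
# Hypothetical helper, not part of base R: sort x under a temporary
# LC_COLLATE setting and restore the previous setting afterwards.
sort_with_collate <- function(x, lc_collate = "C", ...) {
  old <- Sys.getlocale("LC_COLLATE")
  # on.exit() restores the old collation even if sort() stops with an error.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request cannot be honoured.
  if (!nzchar(Sys.setlocale("LC_COLLATE", lc_collate)))
    stop("could not set LC_COLLATE to ", lc_collate)
  sort(x, ...)
}

# In the "C" locale "." (ASCII 46) sorts before "0" (ASCII 48).
sort_with_collate(c("X0B", "X."))
```

The global LC_COLLATE is still changed for the duration of the call, so
this is a workaround rather than the per-call parameter suggested above.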

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 20-Feb-2012  Time: 17:16:47
This message was sent by XFMail


