[BioC] WARNING: difference in sorting order depending on computer platform?!?

Seth Falcon sfalcon at fhcrc.org
Thu Jan 28 22:22:35 CET 2010


Hi Jenny,

On 1/28/10 12:16 PM, Jenny Drnevich wrote:
> I just found a problem/discrepancy in running R on PC vs. Unix/Linux
> server. Maybe it's widely known, but I didn't know about it and it
> caused me big problems.

Ouch, that's not a fun problem to run into.  The issue here is not so 
much the platform as what's called the locale.  Locale settings determine 
such things as how numbers are displayed (decimal "," vs "."), time 
formats, and indeed the sort order of strings.

You can read up on locale on Wikipedia:
http://en.wikipedia.org/wiki/Locale

Different locale settings impose different orderings of strings.  Once 
you know this, the good news is that you can control the locale setting 
that R uses and should be able to obtain stable sorting across platforms.

Here's an example run on a Windows system:

    > strsplit(Sys.getlocale(), ";")
    [[1]]
    [1] "LC_COLLATE=English_United States.1252"
    [2] "LC_CTYPE=English_United States.1252"
    [3] "LC_MONETARY=English_United States.1252"
    [4] "LC_NUMERIC=C"
    [5] "LC_TIME=English_United States.1252"

    > v = c("177_at", "1773_at")
    > sort(v)
    [1] "177_at"  "1773_at"
    > Sys.setlocale(locale="C")
    [1] "C"
    > sort(v)
    [1] "1773_at" "177_at"
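
To see why the order flips, it may help to look at the first character 
where the two IDs differ.  The sketch below is only an illustration (the 
variable names cp, old and c_sorted are mine): in the "C" locale strings 
are compared by character code, and "_" has a higher code than "3".

```r
# The two IDs share the prefix "177"; they first differ at position 4.
v <- c("177_at", "1773_at")

# Character code of the 4th character of each ID.
# "_" is 95 and "3" is 51 in ASCII, so "3" comes first in the "C" locale.
cp <- sapply(v, function(s) utf8ToInt(substr(s, 4, 4)))
print(cp)

# Sort under "C" collation, then put the previous collation setting back.
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
c_sorted <- sort(v)
invisible(Sys.setlocale("LC_COLLATE", old))
print(c_sorted)
```

Locales such as English_United States apply language-aware collation 
rules instead of raw character codes, which is where the other order 
comes from.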

Note that not all locales are available on all systems.  The "C" locale 
is the basic common denominator, but it only supports ASCII, not 
extended character sets.

In summary, I think you can continue to use your two different systems 
if you do Sys.setlocale(locale="C") at the start of your script.
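
As a rough sketch of what the top of such a script might look like: the 
variant below changes only the LC_COLLATE category rather than all 
categories, on the assumption that string sorting is the only thing you 
need to pin down.  The probe IDs and variable names are made up for 
illustration.

```r
# Remember the current collation setting so it can be restored later.
old_collate <- Sys.getlocale("LC_COLLATE")

# Pin string collation to "C" so sort() and order() on character
# vectors behave the same on Windows and Unix/Linux.
invisible(Sys.setlocale("LC_COLLATE", "C"))

# Illustrative IDs (hypothetical data, not from the original report).
ids <- c("177_at", "1773_at", "1007_s_at")
sorted_ids <- sort(ids)
print(sorted_ids)

# Optional: restore the previous setting at the end of the script.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Setting only LC_COLLATE leaves number and date formatting untouched; 
Sys.setlocale(locale="C") as above is the simpler choice if you don't 
care about those.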

+ seth

-- 
Seth Falcon
Bioconductor Core Team | FHCRC


