[BioC] WARNING: difference in sorting order depending on computer platform?!?
Seth Falcon
sfalcon at fhcrc.org
Thu Jan 28 22:22:35 CET 2010
Hi Jenny,
On 1/28/10 12:16 PM, Jenny Drnevich wrote:
> I just found a problem/discrepancy in running R on PC vs. Unix/Linux
> server. Maybe it's widely known, but I didn't know about it and it
> caused me big problems.
Ouch, that's not a fun problem to run into. The issue here is not so
much the platform as what's called the locale. Locale settings determine
such things as which decimal separator numbers are displayed with
("," vs "."), the time format, and indeed the sort order of strings.
You can read up on locale on Wikipedia:
http://en.wikipedia.org/wiki/Locale
Different locale settings impose different orderings of strings. Once
you know this, the good news is that you can control the locale setting
that R uses and should be able to obtain stable sorting across platforms.
Here's an example run on a Windows system:
> strsplit(Sys.getlocale(), ";")
[[1]]
[1] "LC_COLLATE=English_United States.1252"
[2] "LC_CTYPE=English_United States.1252"
[3] "LC_MONETARY=English_United States.1252"
[4] "LC_NUMERIC=C"
[5] "LC_TIME=English_United States.1252"

> v = c("177_at", "1773_at")
> sort(v)
[1] "177_at" "1773_at"
> Sys.setlocale(locale="C")
[1] "C"
> sort(v)
[1] "1773_at" "177_at"
Note that not all locales are available on all systems. The "C" locale
is the basic common denominator, but it only supports ASCII, not
extended character sets.
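
Since a given locale may be missing on one of the machines, a script can
check whether the request was honored and fall back to "C" otherwise.
The following is only a rough sketch: the helper name set_collate_or_c
and the locale name "no_SUCH.locale" are made up for illustration, and it
assumes that Sys.setlocale() returns an empty string (with a warning)
when the OS cannot honor the request.

```r
# Hypothetical helper: try the requested collation locale, fall back to "C".
set_collate_or_c <- function(wanted) {
  # Sys.setlocale() warns and returns "" if the locale cannot be set.
  res <- suppressWarnings(Sys.setlocale("LC_COLLATE", wanted))
  if (is.null(res) || !nzchar(res)) {
    res <- Sys.setlocale("LC_COLLATE", "C")
  }
  res
}

# "no_SUCH.locale" is a made-up name that most systems will reject.
used <- set_collate_or_c("no_SUCH.locale")
print(used)
```

The return value tells you which collation locale the script actually
ended up with, which is worth logging alongside the results.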
In summary, I think you can continue to use your two different systems
if you do Sys.setlocale(locale="C") at the start of your script.
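
If you would rather not change every locale category, setting only the
collation category should be enough to make the sort order stable. A
minimal sketch, assuming nothing else in the script depends on the
collation order (the variable names are just for illustration):

```r
# Remember the current collation locale so it can be restored later.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch only string collation to the "C" locale; other categories
# (time format, monetary, etc.) are left untouched.
Sys.setlocale("LC_COLLATE", "C")

v <- c("177_at", "1773_at")
sorted_c <- sort(v)
print(sorted_c)

# Restore the original collation setting.
Sys.setlocale("LC_COLLATE", old_collate)
```

With the collation set to "C", sort(v) gives the same order as in the
Sys.setlocale(locale="C") example, "1773_at" before "177_at", on either
platform.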
+ seth
--
Seth Falcon
Bioconductor Core Team | FHCRC