[R] Inconsistent alphabetisation issue

peter dalgaard pdalgd at gmail.com
Fri May 23 17:49:02 CEST 2014


Try Sys.getlocale() on the two machines and see whether you get different results. The collating sequence differs even between different flavors of English. E.g., some locales sort AaBbCc..., others ABC...abc, and the handling of spaces and punctuation characters can differ too.
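
To make the check concrete, something along these lines could be run on each machine. It is only a sketch: the vector x is made up from a few of the labels in the output quoted below, and forcing the "C" collation is one possible way to get a machine-independent order, not necessarily the one you want.

```r
# Labels copied from the output quoted below; any character vector would do.
x <- c("Black-African", "Black-Caribbean", "Black other",
       "Not Known", "Not known")

# Compare this value between the machines.
print(Sys.getlocale("LC_COLLATE"))

# Temporarily switch collation to the "C" locale, which compares byte values
# and therefore gives the same order on every machine.
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
c_order <- sort(x)
invisible(Sys.setlocale("LC_COLLATE", old_collate))  # restore previous setting

print(c_order)

# Passing explicit levels keeps the factor independent of the session locale.
f <- factor(x, levels = c_order)
print(levels(f))
```

In a byte-wise comparison the space sorts before the hyphen and upper case before lower case, so "Black other" comes first and "Not Known" precedes "Not known".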

(Something's not quite right with your output: What happened to "Not Known" with capital K? That is probably not important, though.)

- Peter


On 23 May 2014, at 13:00 , Stefano Conti <s.conti at gmx.co.uk> wrote:

> 
>   Dear R users community,
>   For some time now I have occasionally observed some inconsistent behaviour
>   across identical (i.e. same 3.1.0 version and set-up / configuration) R
>   installations on separate Linux machines (all manufactured in the UK).
> 
>   Specifically, after reading (via 'read.table' or its flavours) some
>   data-frames and then tabulating their factors, I notice that the levels of
>   some factors are by default alphabetised differently between machines.
> 
>   As an example, on 2 separate work machines I obtain from a given
>   data-frame (say 'tbl'), before applying any processing, the same output
> 
>> tbl <- read.csv(path.expand("~/tmp/tbl.csv"), header=TRUE)
>> levels(tbl$Ethnicity)
>    [1] "Black-African"               "Black-Caribbean"
>    [3] "Black other"                   "Indian/Pakistani/Bangladeshi"
>    [5] "Not Known"                    "Other Asian/Oriental"
>    [7] "Other/Mixed"                 "White"
>    [9] "Black Other"                  "Not known"
> 
>   whereas reproducing the same code and instructions on my personal laptop
>   yields the following:
> 
>> tbl <- read.csv(path.expand("~/tmp/tbl.csv"), header=TRUE)
>> levels(tbl$Ethnicity)
>    [1] "Black other"                      "Black-African"
>    [3] "Black-Caribbean"               "Indian/Pakistani/Bangladeshi"
>    [5] "Not known"                        "Other Asian/Oriental"
>    [7] "Other/Mixed"                     "White"
>    [9] "Black Other"                      "Not known"
> 
>   I've tried looking up on the R mailing list, as well as in the R
>   documentation and on Stack Overflow, what could be the source of, and in
>   particular a solution to, this discrepant behaviour; unfortunately, apart
>   from some hints at localisation issues -- which I can't see applying in my
>   case -- I couldn't find anything pertinent.
> 
>   Many thanks in advance for any help / insight you may have to provide on
>   this!
>   --
>   Dr Stefano Conti
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


