[R-SIG-Mac] Reading in a table originally with ISO-latin1 encoding (in Linux)

Seppo Nyrkkö seppo.nyrkko at helsinki.fi
Wed May 16 17:39:10 CEST 2007


Dear mac & R users,

Returning to this issue: Antti and I found that this particular problem 
with R.app and Scandinavian characters was triggered by Mac OS X's 
system-wide language locale having been set to "C" (POSIX) during the 
OS X installation.

(details follow)


On June 22, 2006 at 19:43, Antti Arppe wrote:

> Dear colleagues,
>
> With the help of a colleague of mine here in Helsinki (Seppo Nyrkkö) 
> who looked at the innards of the R source code for Mac it turned out 
> that this was in the end indeed an issue concerning the Mac locale and 
> its settings and not R.
>
> Though we had tried this earlier by changing the LANG variable to 
> 'fi_FI', we hadn't looked hard enough in the available encodings (with 
> locale -a) to select the exactly correct value, being:
>
> LANG=fi_FI.ISO8859-1; export LANG;
>
> With this configuration R was able to happily read in my original 
> table with the Scandinavian characters in the header, without any fuss.
>
> Thanks for your advice, and wishing all a good Midsummer,
>
>         -Antti Arppe
>


At startup, R checks whether or not it is running in an international 
character set locale. The locale information is inherited from the 
parent process, i.e. the OS X window server, which reads its locale 
from the system-wide settings. This information determines, for the 
whole R session, which characters are printable and which are displayed 
as substitute characters. The POSIX C locale allows only 7-bit ASCII 
characters to be displayed, and so prevents the Scandinavian characters 
(ä, ö, å) from being printed in R.app.
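To see which locale a running session actually inherited, something 
along the following lines can be typed at the R prompt. This is only a 
sketch: Sys.getlocale() and l10n_info() are standard R functions, but 
the helper name in_c_locale is made up here for illustration.

```r
# Report the character-type locale the session inherited at startup.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() summarises how R classifies the current locale:
# multibyte (MBCS), UTF-8, or Latin-1.
info <- l10n_info()
print(info[c("MBCS", "UTF-8", "Latin-1")])

# Hypothetical helper: TRUE when the given locale string is the plain
# C/POSIX locale, i.e. the situation described above.
in_c_locale <- function(ctype = Sys.getlocale("LC_CTYPE")) {
  ctype %in% c("C", "POSIX")
}
print(in_c_locale())
```

If in_c_locale() returns TRUE inside R.app, the session is in the 
7-bit-only situation described above.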

The first step of recovery is to change the system from the C locale to 
an international locale which allows UTF-8 character sequences (this 
can be done through System Preferences). This enables proper output of 
Unicode characters in the R.app console.

Then, to read and write files in the Latin-1 (ISO-8859-1) character set 
(note that the system now uses UTF-8 by default), one should change the 
default encoding for file operations by entering
   'options(encoding="iso-8859-1")'
at the command prompt. It is also possible to add this setting to the 
startup file ".Rprofile" in the project startup directory.
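As an alternative to changing the session-wide default, the encoding 
can also be declared for a single read. The sketch below is not part of 
the original fix: it assumes a version of R in which read.table() 
accepts the fileEncoding argument, and a session already running in a 
UTF-8 locale as described above. The file contents and column names are 
invented for the example.

```r
# Build a small tab-separated file as raw ISO-8859-1 bytes, so the
# example does not depend on the encoding of this script itself.
# In Latin-1: 0xE4 = a-umlaut, 0xF6 = o-umlaut, 0xE5 = a-ring.
path <- tempfile(fileext = ".txt")
header <- as.raw(c(0x70, 0xE4, 0xE4, 0x09,   # "p", a-umlaut x2, TAB
                   0x79, 0xF6, 0x09,         # "y", o-umlaut, TAB
                   0xE5, 0x0A))              # a-ring, newline
row <- charToRaw("1\t2\t3\n")
writeBin(c(header, row), path)

# Declare the file's encoding for this one call only; the input is
# re-encoded from Latin-1 into the session's native encoding.
tab <- read.table(path, header = TRUE, sep = "\t",
                  fileEncoding = "latin1", check.names = FALSE)
print(names(tab))
unlink(path)
```

In a session still in the C locale the re-encoding step has no way to 
represent these characters, so the header should not be expected to 
come through intact there.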

Changing the locale in the command-line shell session (either by hand 
or in the shell profile script) might not be the best solution here, 
since other locale-aware OS X applications, launched from the window 
manager, would remain in the C locale.

with best regards,
   Seppo


