[R-SIG-Mac] Reading in a table originally with ISO-latin1 encoding (in Linux)

Simon Urbanek simon.urbanek at r-project.org
Wed May 16 18:18:53 CEST 2007


Seppo,

On May 16, 2007, at 11:39 AM, Seppo Nyrkkö wrote:

> Dear mac & R users,
>
> Returning to this issue, Antti and I found that this particular problem
> with R.app and Scandinavian characters was triggered by Mac OS X's
> system-wide locale having been set to "C" (POSIX) during the OS X
> installation phase.
>

Did you even have a look? If you did, you'd see that pretty much  
nothing of what you (or Antti) said is true. For example in the US  
locale you get:

 > Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

Please read the documentation, especially the R for Mac OS X FAQ: "9
Internationalization of the R.app". R.app sets the locale
according to the user's preferences. For proper operation you *must* use
a UTF-8 locale, because that is what all system calls expect. You can
run R in the C locale if you insist, but then you lose the ability to
display non-ASCII characters.
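
To see which locale a given session actually ended up with, something along these lines can be run at the R prompt. This is only a minimal sketch: Sys.getlocale() and l10n_info() are base R functions, but the message text is illustrative and the output depends on your own preferences.

```r
# Report all locale categories of the running session
print(Sys.getlocale())

# LC_CTYPE is the category that decides which characters are printable
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() summarizes whether the session is multibyte, UTF-8 or Latin-1
info <- l10n_info()
print(info[c("MBCS", "UTF-8", "Latin-1")])

if (!isTRUE(info[["UTF-8"]])) {
  message("Not a UTF-8 locale: non-ASCII characters may be shown as escapes.")
}
```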

This has nothing to do with reading and writing files - note the  
"encoding" option in all read/write functions that allows you to read/ 
write files in various encodings. Please read R documentation on  
encodings.
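
As a rough illustration of reading a Latin-1 file while the session itself stays in a UTF-8 locale, here is a sketch. It assumes a reasonably current R, where read.table() takes a fileEncoding argument (older versions may differ, in which case a file() connection with its encoding argument is the place to look). The temporary file and its column names are made up for the example.

```r
# Build a small table whose header and cells contain Scandinavian characters.
# \u escapes keep this script itself pure ASCII.
header <- "m\u00e4\u00e4r\u00e4\tsana"
rows   <- c("1\t\u00e4iti", "2\t\u00f6ljy")

# Convert to Latin-1 bytes and write them out unchanged
latin1_lines <- iconv(c(header, rows), from = "UTF-8", to = "latin1")
tmp <- tempfile(fileext = ".txt")
writeLines(latin1_lines, tmp, useBytes = TRUE)

# Declare the file's encoding when reading; R re-encodes to the session's
# native encoding (assumed to be UTF-8 here)
d <- read.table(tmp, header = TRUE, sep = "\t",
                fileEncoding = "latin1", check.names = FALSE,
                stringsAsFactors = FALSE)
print(names(d))
print(d)

unlink(tmp)
```

The point is that the file's encoding is declared per read operation, independently of the locale the session runs in.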

Cheers,
Simon




> (details follow)
>
>
> On June 22, 2006 at 19:43, Antti Arppe wrote:
>
>> Dear colleagues,
>>
>> With the help of a colleague of mine here in Helsinki (Seppo Nyrkkö),
>> who looked at the innards of the R source code for Mac, it turned out
>> that this was in the end indeed an issue concerning the Mac locale
>> and its settings, and not R.
>>
>> Though we had tried this earlier by changing the LANG variable to
>> 'fi_FI', we hadn't looked hard enough at the available encodings
>> (with locale -a) to select exactly the correct value, namely:
>>
>> LANG=fi_FI.ISO8859-1; export LANG;
>>
>> With this configuration R was able to happily read in my original
>> table with the Scandinavian characters in the header, without any
>> fuss.
>>
>> Thanks for your advice, and wishing all a good Midsummer,
>>
>>         -Antti Arppe
>>
>
>
> At startup, R checks whether it is running in an international
> character set locale or not. The locale information is inherited from
> the parent process, i.e. the OS X window server, which reads the locale
> from the system-wide settings. This information determines
> which characters are printable and which are displayed as
> substituted characters for the whole R session. The POSIX C locale
> allows only 7-bit ASCII characters to be displayed, and disables
> printing of the Scandinavian characters (ä, ö, å) in R.app.
>
> The first step of recovery is to change the system from the C locale to
> an international locale that allows UTF-8 character sequences (this can
> be done through System Preferences). This enables proper output of
> Unicode characters in the R.app terminal.
>
> Then, to read and write files in the Latin-1 (ISO-8859-1) character
> set (note that the system now uses UTF-8 by default), one should change
> the default encoding for file operations by entering
>    'options(encoding="iso-8859-1")'
> at the command prompt. It is also possible to add this setting to the
> startup file ".Rprofile" in the project startup directory.
>
> Changing the locale in the command-line shell session (either by hand
> or in the shell profile script) might not be the best solution here,
> since other locale-aware OS X applications, launched from the window
> manager, would remain in the C locale.
>
> with best regards,
>    Seppo


