[R-SIG-Mac] Reading in a table originally with ISO-latin1 encoding (in Linux)
Simon Urbanek
simon.urbanek at r-project.org
Wed May 16 18:18:53 CEST 2007
Seppo,
On May 16, 2007, at 11:39 AM, Seppo Nyrkkö wrote:
> Dear mac & R users,
>
> Returning to this issue: Antti and I found that this particular
> problem with R.app and Scandinavian characters was triggered by Mac
> OS X's system-wide language locale being set to "C" (POSIX) during
> the OS X installation phase.
>
Did you even have a look? If you did, you'd see that pretty much
nothing of what you (or Antti) said is true. For example in the US
locale you get:
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
Please read the documentation, especially the R for Mac OS X FAQ: "9
Internationalization of the R.app". The R.app sets the locale
according to the user's preferences. For proper operation you *must* use
a UTF-8 locale, because that is what all system calls expect. You can
run R in the C locale if you insist, but then you lose the ability to
display non-ASCII characters.
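A quick way to check which case a given session is in is sketched below. It uses only base R; the variable names `loc` and `info` are chosen here for illustration.

```r
# Sketch: report the character-type locale of the running session
# and whether R considers it a UTF-8 locale.
loc  <- Sys.getlocale("LC_CTYPE")   # e.g. "en_US.UTF-8", or "C"
info <- l10n_info()                 # list with elements MBCS, UTF-8, Latin-1

cat("LC_CTYPE:     ", loc, "\n")
cat("UTF-8 locale: ", info[["UTF-8"]], "\n")
```

If the second line prints FALSE, non-ASCII output in the console is likely to be affected regardless of how files are read.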
This has nothing to do with reading and writing files - note the
"encoding" option in all read/write functions that allows you to read/
write files in various encodings. Please read R documentation on
encodings.
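As a minimal sketch of that approach (assuming the session itself runs in a UTF-8 locale, and using a made-up two-column table written to a temporary file), the file's encoding can be declared on the connection that read.table() reads from:

```r
# Create a small ISO-8859-1 (Latin-1) encoded, tab-separated file.
# The bytes are given explicitly so the example does not depend on
# the encoding of this script: 0xE4 is a-umlaut, 0xF6 is o-umlaut.
path  <- tempfile(fileext = ".txt")
bytes <- as.raw(c(0x70, 0xE4, 0xE4, 0x09, 0x79, 0xF6, 0x0A,  # header line
                  0x31, 0x09, 0x32, 0x0A))                   # data line "1<TAB>2"
writeBin(bytes, path)

# Declare the file's encoding on the connection; R re-encodes the
# input to the session's native encoding while reading.
con <- file(path, encoding = "latin1")
tab <- read.table(con, header = TRUE, sep = "\t", check.names = FALSE)

print(names(tab))
print(tab)
```

The locale stays untouched here; only the connection is told what the bytes in the file mean.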
Cheers,
Simon
> (details follow)
>
>
> On June 22, 2006 at 19:43, Antti Arppe wrote:
>
>> Dear colleagues,
>>
>> With the help of a colleague of mine here in Helsinki (Seppo Nyrkkö)
>> who looked at the innards of the R source code for Mac it turned out
>> that this was in the end indeed an issue concerning the Mac locale
>> and its settings and not R.
>>
>> Though we had tried this earlier by changing the LANG variable to
>> 'fi_FI', we hadn't looked hard enough in the available encodings
>> (with locale -a) to select the exactly correct value, being:
>>
>> LANG=fi_FI.ISO8859-1; export LANG;
>>
>> With this configuration R was able to happily read in my original
>> table with the Scandinavian characters in the header, without any
>> fuss.
>>
>> Thanks for your advice, and wishing all a good Midsummer,
>>
>> -Antti Arppe
>>
>
>
> At startup, R checks whether it is running in an international
> character set locale or not. The locale information is inherited from
> the parent process, i.e. the OS X window server, which reads locale
> settings from the system-wide settings. This information describes
> which characters are printable, and which should be displayed as
> substituted characters during the whole R session. The POSIX C locale
> allows only displaying 7-bit ASCII characters, and disables any
> printing of the Scandinavian characters (ä,ö,å) in R.app.
>
> The first step of recovery is to change the system from the C locale
> to an international locale which allows UTF-8 character sequences
> (can be done through System Preferences). This enables proper output
> of Unicode characters in the R.app terminal.
>
> Then, to read and write files in the Latin-1 (ISO-8859-1) character
> set (note that the system does UTF-8 by default now), one should
> change the default encoding for file operations by entering
> 'options(encoding="iso-8859-1")'
> at the command prompt. It is also possible to add this setting to the
> startup file ".Rprofile" in the project startup directory.
>
> Changing the locale in the command-line shell session (either by hand
> or in the shell profile script) might not be the best solution here,
> since other locale-aware OS X applications, launched from the window
> manager, would remain in the C locale.
>
> with best regards,
> Seppo
>
> _______________________________________________
> R-SIG-Mac mailing list
> R-SIG-Mac at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>
>