[R-SIG-Mac] Text encoding and R

Chabot Denis chabotd at globetrotter.net
Fri Mar 23 16:51:47 CET 2007


Hi,

For many versions of R now (at least since v2, I think), I have not
needed to worry about getting R to output French accented vowels on
my plots. I can place comments in French in my scripts without any
problem. In fact, things work so well that I even started, now and then,
using variable names with accents (so much prettier!).

Now I'm trying to introduce a student to R. Actually she'll need to  
use some of my programs (scripts) to analyse her data.

But all the accented vowels come up wrong on her copy of R (the
default GUI she got after installing R from CRAN) and, worse, her R
does not like my variable names containing accents one bit.

I know that on my Mac, R is not happy if I don't save my programs and
my data files in UTF8-no BOM (I'm not sure about the no BOM:
sometimes R accepts files in plain UTF8, sometimes not, but it always
accepts them if I set the no BOM encoding in TextWrangler).

I've never told R anything about what encoding to use; it chose UTF8
no BOM all by itself.

I double-checked with these commands:
 > Sys.getlocale(category = "LC_ALL")
[1] "fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8"
 > getOption("encoding")
[1] "native.enc"
 > localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
[1] "UTF-8"     "ISO8859-1"


I thought ISO8859-1 was the same as ISO-Latin 1, and tried again  
reading in a data file containing an accented vowel. R did not like it.
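
For illustration, here is a minimal sketch of one way Latin-1 bytes might be read in a UTF-8 session. It assumes the file really is ISO-8859-1 (Latin-1); the temporary file and its content are hypothetical stand-ins for a real data file. Whether a given R version behaves exactly this way is an assumption to check against its documentation.

```r
# Hypothetical Latin-1 file: the raw bytes below spell "été\n" in ISO-8859-1.
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xe9, 0x74, 0xe9, 0x0a)), path)

# Read the bytes as-is and mark them as Latin-1 instead of assuming
# they are in the session's native encoding.
raw_lines <- readLines(path, encoding = "latin1")

# Convert the marked strings to UTF-8 for use in a UTF-8 session.
utf8_lines <- enc2utf8(raw_lines)
```

An alternative with the same intent is to declare the encoding on the connection, as in file(path, encoding = "latin1"), and let R re-encode while reading.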

The student on a PC typed the same commands and got:

 > Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=French_Canada.1252;LC_CTYPE=French_Canada.1252;LC_MONETARY=French_Canada.1252;LC_NUMERIC=C;LC_TIME=French_Canada.1252"
 > getOption("encoding")
[1] "native.enc"
 > localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
[1] "ISO8859-1"


It seems that both our versions of R should handle ISO-Latin 1. Do
you have enough details to tell me why mine does not seem to like
ISO-Latin 1?

For the collaboration with my student to work (and for her not to
give up on R), I need either to make my R give me the same good
support for French using ISO-Latin 1, or to tell my student's
version of R on a PC to accept scripts and data files in UTF8.

Which of the two is easiest?
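
A possible third route, sketched here under assumptions, is to re-save a UTF-8 file as Latin-1 from within R so that each machine gets a copy in its own encoding. The file names and content below are hypothetical, and the sketch assumes every character in the file exists in Latin-1. Separately, source() accepts an encoding argument (for example encoding = "UTF-8"), which may be worth trying on the PC side.

```r
# Hypothetical UTF-8 script: the raw bytes spell "# été\n" in UTF-8.
src <- tempfile(fileext = ".R")
dst <- tempfile(fileext = ".R")
writeBin(c(charToRaw("# "), as.raw(c(0xc3, 0xa9, 0x74, 0xc3, 0xa9, 0x0a))), src)

# Read the lines and mark them as UTF-8.
utf8_lines <- readLines(src, encoding = "UTF-8")

# Convert each line to Latin-1; characters with no Latin-1 equivalent
# would come back as NA, so a real script should check for that.
latin1_lines <- iconv(utf8_lines, from = "UTF-8", to = "latin1")

# Write the converted bytes unchanged, without re-encoding to the locale.
writeLines(latin1_lines, dst, useBytes = TRUE)

# Inspect the result as raw bytes to confirm the conversion.
latin1_bytes <- readBin(dst, what = "raw", n = 100)
```

Swapping the from and to arguments of iconv() gives the reverse conversion, for files coming back from the PC.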

Thanks in advance,

Denis Chabot


