[R-SIG-Mac] Text encoding and R
Chabot Denis
chabotd at globetrotter.net
Fri Mar 23 16:51:47 CET 2007
Hi,
For many versions of R now (at least since v2, I think), I have not
needed to worry about getting R to output French accented vowels on
my plots. I can place comments in French in my scripts without any
problem. In fact things work so well that I even started, now and then,
using variable names with accents (so much prettier!).
Now I'm trying to introduce a student to R. Actually she'll need to
use some of my programs (scripts) to analyse her data.
But all the accented vowels come up wrong on her copy of R (the
default GUI she got after installing R from CRAN), and worse, her R
rejects my variable names containing accents.
I know that on my Mac, R is not happy unless I save my programs and
my data files as UTF-8 without a BOM (I'm not sure the BOM matters:
sometimes R accepts files saved as plain UTF-8, sometimes not, but it
always accepts them if I choose the "no BOM" encoding in TextWrangler).
I've never told R anything about what encoding to use; it chose UTF-8
without a BOM all by itself.
I double-checked with these commands:
> Sys.getlocale(category = "LC_ALL")
[1] "fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8"
> getOption("encoding")
[1] "native.enc"
> localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
[1] "UTF-8" "ISO8859-1"
I thought ISO8859-1 was the same as ISO-Latin 1, so I tried again to
read in a data file containing an accented vowel. R did not like it.
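A minimal sketch of how a Latin-1 file might be read in a UTF-8 session, offered as an assumption rather than a tested fix for the case above. As far as the documentation goes, the second element returned by localeToCharset() is only a suggested fallback 8-bit charset, not something R detects or applies automatically, so the file's encoding has to be declared. The sample word and the temporary file are hypothetical stand-ins for a real data file.

```r
# Hypothetical stand-in for a Latin-1 data file: the bytes of "espèce"
# in ISO-8859-1 (0xe8 is e-grave), followed by a newline.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x65, 0x73, 0x70, 0xe8, 0x63, 0x65, 0x0a)), tmp)

# Declare the encoding when reading: the bytes are kept as they are and
# the resulting string is marked as latin1.
latin1_line <- readLines(tmp, encoding = "latin1")
Encoding(latin1_line)

# Convert to UTF-8 so the string matches the encoding of a UTF-8 locale.
utf8_line <- enc2utf8(latin1_line)
Encoding(utf8_line)

unlink(tmp)
```

The same idea may apply to whole files: a connection opened with file(path, encoding = "latin1") re-encodes input to the session's native encoding, and source() takes an encoding argument for scripts. Whether either resolves the problem described here has not been verified.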
The student on a PC typed the same commands and got:
> Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=French_Canada.1252;LC_CTYPE=French_Canada.1252;LC_MONETARY=French_Canada.1252;LC_NUMERIC=C;LC_TIME=French_Canada.1252"
> getOption("encoding")
[1] "native.enc"
> localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
[1] "ISO8859-1"
It seems that both our versions of R should handle ISO-Latin 1. Do
you have enough details to tell me why mine does not seem to like
ISO-Latin 1?
For the collaboration with my student to work (and for her not to
give up on R), I need either to make my R give me the same good
support for French using ISO-Latin 1, or to tell my student's
version of R on a PC to accept scripts and data files in UTF-8.
Which of the two is easiest?
Thanks in advance,
Denis Chabot