[R] Coding systems.
gerald.jean at dgag.ca
Wed Nov 27 17:46:44 CET 2013
Hello,
as Jan pointed out, the problem is with the encoding in which R saves the
function. If I set this encoding to "UTF-8" in source(), everything is fine.
If I set all the locale variables to "fr_CA.UTF8" in either my .bash_profile
or my .Renviron file, it should do the job, and up to a point it does: I can
source functions containing multibyte characters, save them in my personal
library, and they run as expected.
BUT with these settings, R throws the following error at startup:
Erreur : caractères multioctets incorrects dans l'analyse de code (parser)
à la ligne 28
which translates to something like:
Error: incorrect multibyte characters in the code analysis (parser) at
line 28
Furthermore, I can't install any package: install.packages() returns the same
error and stops execution.
I know the workaround is to not specify a UTF-8 locale in my profiles and to
explicitly pass the argument "encoding = 'UTF-8'" to source(). But to me,
this is somewhat of an inconsistency!!!
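A minimal sketch of that workaround, for illustration only. The script and the function is_grande_ville are hypothetical, not taken from the real library. The check on l10n_info() is an added assumption: source() converts the file from the declared encoding to the session's native one, so the session has to be able to represent "é".

```r
# Write a small script to disk as UTF-8. The "\u00e9" escapes become "é"
# when this code is parsed, so the file contains the raw bytes c3 a9.
script <- tempfile(fileext = ".R")
code <- c(
  "is_grande_ville <- function(x) {",
  "  x %in% c(\"Montr\u00e9al\", \"Qu\u00e9bec\", \"Toronto\")",
  "}"
)
writeLines(enc2utf8(code), script, useBytes = TRUE)

# source() re-encodes the file to the native encoding, so only run the demo
# in a locale that can represent "é" (UTF-8 or Latin-1, not plain "C").
info <- l10n_info()
native_ok <- isTRUE(info[["UTF-8"]]) || isTRUE(info[["Latin-1"]])

if (native_ok) {
  # Declare the file's encoding explicitly instead of relying on the locale.
  source(script, encoding = "UTF-8")
  print(is_grande_ville(c("Montr\u00e9al", "Toronto", "Laval")))
}
```

In a UTF-8 or Latin-1 session this should print TRUE TRUE FALSE, whatever the locale variables say about the default encoding.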
Thanks to Jan for his insights,
Gérald
Gerald Jean, M.Sc. (Statistics)
Senior Statistical Advisor
Corporate Actuarial Services, Modelling and Research
Property and Casualty Insurance, Mouvement Desjardins
Lévis (head office)
418 835-4900, ext. 7639
1 877 835-4900, ext. 7639
Fax: 418 835-6657
From: Jan van der Laan <rhelp at eoos.dds.nl>
Date: 2013/11/27 02:26
To: r-help at r-project.org
cc: gerald.jean at dgag.ca
Subject: Re: [R] Coding systems.
Could it be that your R script is saved in a different encoding than
the one used by R (which will probably be UTF-8, since you're working on
Linux)?
--
Jan
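To make the point about differing encodings concrete, here is a small sketch of how the same city name is stored in UTF-8 and in Latin-1. It uses only base R (validUTF8() needs R >= 3.3.0, which is newer than the version in this thread), and the variable names are made up for illustration.

```r
utf8   <- enc2utf8("Montr\u00e9al")
latin1 <- iconv(utf8, from = "UTF-8", to = "latin1")

charToRaw(utf8)    # 4d 6f 6e 74 72 c3 a9 61 6c  ("é" takes two bytes)
charToRaw(latin1)  # 4d 6f 6e 74 72 e9 61 6c     ("é" takes one byte)

nchar(utf8, type = "bytes")    # 9
nchar(latin1, type = "bytes")  # 8

# Drop the encoding mark: these are the bytes a Latin-1 file would contain.
unmarked <- rawToChar(charToRaw(latin1))
validUTF8(unmarked)  # FALSE: a UTF-8 parser cannot read these bytes
```

The byte e9 on its own is not valid UTF-8, which is the kind of input a parser running in a UTF-8 locale would reject.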
gerald.jean at dgag.ca wrote:
> Hello,
>
> I am using R, 2.15.2, on a 64-bit Linux box. I run R through Emacs' ESS.
>
> R runs in a French (Canadian French) locale, and lately I got surprising
> results from functions that make factor variables from character variables.
> Many of the variables in the input data.frames are character variables and
> contain Latin accents, for example the "é" in "Montréal". I wasted several
> days playing with coding systems and trying to understand why some code
> gives the expected result when run one command at a time from the command
> line, but not when cut and pasted into a function.
>
> For example the following code:
>
> ==============================================================================
> ttt.rmr <- sima.31122012$rmrnom
> ttt.rmr.2 <- ifelse(ttt.rmr %in% c("Edmonton", "Edmundston",
>                                    "Charlottetown", "Calgary", "Winnipeg",
>                                    "Victoria", "Vancouver", "Toronto",
>                                    "St. John's", "Saskatoon", "Regina",
>                                    "Québec", "Ottawa - Gatineau (Ontario",
>                                    "Ottawa - Gatineau (partie",
>                                    "Montréal",
>                                    "Halifax", "Fredericton"),
>                     "Grandes villes",
>                     ifelse(ttt.rmr == "", "Manquant", "Autres"))
> unique(ttt.rmr.2)
> ttt.rmr.2 <- factor(ttt.rmr.2,
>                     levels = c("Grandes villes", "Autres", "Manquant"),
>                     labels = c("Grandes villes", "Autres", "Manquant"))
> ==============================================================================
>
> will have "Montréal" and "Québec" in the "Grandes villes" level of the
> factor variable, while running the same code in a function will have them
> in "Autres". The variable "rmr.Merged" in the data.frame
> "test2.sima.31122012.DataPrep" is the output of the function, which, of
> course, does a lot of other stuff.
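One way to tell an encoding mismatch from a logic error is to compare the bytes and then declare the encoding explicitly. The sketch below assumes, purely for illustration, that the data hold Latin-1 bytes with no encoding mark. The variable from_data stands in for one value of rmrnom and is built by hand, and in_code stands in for the literal inside the function.

```r
# Stand-in for a data value: "Montréal" as Latin-1 bytes, no encoding mark.
from_data <- rawToChar(as.raw(c(0x4d, 0x6f, 0x6e, 0x74, 0x72, 0xe9, 0x61, 0x6c)))
# Stand-in for the literal in the function, stored as UTF-8.
in_code <- enc2utf8("Montr\u00e9al")

Encoding(from_data)                                  # "unknown"
identical(charToRaw(from_data), charToRaw(in_code))  # FALSE: bytes differ

# Declaring the real encoding lets R translate both sides before comparing.
Encoding(from_data) <- "latin1"
from_data %in% in_code                                         # TRUE
identical(charToRaw(enc2utf8(from_data)), charToRaw(in_code))  # TRUE
```

If the bytes differ before the encoding is declared and the match succeeds afterwards, the misclassification comes from the encoding and not from the ifelse() logic.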
>
> ==============================================================================
> ttt.w <- which(ttt.rmr.2 != test2.sima.31122012.DataPrep$rmr.Merged)
> frequence(test2.sima.31122012.DataPrep$rmrnom[ttt.w])
>          Frequency  Percent Cum.Freq Cum.Percent
> Montréal   1301254 79.57173  1301254    79.57173
> Québec      334068 20.42827  1635322   100.00000
> ==============================================================================
>
> All other city names, which have no accents, were correctly classified,
> but "Montréal" and "Québec" were not; together they represent over 1.5M
> records, not negligible!!!
>
> Following is my ".Renviron" file, where I set up environment variables
> for R.
>
> R_PROFILE_USER="/home/jeg002/MyRwork/StartUp/profile.R"
> # export R_PROFILE_USER
> R_HISTFILE="/home/jeg002/MyRwork/.Rhistory"
> ## Default editor
> EDITOR=${EDITOR-${VISUAL-'/usr/local/bin/emacsclient'}}
> ## Default pager
> PAGER=${PAGER-'/usr/local/bin/emacsclient'}
>
> ## Setting locale, hoping it will be OK "all" the time!!!
> LANG=fr_CA
> LANGUAGE=fr_CA
> LC_ADDRESS=fr_CA
> LC_COLLATE=fr_CA
> LC_TYPE=fr_CA
> LC_IDENTIFICATION=fr_CA
> LC_MEASUREMENT=fr_CA
> LC_MESSAGES=fr_CA
> LC_NAME=fr_CA
> LC_PAPER=en_US
> LC_NUMERIC=en_US
> LC_TELEPHONE=fr_CA
> LC_MONETARY=fr_CA
> LC_TIME=fr_CA
> R_PAPERSIZE='letter'
> ==============================================================================
>
> and:
>
>> Sys.getlocale()
> [1] "LC_CTYPE=fr_CA;LC_NUMERIC=C;LC_TIME=fr_CA;LC_COLLATE=fr_CA;LC_MONETARY=fr_CA;LC_MESSAGES=fr_CA;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_CA;LC_IDENTIFICATION=C"
>
>> Sys.getenv(c("LANGUAGE", "LANG"))
> LANGUAGE     LANG
>  "fr_CA"  "fr_CA"
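A quick way to check what a session is actually using is sketched below. The helper names_utf8 is hypothetical, not a base R function: it only looks for a UTF-8 codeset suffix in a locale string, assuming the usual lang_COUNTRY.codeset form. The rest uses Sys.getlocale() and l10n_info() from base R.

```r
# Hypothetical helper: does a locale name carry a UTF-8 codeset suffix?
names_utf8 <- function(locale) grepl("\\.utf-?8", locale, ignore.case = TRUE)

names_utf8("fr_CA")        # FALSE: no codeset suffix
names_utf8("fr_CA.UTF8")   # TRUE
names_utf8("fr_CA.UTF-8")  # TRUE

# What the running session reports.
ctype <- Sys.getlocale("LC_CTYPE")
info  <- l10n_info()
cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", isTRUE(info[["UTF-8"]]), "\n")
```

With the settings shown above, LC_CTYPE is "fr_CA" with no codeset, so the helper returns FALSE for it.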
>
> I must be missing something!!! Maybe someone can make sense of this!!!
> Thanks for your support,
>
> Gérald Jean
>