[R] questions on French characters in plot

Richard Zijdeman richard.zijdeman at me.com
Tue Dec 11 16:41:24 CET 2012


Dear Milan,

thank you for your kind suggestion. Converting the characters using:
> iconv(department, "ISO-8859-15", "UTF-8")
indeed improves the situation: all values (the department names) are now displayed in the plot, although the special characters unfortunately still appear as empty boxes.
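
The empty boxes may be explained by the byte itself. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that the file was written in Mac OS Roman, where byte 0x8f is "è". It also assumes that the local iconv build accepts the encoding name "MACINTOSH" (iconvlist() shows what is available). The department vector is the one from the example quoted below.

```r
# Bytes as they arrive after the import (taken from the quoted example).
department <- c("Nord", "Paris", "Ard\x8fche")

# Inspect the raw bytes of the problematic value.
charToRaw(department[3])
# [1] 41 72 64 8f 63 68 65

# Read as ISO-8859-15, 0x8f is a control character without a glyph,
# which is one plausible reason for the empty box in the plot.
as_latin9 <- iconv(department, "ISO-8859-15", "UTF-8")

# Assumption: the source encoding is Mac OS Roman, where 0x8f is "è".
# "MACINTOSH" is the name in common iconv builds; verify with iconvlist().
as_macroman <- iconv(department, "MACINTOSH", "UTF-8")

# Compare the bytes: a correct result contains c3 a8, the UTF-8 form of "è".
charToRaw(as_macroman[3])
```

If the last call shows c3 a8 in place of 8f, the converted vector should plot with the accent, provided the plotting font has the glyph.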

I have tried to see whether I could 'save' my Stata file in UTF-8 format; although this appears to be a popular request, it does not seem to have been implemented in Stata.
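
Since the file cannot be saved as UTF-8 on the Stata side, one workaround is to convert every string column in R after the import. This is only a sketch: to_utf8 is a hypothetical helper, not part of the foreign package, the data frame stands in for the result of read.dta(), and the "MACINTOSH" source encoding is an assumption that has to be checked against the real data.

```r
# Hypothetical helper: convert all character and factor columns of a
# data frame from the encoding `from` to UTF-8.
to_utf8 <- function(df, from) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.character(x)) {
      df[[col]] <- iconv(x, from, "UTF-8")
    } else if (is.factor(x)) {
      # Only the labels need converting; the integer codes stay as they are.
      levels(df[[col]]) <- iconv(levels(x), from, "UTF-8")
    }
  }
  df
}

# Stand-in for the imported data; the real .dta file is not used here.
raw_df <- data.frame(department = c("Nord", "Paris", "Ard\x8fche"),
                     n = c(2, 4, 1),
                     stringsAsFactors = FALSE)

# Assumption: the strings are Mac OS Roman ("MACINTOSH" in iconv).
clean_df <- to_utf8(raw_df, "MACINTOSH")
```

Numeric columns are left untouched, so the converted data frame can be passed to plot() or ggplot2 in the same way as the original.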

Best and thank you for your help,

Richard


On 11 Dec 2012, at 12:11, Milan Bouchet-Valat <nalimilan at club.fr> wrote:

> On Tuesday 11 December 2012 at 01:10 +0100, Richard Zijdeman wrote:
>> Dear all,
>> 
>> I have imported a dataset from Stata using the foreign package. The
>> original data contain French characters such as  and  .
>> After importing, string variables containing names of French
>> departments have changed. E.g. Ardèche became Ard\x8fche. I would like
>> to ask how I could plot these changed strings, since now the strings
>> with special characters fail to be printed in the plot (either using
>> plot() or ggplot2()).
>> 
>> I have googled for solutions, but actually find it hard to determine
>> whether I should change my R setup or should read in the data in a
>> different way. Since I work on a Mac I changed my locale according to
>> the R for Mac OS X FAQ, chapter 9.  Below is some info on my setup and
>> code and output on what works for me and what does not. Thank you in
>> advance for your comments.
> Accented characters should work fine on a machine using a UTF-8
> locale as yours. I think the problem is that the imported data uses
> ISO8859-15 or UTF-16, not UTF-8.
> 
> I have no idea whether .dta files specify an encoding or not, but I
> think you can convert them in R by calling
> iconv(department, "ISO-8859-15", "UTF-8")
> or
> iconv(department, "UTF-16", "UTF-8")
> 
>> Best,
>> 
>> Richard
>> 
>> #--------------
>> rm(list=ls())
>> sessionInfo()
>> # R version 2.15.2 (2012-10-26)
>> # Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>> #
>> # locale:
>> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> 
>> # creating variables
>> department  <- c("Nord","Paris","Ard\x8fche")
> \x8f does not correspond to "è" AFAIK. In ISO8859-1 and -15 and UTF-16,
> it's \xE8 ("\uE8" in R).
> 
> In UTF-8, it's C3 A8, "\303\250" in R.
> 
>> department2 <- c("Nord", "Paris", "Ardèche")
>> n           <- c(2,4,1)
>> 
>> # creating dataframes
>> df  <- data.frame(department,n)
>> df2 <- data.frame(department2,n)
>> 
>> department
>> # [1] "Nord"       "Paris"      "Ard\x8fche"
>> department2
>> # [1] "Nord"    "Paris"   "Ardèche"
>> 
>> plot(df) # fails to show the text "Ardèche"
>> plot(df2) # shows text "Ardèche"
>> 
>> # EOF