[R] Sweave'ing Danish characters

Peter Jepsen PJ at DCE.AU.DK
Tue Jan 27 13:06:44 CET 2009


Thank you, Duncan! It works perfectly!

Best regards,
Peter.

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch at stats.uwo.ca] 
Sent: 27 January 2009 13:04
To: Peter Jepsen
Cc: r-help at r-project.org
Subject: Re: [R] Sweave'ing Danish characters

On 26/01/2009 5:44 PM, Peter Jepsen wrote:
> Hi,
> 
> I am writing an Sweave document and am using 'xtable' to make frequency tables of diagnoses of people undergoing cholecystectomy. Some of these diagnoses contain Danish characters ("æ", "ø", and "å"), and these characters are all garbled in the Latex document after I run Sweave. The odd thing is, everything looks absolutely right in the R console, and if I enter the same Danish characters in a new variable, the new variable produces no problems?! Therefore, I cannot offer a reproducible example, but I am hoping nonetheless that someone can point me towards a solution.

This looks like an encoding problem:  there are several different 
standards for encoding non-ASCII characters.  All of your tools have to 
agree on the encoding.

To my eye it looks as though in the first case R is writing out UTF-8, 
and whatever you are using to look at your .tex file is assuming latin1 
(some Windows programs say "ANSI", but I think that doesn't fully 
specify the encoding:  you also need a code page, which is set somewhere 
in Windows control panel.)
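
A minimal sketch of that kind of mismatch, assuming the entry is stored
as UTF-8 and the viewer reads the bytes as latin1. The string is the
diagnosis from the original post; the variable names are only for
illustration.

```r
# The diagnosis text, written with \u escapes so this snippet is pure ASCII.
x <- "Kr\u00e6ft i ford\u00f8jelsessystemet"
utf8 <- enc2utf8(x)                 # make sure the bytes are UTF-8

# UTF-8 stores each of these Danish letters as two bytes.
charToRaw(enc2utf8("\u00e6"))       # c3 a6

# Read the same bytes as if they were latin1, which is what a mismatched
# viewer does, and convert the result to UTF-8 so it can be displayed.
garbled <- iconv(utf8, from = "latin1", to = "UTF-8")
garbled
```

Each Danish letter turns into two unrelated characters, so the garbled
string is two characters longer than the original.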

The functions related to encodings in R are:

  options(encoding="latin1")  - set the default encoding used for file connections

  iconv(x, from="latin1", to="UTF-8")  - re-encode entries, mapping each 
character from one encoding to the other

  Encoding(x) - display the declared encoding of each entry ("unknown" 
means ASCII or the native encoding for your platform)

  Encoding(x) <- "latin1" - change the declared encoding, without 
changing the bytes.
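
As a rough illustration of the difference between re-encoding and
re-declaring, here is a small sketch. The variable `diag` is a
hypothetical stand-in for one entry of the diagnosis column, and the
assumption that it holds UTF-8 is mine, not something confirmed by the
original data.

```r
# Hypothetical stand-in for one entry of the diagnosis column (UTF-8).
diag <- enc2utf8("Kr\u00e6ft i ford\u00f8jelsessystemet")

Encoding(diag)                       # declared encoding of the entry

# Re-encode: the characters stay the same, the bytes change.
diag_latin1 <- iconv(diag, from = "UTF-8", to = "latin1")
length(charToRaw(diag))              # 29 bytes in UTF-8
length(charToRaw(diag_latin1))       # 27 bytes in latin1

# Re-declare: the bytes stay the same, only the label changes.
relabelled <- diag_latin1
Encoding(relabelled) <- "UTF-8"      # wrong label on latin1 bytes
identical(charToRaw(relabelled), charToRaw(diag_latin1))  # bytes untouched
Encoding(relabelled) <- "latin1"     # put the correct label back
```

If the .tex file is expected to be latin1, passing something like
`diag_latin1` to xtable instead of the UTF-8 original may be enough,
but which direction to convert depends on what LaTeX and the editor
expect.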

Duncan Murdoch

> To illustrate:
> 
>> library(xtable)
>> library(Hmisc)
>> rm(list=ls())
>> load("u:/kirurgi/cholecystit/Chol_oprenset.Rdata")
>> 	
>> test2 <- chol$nydiag[3]	# This 3rd observation contains a diagnosis with Danish characters ("Kræft i fordøjelsessystemet", meaning gastrointestinal cancer).
>>
>> print(xtable(table(test2)))
> % latex table generated in R 2.8.1 by xtable 1.5-4 package
> % Mon Jan 26 23:31:37 2009
> \begin{table}[ht]
> \begin{center}
> \begin{tabular}{rr}
>   \hline
>  & test2 \\
>   \hline
> Kræft i fordøjelsessystemet &   1 \\	# It looks right here, but in the .tex-file it says "KrÃ¦ft i fordÃ¸jelsessystemet"
>    \hline
> \end{tabular}
> \end{center}
> \end{table}
> 
>> print(xtable(table("Kræft i fordøjelsessystemet")))	# This, on the other hand, works like a charm.
> % latex table generated in R 2.8.1 by xtable 1.5-4 package
> % Mon Jan 26 23:36:53 2009
> \begin{table}[ht]
> \begin{center}
> \begin{tabular}{rr}
>   \hline
>  & V1 \\
>   \hline
> Kræft i fordøjelsessystemet &   1 \\	# See, no problems here!
>    \hline
> \end{tabular}
> \end{center}
> \end{table}
> 
> 
> I am using Windows Vista 64-bit and MikTex 2.7. 
> 
> Best regards,
> Peter.
> 
>> sessionInfo()
> R version 2.8.1 (2008-12-22) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=Danish_Denmark.1252;LC_CTYPE=Danish_Denmark.1252;LC_MONETARY=Danish_Denmark.1252;LC_NUMERIC=C;LC_TIME=Danish_Denmark.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] Hmisc_3.4-4    foreign_0.8-30 xtable_1.5-4  
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.12 grid_2.8.1      lattice_0.17-20 tools_2.8.1
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



