[Rd] UTF8 markdown vignette

Duncan Murdoch murdoch.duncan at gmail.com
Tue Dec 9 14:05:54 CET 2014


On 09/12/2014, 5:19 AM, ONKELINX, Thierry wrote:
> Dear Duncan,
> 
> The UTF-8 characters aren't properly rendered in the pdf version of the vignette.
> $£€ âêîûô äëïöüÿ áéíóúý àèìòù ãñ çµ is rendered as $£€ âêîûô äëïöüÿ áéà óúý à èìòù ãñçµ

That looks as though the UTF-8 characters are being interpreted as
Latin1 characters (or whatever your native encoding is on Windows) when
read from the file.
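
A minimal sketch of that kind of misinterpretation, assuming the file really holds UTF-8 bytes and some reader treats them as Latin-1. The variable names are only illustrative:

```r
# "\u00e9" is e-acute; enc2utf8() makes sure we hold its UTF-8 form.
x <- enc2utf8("\u00e9")
utf8_bytes <- charToRaw(x)            # c3 a9

# Treat those two bytes as two Latin-1 characters and convert the
# result to UTF-8, which is what a reader assuming Latin-1 would do.
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
misread_bytes <- charToRaw(misread)   # c3 83 c2 a9
```

One accented character becomes two characters (A-tilde followed by the copyright sign), which is the pattern visible in the rendered PDF.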

It is quite tricky to work with UTF-8 in R on Windows.  I think Sweave
does it properly, though there may be exceptions.  The issue is that
many character input routines assume characters start out in the native
encoding.  (There's also a translation that happens by default on
output, but I don't think that's your problem.)  So the way to debug
this is to follow all of the I/O, and see where the misinterpretation
happens.  For vignettes, things are complicated, because R reads the
file to determine which vignette engine to use, then the vignette engine
reads it (perhaps more than once).
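
As a rough illustration of checking one step of that I/O chain, the sketch below writes a small UTF-8 file, looks at the raw bytes on disk, and then reads it back while declaring the encoding. The temporary file and the variable names are assumptions made for the example, not part of any vignette engine:

```r
# Write two accented characters to a temporary file as UTF-8 bytes.
path <- tempfile(fileext = ".md")
original <- enc2utf8("\u00e9\u00e8")
writeLines(original, path, useBytes = TRUE)

# Step 1: inspect the bytes on disk, independent of any encoding guess.
on_disk <- readBin(path, what = "raw", n = file.size(path))

# Step 2: read the text, declaring that the bytes are UTF-8.
as_read <- readLines(path, encoding = "UTF-8")

# The string read back should carry the same bytes and a UTF-8 mark.
same_bytes <- identical(charToRaw(as_read), charToRaw(original))
declared <- Encoding(as_read)
unlink(path)
```

Repeating the second step without the encoding argument, at each place the file is read, is one way to find the reader that falls back to the native encoding.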


> The same problem occurs when I use render("vignette.md", output_format = "mypackage::mystyle") instead of render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8"). The default value of the encoding argument of rmarkdown::render() is encoding = getOption("encoding"), which is "native.enc" on my system.
> 

It sounds as though the render function needs a way to determine the
encoding from the file itself.  Recent Sweave versions support the
declaration

%\VignetteEncoding{utf8}

as well as the older

\usepackage[utf8]{inputenc}

that you used.  You might want to try that line as well.  (You need to
keep the \usepackage line to tell LaTeX what encoding you're using.)
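
A hedged sketch of what "determining the encoding from the file itself" could look like. The helper declared_encoding() is hypothetical, written only for this illustration; it is not how Sweave, knitr or rmarkdown actually do it:

```r
# Hypothetical helper: return the value of a %\VignetteEncoding{...}
# declaration found in the lines of a file, or a default if none exists.
declared_encoding <- function(lines, default = "native.enc") {
  pattern <- "^\\s*%+\\s*\\\\VignetteEncoding\\{([^}]+)\\}"
  hit <- grep(pattern, lines, perl = TRUE, value = TRUE)
  if (length(hit) == 0L) {
    return(default)
  }
  # Keep only the text between the braces of the first declaration.
  sub(paste0(pattern, ".*$"), "\\1", hit[1L], perl = TRUE)
}

# Example header lines, modelled on the YAML block quoted in this thread.
header <- c(
  "vignette: >",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteIndexEntry{The title}",
  "  %\\VignetteEncoding{utf8}"
)
found <- declared_encoding(header)
fallback <- declared_encoding("no declaration here")
```

The value found this way could be handed on as the encoding argument of render(), though the spelling expected there may differ from the one used in the declaration.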

Duncan Murdoch


> I'll post the question on an RStudio forum as well.
> 
> Best regards,
> 
> Thierry
> 
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
> 
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
> 
> The plural of anecdote is not data.
> ~ Roger Brinner
> 
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> 
> 
> -----Original message-----
> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> Sent: Tuesday 9 December 2014 11:04
> To: ONKELINX, Thierry; r-devel at r-project.org
> Subject: Re: [Rd] UTF8 markdown vignette
> 
> On 09/12/2014, 4:48 AM, ONKELINX, Thierry wrote:
>> Dear all,
>>
>> I'm trying to use a Markdown vignette with UTF-8 encoding. It compiles well when knitting the vignette in RStudio, but it fails to recognize the UTF-8 settings when building the source package. Can someone point out what I'm doing wrong? I tried to put the relevant information below.
> 
> You don't describe the symptoms of "failing to recognize", but from the look of it, this is a problem with the knitr::rmarkdown engine or with the devtools packaging, so you should probably ask on an RStudio forum.
> 
> Duncan Murdoch
> 
>> Best regards,
>>
>> Thierry
>>
>> Details:
>>
>> Using 64-bit R 3.1.2 with encoding = "native.enc" on Windows 7 with RStudio 0.97.1091.
>>
>> The source package is built using the devtools package. The build
>> command is R --vanilla CMD build  "myPackage" --no-manual
>> --no-resave-data
>>
>> The DESCRIPTION file has
>>
>> VignetteBuilder: knitr
>> Suggests: knitr
>> Imports: rmarkdown
>>
>> The markdown vignette YAML contains
>> vignette: >
>>   %\VignetteEngine{knitr::rmarkdown}
>>   %\VignetteIndexEntry{The title}
>>   \usepackage[utf8]{inputenc}
>>
>> The custom output style converts the markdown to beamer with the --latex-engine = xelatex flag.
>>
>> The vignette in tar.gz passes R --vanilla CMD check  --timings
>> --as-cran
>>
>> * checking files in 'vignettes' ... OK
>> * checking for unstated dependencies in vignettes ... OK
>> * checking package vignettes in 'inst/doc' ... OK
>> * checking running R code from vignettes ...
>>    'markdown_intro.Rmd' using 'UTF-8' ... OK OK
>> * checking re-building of vignette outputs ... [22s] OK
>>
>>
>>
>> Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
>