[Rd] UTF8 markdown vignette

Duncan Murdoch murdoch.duncan at gmail.com
Tue Dec 9 18:18:16 CET 2014


On 09/12/2014 11:13 AM, Yihui Xie wrote:
> A few things to clarify:
>
> 1. You do not necessarily have to keep the \usepackage{} line if you
> use %\VignetteEncoding{UTF-8}, because Pandoc will use UTF-8 anyway in
> its LaTeX template.
>
> 2. Perhaps the vignette engine in R has done something clever to
> convert utf8 to UTF-8, but I'd recommend %\VignetteEncoding{UTF-8}
> instead of %\VignetteEncoding{utf8} to make sure it is a valid
> encoding name, e.g.
>
> > 'utf8' %in% iconvlist()
> [1] FALSE
> > 'UTF-8' %in% iconvlist()
> [1] TRUE
> > 'UTF8' %in% iconvlist()
> [1] TRUE
>
> BTW, %\VignetteEncoding is not documented anywhere (Cc'ing Kurt), and
> I think it needs to be documented, since the old approach
> \usepackage[enc]{inputenc} was basically a hack, which looks really
> odd in non-LaTeX vignettes (e.g. HTML vignettes).

  Yes, "utf8" works; it will be sent to the vignette engine as "UTF-8".

I was surprised about the missing docs.  The documented way to do this 
is to use

%\SweaveUTF8

but the source says the recommended way is to use

%\VignetteEncoding{}

and it's certainly a little more engine-agnostic.  I'll add something to the docs if Kurt doesn't get there first.

>
> 3. The default `encoding` argument of rmarkdown::render() is not
> relevant here, even if its value is native.enc. When R builds a
> vignette, it tries to detect the vignette's encoding and passes it to
> the vignette engine, so the value actually used may not be native.enc.
>
> Lastly, the most important piece of information is missing in this
> post: library(rmarkdown); sessionInfo(). There is no minimal
> reproducible example, either. Without this information, I can only
> guess blindly.
>
> BTW, you may also try HTML vignettes instead, which are much, much
> easier to get right than LaTeX in terms of character encodings.

Over the last while I've been writing an HTML vignette, and I really 
want to compliment Yihui and the other rmarkdown folks for doing a 
fantastic job with them.  I haven't had to deal with encoding issues, 
but overall markdown + R + HTML is a very pleasant way to work.   I just 
wish someone would implement reverse search ... :-).

Duncan Murdoch
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Web: http://yihui.name
>
>
> On Tue, Dec 9, 2014 at 7:05 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> > On 09/12/2014, 5:19 AM, ONKELINX, Thierry wrote:
> >> Dear Duncan,
> >>
> >> The UTF-8 characters aren't properly rendered in the pdf version of the vignette.
> >> $£€ âêîûô äëïöüÿ áéíóúý àèìòù ãñ çµ is rendered as $£€ âêîûô äëïöüÿ áéà óúý à èìòù ãñçµ
> >
> > That looks as though the UTF-8 characters are being interpreted as
> > Latin1 characters (or whatever your native encoding is on Windows) when
> > read from the file.
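
For illustration, the pattern described above can be reproduced in a few lines of base R. This is only a sketch, assuming the stray bytes are read as Latin-1 (on Windows the native encoding is more likely CP1252, which differs for some bytes). The variable names are made up for the example.

```r
# "é" (U+00E9) stored as UTF-8 is the two bytes c3 a9.
original <- enc2utf8("\u00e9")
bytes <- charToRaw(original)
print(bytes)

# Assumption: the reader treats each byte as one Latin-1 character.
misread <- rawToChar(bytes)
Encoding(misread) <- "latin1"

# Converting the misread string to UTF-8 gives the two characters
# that end up in the output in place of the single "é".
garbled <- enc2utf8(misread)
print(nchar(original))  # 1 character went in
print(nchar(garbled))   # 2 characters came out
```

The bytes never change in this sketch; only the declared encoding does, which is why the damage appears at whichever read step makes the wrong assumption.
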
> >
> > It is quite tricky to work with UTF-8 in R in Windows.  I think Sweave
> > does it properly, though there may be exceptions.  The issue is that
> > many character input routines assume characters start out in the native
> > encoding.  (There's also a translation that happens by default on
> > output, but I don't think that's your problem.)  So the way to debug
> > this is to follow all of the I/O, and see where the misinterpretation
> > happens.  For vignettes, things are complicated, because R reads the
> > file to determine which vignette engine to use, then the vignette engine
> > reads it (perhaps more than once).
> >
> >
> >> The same problem occurs when I use
> >> render("vignette.md", output_format = "mypackage::mystyle") instead of
> >> render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8").
> >> The default value of the encoding argument of rmarkdown::render() is
> >> encoding = getOption("encoding"), which is "native.enc" on my system.
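
The effect of declaring the encoding at read time can be seen with base R alone. The following is a minimal sketch, not what rmarkdown::render() does internally; the temporary file and its contents are invented for the example.

```r
# Write a small file whose bytes are UTF-8, independent of the locale.
path <- tempfile(fileext = ".md")
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8("caf\u00e9\n")), con)
close(con)

# Declaring the encoding marks the strings as UTF-8 as they are read,
# so later conversions start from the right assumption.
txt <- readLines(path, encoding = "UTF-8")
print(Encoding(txt))
print(identical(txt, "caf\u00e9"))

unlink(path)
```

Without the encoding argument the same bytes come back unmarked and are later interpreted in the native encoding, which is where a non-UTF-8 Windows locale goes wrong.
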
> >>
> >
> > It sounds as though the render function needs a way to determine the
> > encoding from the file itself.  Recent Sweave versions support the
> > declaration
> >
> > %\VignetteEncoding{utf8}
> >
> > as well as the older
> >
> > \usepackage[utf8]{inputenc}
> >
> > that you used.  You might want to try that line as well.  (You need to
> > keep the \usepackage line to tell LaTeX what encoding you're using.)
> >
> > Duncan Murdoch
> >
> >
> >> I'll post the question on an RStudio forum as well.
> >>
> >> Best regards,
> >>
> >> Thierry
> >>
> >>
> >> -----Original message-----
> >> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> >> Sent: Tuesday 9 December 2014 11:04
> >> To: ONKELINX, Thierry; r-devel at r-project.org
> >> Subject: Re: [Rd] UTF8 markdown vignette
> >>
> >> On 09/12/2014, 4:48 AM, ONKELINX, Thierry wrote:
> >>> Dear all,
> >>>
> >>> I'm trying to use a Markdown vignette with UTF-8 encoding. It compiles well when knitting the vignette in RStudio, but it fails to recognize the UTF-8 settings when building the source package. Can someone point out what I'm doing wrong? I tried to put the relevant information below.
> >>
> >> You don't describe the symptoms of "failing to recognize", but from the look of it, this is a problem with the knitr::rmarkdown engine or with the devtools packaging, so you should probably ask on an RStudio forum.
> >>
> >> Duncan Murdoch
> >>
> >>> Best regards,
> >>>
> >>> Thierry
> >>>
> >>> Details:
> >>>
> >>> Using 64-bit R 3.1.2 with encoding = "native.enc" on Windows 7 with RStudio 0.97.1091.
> >>>
> >>> The source package is built using the devtools package. The build
> >>> command is R --vanilla CMD build "myPackage" --no-manual
> >>> --no-resave-data
> >>>
> >>> The DESCRIPTION file has
> >>>
> >>> VignetteBuilder: knitr
> >>> Suggests: knitr
> >>> Imports: rmarkdown
> >>>
> >>> The markdown vignette YAML contains
> >>> vignette: >
> >>>   %\VignetteEngine{knitr::rmarkdown}
> >>>   %\VignetteIndexEntry{The title}
> >>>   \usepackage[utf8]{inputenc}
> >>>
> >>> The custom output style converts the markdown to beamer with the --latex-engine = xelatex flag.
> >>>
> >>> The vignette in tar.gz passes R --vanilla CMD check  --timings
> >>> --as-cran
> >>>
> >>> * checking files in 'vignettes' ... OK
> >>> * checking for unstated dependencies in vignettes ... OK
> >>> * checking package vignettes in 'inst/doc' ... OK
> >>> * checking running R code from vignettes ...
> >>>    'markdown_intro.Rmd' using 'UTF-8' ... OK OK
> >>> * checking re-building of vignette outputs ... [22s] OK


