[Rd] UTF8 markdown vignette

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Fri Dec 19 22:37:39 CET 2014


Dear Duncan and Yihui,

I was able to test it with the new R-devel version. Adding only %\SweaveUTF8 to the vignette works: it passes R CMD check --as-cran and the UTF-8 characters render as they should. Adding only Encoding: UTF-8 to the DESCRIPTION instead of %\SweaveUTF8 works too.

I have tested the same things with the GitHub version of knitr on R-3.1.2-patched. Adding Encoding: UTF-8 to the DESCRIPTION gives an R CMD check --as-cran warning:

* checking package vignettes in 'inst/doc' ... WARNING
  Non-ASCII package vignette without specified encoding:
    'utf8vignette.Rmd'

The UTF-8 characters in the vignette are nonetheless rendered correctly.
Adding only %\SweaveUTF8 to the vignette makes it pass R CMD check --as-cran, and the UTF-8 characters are rendered correctly.
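
For reference, the two declarations compared above could be set up roughly as follows. This is only a sketch: the package name utf8pkg, the version number and the vignette text are made up for illustration, and only the Encoding: UTF-8 field, the %\SweaveUTF8 line and the file name utf8vignette.Rmd come from the tests described here. Both declarations are written in one go, although each was tested on its own.

```r
# Hypothetical package skeleton in a temporary directory (names are illustrative).
pkg <- file.path(tempdir(), "utf8pkg")
dir.create(file.path(pkg, "vignettes"), recursive = TRUE, showWarnings = FALSE)

# Option 1: declare the encoding for the whole package in DESCRIPTION.
writeLines(
  c("Package: utf8pkg",
    "Version: 0.0.1",
    "Encoding: UTF-8",
    "VignetteBuilder: knitr"),
  file.path(pkg, "DESCRIPTION")
)

# Option 2: declare the encoding inside the vignette itself.
# In an R Markdown vignette the %\Vignette... lines sit in an HTML comment.
vignette_file <- file.path(pkg, "vignettes", "utf8vignette.Rmd")
vignette_lines <- c(
  "<!--",
  "%\\VignetteEngine{knitr::rmarkdown}",
  "%\\VignetteIndexEntry{UTF-8 vignette}",
  "%\\SweaveUTF8",
  "-->",
  "",
  "Caf\u00e9, na\u00efve, \u00e9l\u00e8ve"
)
# Write the bytes as UTF-8 regardless of the native encoding of the session.
writeLines(enc2utf8(vignette_lines), vignette_file, useBytes = TRUE)

# Read both declarations back.
description_encoding <- unname(
  read.dcf(file.path(pkg, "DESCRIPTION"), fields = "Encoding")[1, 1]
)
has_sweave_utf8 <- any(grepl("%\\SweaveUTF8",
                             readLines(vignette_file, warn = FALSE),
                             fixed = TRUE, useBytes = TRUE))
```

Writing with enc2utf8() and useBytes = TRUE matters on a machine like the ones below, where the native encoding is Windows-1252 rather than UTF-8.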

So both the changes to R-devel and to knitr seem to work fine.

Thanks a lot.

Thierry

PS I've added the sessionInfo() of both configurations.

#sessionInfo() of R-devel
> library(rmarkdown)
> library(knitr)
> sessionInfo()
R Under development (unstable) (2014-12-18 r67185)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] knitr_1.8        rmarkdown_0.3.11

loaded via a namespace (and not attached):
[1] digest_0.6.4    evaluate_0.5.5  formatR_1.0     htmltools_0.2.6 stringr_0.6.2
[6] tools_3.2.0

#sessionInfo() of R-3.1.2-patched
> library(knitr)
> library(rmarkdown)
> sessionInfo()
R version 3.1.2 Patched (2014-12-11 r67166)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rmarkdown_0.3.11 knitr_1.8.6

loaded via a namespace (and not attached):
 [1] bitops_1.0-6    devtools_1.6.1  digest_0.6.6    evaluate_0.5.5  formatR_1.0
 [6] htmltools_0.2.6 httr_0.6.0      RCurl_1.95-4.5  stringr_0.6.2   tools_3.1.2

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance

________________________________________
From: Duncan Murdoch [murdoch.duncan at gmail.com]
Sent: Friday 19 December 2014 14:02
To: Yihui Xie
CC: ONKELINX, Thierry; r-devel at r-project.org; Kurt Hornik
Subject: Re: [Rd] UTF8 markdown vignette

On 18/12/2014, 12:17 AM, Yihui Xie wrote:
> For the record, I saw a change had been made in R-devel:
> https://github.com/wch/r-source/commit/d53b098 (Thanks, Duncan)
> Meanwhile, I also made a change in knitr to assume UTF-8 unless R
> passes an encoding to the vignette engine:
> https://github.com/yihui/knitr/commit/23c6c8e2 Both will solve the
> original problem, but apparently the former one is the ideal fix.

The Windows builds of R-devel were stalled for a few days, but I've
given them a kick now, so this should appear in the Windows binaries on
CRAN soon.

Duncan Murdoch

>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Web: http://yihui.name
>
>
> On Wed, Dec 10, 2014 at 6:19 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>> On 09/12/2014, 10:36 PM, Yihui Xie wrote:
>>> I took a look at the R source and I realized that the encoding was
>>> actually never passed to the vignette engine:
>>> https://github.com/wch/r-source/blob/e721ef5f4/src/library/tools/R/Vignettes.R#L507
>>> Apparently only the file and quiet arguments are passed to the
>>> vignette engine. Did I miss anything?
>>
>> I think it's actually a little messier than that:  sometimes the
>> encoding is passed (e.g. by tools:::.run_one_vignette, used in R CMD
>> check), but not always.  Here's what I think should happen instead:
>>
>> When building a vignette in a package, R knows the encoding declared for
>> the package, so it should assume this as the default for the vignette.
>> If nothing is declared, it should assume "native.enc", i.e. whatever is
>> the native encoding on the machine it's running on.
>>
>> For each vignette, at the same time as it determines the vignette
>> engine, it should see whether there is a declared encoding within the
>> vignette.
>>
>> When it calls the engine, it should pass an encoding (and it should be a
>> legal one, e.g. UTF-8, not utf8).
>>
>> Unless I notice something missing when I do this, or someone else tells
>> me something that's missing, I'll try to make the changes above in
>> R-devel and R-patched sometime before 3.1.3 is released.
>>
>> In the meantime, unless declaring a dependence on R >= 3.1.3, vignette
>> engines should determine the encoding themselves whenever they are
>> called without an "encoding" argument.
>>
>> Duncan Murdoch
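
The resolution order proposed in the quoted message, together with the interim advice for vignette engines, might be sketched as below. All function names are hypothetical and this is not the code in R's tools package or in knitr. It assumes that a declaration inside the vignette overrides the package-level default, and it only recognises %\SweaveUTF8 as an in-vignette declaration.

```r
# Hypothetical helper: look for an encoding declared inside the vignette source.
# Only %\SweaveUTF8 is recognised in this sketch.
declared_vignette_encoding <- function(lines) {
  if (any(grepl("%\\SweaveUTF8", lines, fixed = TRUE, useBytes = TRUE))) {
    "UTF-8"
  } else {
    NA_character_
  }
}

# Hypothetical helper: turn informal spellings such as "utf8" into "UTF-8".
normalize_encoding <- function(enc) {
  if (toupper(enc) %in% c("UTF8", "UTF-8")) "UTF-8" else enc
}

# Resolution order (assumed): vignette declaration, then the package
# Encoding field, then "native.enc".
resolve_vignette_encoding <- function(lines, package_encoding = NA_character_) {
  enc <- declared_vignette_encoding(lines)
  if (is.na(enc)) enc <- package_encoding
  if (is.na(enc)) enc <- "native.enc"
  normalize_encoding(enc)
}

# Interim behaviour for an engine: if the caller passes no encoding,
# the engine works it out from the file itself.
engine_encoding <- function(file, encoding = NULL) {
  if (!is.null(encoding)) return(normalize_encoding(encoding))
  resolve_vignette_encoding(readLines(file, warn = FALSE))
}
```

For example, resolve_vignette_encoding("plain text", "utf8") falls through to the package-level value and returns it in the legal spelling "UTF-8".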
