[R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

Ivan Krylov
Thu May 18 13:21:53 CEST 2023


On Wed, 17 May 2023 12:05:49 +0000
"Schuhmacher, Dominic"
<dominic.schuhmacher using mathematik.uni-goettingen.de> wrote:

> checking PDF version of manual ... WARNING
> LaTeX errors when creating PDF version.
> This typically indicates Rd problems.
> LaTeX errors found:
> ! Package inputenc Error: Unicode character 冷 (U+51B7)
> (inputenc) not set up for use with LaTeX.

I see you'd like to use Kanji characters in your R documentation (not
only in a vignette). There are some workarounds for Cyrillic alphabets
(they work if you set a special environment variable), but CJK support
means clearing quite a few more hurdles, and I'm not sure that CRAN
will accept the result even if you overcome them on your own machine.

1. You might need to switch the LaTeX engine from the default of
pdflatex. (XeLaTeX in particular seems to have much better Unicode
support.) Both the texi2dvi shell script and R's emulation of it
understand the PDFLATEX environment variable (thank you Martin for
mentioning this!), but I'm not sure there is a way to require an
environment variable to be set for all invocations of R CMD INSTALL.
Anyway, as Overleaf says, pdflatex can support CJK, but in a less
convenient manner.

2. For pdflatex, it's possible to use \usepackage{CJKutf8}. The
required Debian packages are latex-cjk-japanese-wadalab (fonts) and
latex-cjk-common (CJKutf8.sty itself). There's no way to require these
packages to be installed on machines where your package's PDF
documentation might be built.

3. Once the packages are installed and you can compile an example *.tex
file containing Kanji, it's time to get R's PDF documentation system to
use these packages. You need to insert \usepackage{CJKutf8} in the
document's preamble, and Rd \out{} markup comes too late for that
because it only ends up in the document body. I don't see a proper way
to convince Rd2pdf to do this, but there is a terrible hack: a LaTeX
injection through an undocumented environment variable.

4. All uses of CJK characters need to be wrapped in
\begin{CJK}{utf8}{min} ... \end{CJK}. Thankfully, this at least can be
achieved in Rd by putting \if{latex}{\out{\begin{CJK}{utf8}{min}}}
before the text and \if{latex}{\out{\end{CJK}}} after it, and the pair
can be wrapped in an Rd macro using \newcommand in
man/macros/whatever.Rd.
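
The wrapping from (4) might look like the sketch below when written out
in full, without a macro. The file name cjk-demo.Rd and the \name and
\alias values are made up for illustration; only the \if{latex}{\out{}}
construct comes from the text above. The block just writes the file
from the shell so it can be inspected or passed to Rd2pdf.

```shell
# Write a small demonstration Rd file (cjk-demo.Rd is a made-up name).
# The quoted 'EOF' stops the shell from touching backslashes and braces.
cat > cjk-demo.Rd <<'EOF'
\name{cjkdemo}
\alias{cjkdemo}
\encoding{UTF-8}
\title{Wrapping demo}
\description{
  The character
  \if{latex}{\out{\begin{CJK}{utf8}{min}}}冷\if{latex}{\out{\end{CJK}}}
  means "cold".
}
EOF

# Show the line that carries the LaTeX-only wrapper
grep -n 'begin{CJK}' cjk-demo.Rd
```

Output formats other than LaTeX skip both \if{latex} parts and see only
the bare character.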

Unfortunately, I couldn't find a way to wrap the \examples{} section in
\begin{CJK}...\end{CJK}, so CJK characters cannot be used there.
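
Steps (2) and (3) start from a machine that can compile a plain *.tex
file containing Kanji. A minimal check could look like this; the file
name kanji-test.tex is an assumption, and the script only reports what
it finds rather than installing anything.

```shell
# Write a minimal pdflatex test document (kanji-test.tex is a made-up name).
cat > kanji-test.tex <<'EOF'
\documentclass{article}
\usepackage{CJKutf8}
\begin{document}
\begin{CJK}{utf8}{min}
冷
\end{CJK}
\end{document}
EOF

# Is CJKutf8.sty visible to TeX at all? kpsewhich prints its path if so.
if command -v kpsewhich >/dev/null 2>&1 && kpsewhich CJKutf8.sty >/dev/null; then
    echo "CJKutf8.sty found"
else
    echo "CJKutf8.sty not found (Debian: latex-cjk-common, latex-cjk-japanese-wadalab)"
fi

# Try to compile only if pdflatex exists; report failure instead of aborting.
if command -v pdflatex >/dev/null 2>&1; then
    pdflatex -interaction=nonstopmode kanji-test.tex >/dev/null \
        && echo "kanji-test.pdf built" \
        || echo "pdflatex failed; see kanji-test.log"
fi
```

If this standalone document does not compile, there is little point in
fighting Rd2pdf yet.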

To summarise, the Rd file from
<https://paste.debian.net/hidden/f5baacd9/> can be compiled using the
following command line on a computer with CJKutf8.sty and wadalab fonts
installed:

RD2PDF_INPUTENC='inputenc}\usepackage{CJKutf8' \
 R CMD Rd2pdf foo.Rd

...but it's such a fragile tower of hacks that I wouldn't use it in an
actual package.
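
Why the unbalanced braces in RD2PDF_INPUTENC do anything at all can be
shown with a bit of string substitution. The sketch below assumes that
Rd2pdf pastes the variable's value into a line of the form
\usepackage[utf8]{VALUE}; that template is an assumption made for
illustration, not a quote from R's sources.

```shell
# Value taken from the command line above
RD2PDF_INPUTENC='inputenc}\usepackage{CJKutf8'

# Assumed shape of the generated preamble line; %s marks the spot where
# the package name is pasted in.
preamble_line=$(printf '\\usepackage[utf8]{%s}' "$RD2PDF_INPUTENC")

printf '%s\n' "$preamble_line"
# prints: \usepackage[utf8]{inputenc}\usepackage{CJKutf8}
```

The value closes the first brace group early and opens a second
\usepackage, which the template's own closing brace then finishes.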

What about switching to XeLaTeX? PDFLATEX=xelatex R CMD Rd2pdf bar.Rd
doesn't crash, but doesn't show CJK characters either, because it's not
told which CJK font to use. (Besides, Rd.sty seems to set fonts in ways
that XeLaTeX doesn't quite understand.) \setCJKmainfont{...} is again a
preamble command, which again requires the terrible hack from (3), and
I don't see a way to use \fontspec{} without altering the preamble.
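
Outside Rd2pdf, whether XeLaTeX can find a usable CJK font at all can
be checked with a standalone document. This is a minimal sketch,
assuming the xeCJK package (which provides \setCJKmainfont) is
installed; the file name xecjk-test.tex and the default font name
IPAMincho are placeholders, not something the setup above requires.

```shell
# Font family to try; "IPAMincho" is only a placeholder assumption.
CJK_FONT=${CJK_FONT:-IPAMincho}

# List the font families fontconfig reports as covering Japanese, if any.
if command -v fc-list >/dev/null 2>&1; then
    fc-list :lang=ja family | sort -u
fi

# Unquoted EOF so that $CJK_FONT expands; in a here-document a backslash
# followed by a letter is left alone, so the LaTeX commands survive.
cat > xecjk-test.tex <<EOF
\documentclass{article}
\usepackage{xeCJK}
\setCJKmainfont{$CJK_FONT}
\begin{document}
冷
\end{document}
EOF

# Compile only if xelatex exists; report failure instead of aborting.
if command -v xelatex >/dev/null 2>&1; then
    xelatex -interaction=nonstopmode xecjk-test.tex >/dev/null \
        && echo "xecjk-test.pdf built" \
        || echo "xelatex failed; check the font name in xecjk-test.log"
fi
```

A different font can be tried with, for example, CJK_FONT='Some Font'
set in the environment before running the script.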

-- 
Best regards,
Ivan


