[R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu May 18 10:03:13 CEST 2023
>>>>> Schuhmacher, Dominic
>>>>> on Wed, 17 May 2023 12:05:49 +0000 writes:
> Dear list, I have a package
> https://github.com/dschuhmacher/kanjistat whose very
> purpose depends on working with Japanese kanji characters
> (in UTF-8 encoding). Such characters appear vitally in the
> data sets, examples, tests, the vignette and the .Rd
> files.
> My package checks fine with devtools::check on my system
> and via Github Actions produced with
> usethis::use_github_action_check_standard(). However, I
> would like to release the package on CRAN, and running R
> CMD check --as-cran gives me a number of headaches, mainly
> related to the production of pdf documents via latex as it
> seems to be not so easy to convince latex to typeset
> Japanese, see
> https://www.overleaf.com/learn/latex/Japanese
> For the vignette, I can set in the Rmarkdown file
>
>     pdf_document:
>       latex_engine: lualatex
>       includes:
>         in_header: preamble.tex
>
> and in the file preamble.tex
>
>     \usepackage{luatexja}
>     \usepackage{microtype}
>
> This gives me a pdf-vignette that looks and checks fine
> (except that the abovementioned GitHub Actions don't seem
> to find lualatex, which is why the pdf output is commented
> out in the main branch on GitHub).
> Unfortunately, I fail to find a similar solution for the
> pdf manual. R CMD check yields
> --------------
> * checking PDF version of manual ... WARNING
> LaTeX errors when creating PDF version.
> This typically indicates Rd problems.
> LaTeX errors found:
> ! Package inputenc Error: Unicode character 冷 (U+51B7)
> (inputenc)                not set up for use with LaTeX.
> [and many more of the same]
> * checking PDF version of manual without index ... ERROR
> --------------
> It seems that the pdf manual is generated by first
> producing a texinfo file and then running texi2dvi. From
> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Inserting-Unicode.html
> I take the message that texinfo does not do Japanese... Is
> there any way to work around the use of texinfo and use
> lualatex (with a preamble) instead? If not, is there a way
> to keep the UTF-8 encoded characters in the html help (I
> think this is very useful for the user!) and still produce
> a pdf that passes the check, e.g. by replacing the kanji
> characters automatically by their codepoints (or even a
> generic placeholder symbol) when generating the pdf
> manual?
I cannot help much more,
but be assured that texinfo is *not* used in the process.
It's just a "historical coincidence" that texi2dvi, a "simple"
shell script, typically comes from the texinfo "software
package", i.e., in Linux distributions the texi2dvi command
(shell script, see above) is provided by the 'texinfo'
(Debian/Ubuntu/..) package.

man texi2dvi tells you about a slew of environment variables,
notably PDFLATEX, TEX, etc., and I guess you can just set one of
these to 'lualatex' ... and of course lualatex must be
findable on the CRAN servers, but I'd bet that to be the case.
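
As a rough sketch of the environment-variable route (untested against
the CRAN setup): the snippet below picks lualatex when it is on the
PATH, falls back to pdflatex otherwise, and exports the result as
PDFLATEX, one of the variables that man texi2dvi lists. The function
name choose_engine and the commented-out R CMD Rd2pdf call on a
directory named kanjistat are illustrative assumptions, and whether the
generated manual then compiles under lualatex is not established here.

```shell
#!/bin/sh
# Sketch only: choose a LaTeX engine for texi2dvi via the PDFLATEX variable.
# choose_engine is a hypothetical helper name, not part of R or texinfo.
choose_engine() {
  if command -v lualatex >/dev/null 2>&1; then
    # lualatex is findable on the PATH, so prefer it
    echo lualatex
  else
    # fall back to the usual default engine
    echo pdflatex
  fi
}

PDFLATEX="$(choose_engine)"
export PDFLATEX
echo "PDFLATEX=$PDFLATEX"

# Assumed invocation, with the package sources in ./kanjistat:
# R CMD Rd2pdf kanjistat
```

Whether R CMD check honours the variable may depend on how R invokes
texi2dvi on a given system, so treat this as a starting point only.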
Best,
Martin
> Any thoughts and suggestions on this would be greatly
> appreciated! I think/hope then that the remaining problems
> in R CMD check are acceptable to the CRAN team given the
> nature of my package. They are:
> 1. Examples and tests fail if the check is not run in a
> UTF-8 locale.
> 2. checking data for non-ASCII characters ... NOTE Note:
> found 111752 marked UTF-8 strings
> Many thanks, Dominic Schuhmacher