[R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package
Schuhmacher, Dominic
dom|n|c@@chuhm@cher @end|ng |rom m@them@t|k@un|-goett|ngen@de
Sat May 20 18:01:47 CEST 2023
In case more people want to check this out, the minimal example package can be downloaded from https://owncloud.gwdg.de/index.php/s/ejhN3vd51572e27
Running (in a UTF-8 locale)
R CMD check minimal_1.0.tar.gz
reproduces the described problem, whereas
PDFLATEX=lualatex \
RD2PDF_INPUTENC='inputenc}\usepackage{luatexja' \
R CMD check minimal_1.0.tar.gz
seems to run without problems and generates a reasonable PDF manual.
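
For anyone who wants to try the second invocation on their own machine, here is a minimal sketch of a wrapper that first checks the prerequisites. It assumes a POSIX shell and a TeX installation that ships kpsewhich; the function names have_tool and run_check are made up for illustration and are not part of R or TeX. I have only sketched this, so treat it as a starting point rather than a tested recipe.

```shell
#!/bin/sh
# Sketch only: verify the TeX prerequisites, then run the check with lualatex.
# have_tool and run_check are hypothetical helper names.

# Succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Run R CMD check on the given tarball with the two settings from above.
run_check() {
    tarball=$1
    for tool in R lualatex kpsewhich; do
        if ! have_tool "$tool"; then
            echo "missing tool: $tool" >&2
            return 1
        fi
    done
    # kpsewhich prints the path of luatexja.sty if the TeX tree provides it.
    if ! kpsewhich luatexja.sty >/dev/null; then
        echo "luatexja.sty not found in the TeX tree" >&2
        return 1
    fi
    # The assignments apply only to this one R invocation.
    PDFLATEX=lualatex \
    RD2PDF_INPUTENC='inputenc}\usepackage{luatexja' \
    R CMD check "$tarball"
}

# Usage: run_check minimal_1.0.tar.gz
```

The early checks are there only to turn a missing lualatex or luatexja into a short message instead of a long LaTeX log.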
> On 18. May 2023, at 12:05, Uwe Ligges <ligges using statistik.tu-dortmund.de> wrote:
>
>
>
> On 18.05.2023 10:03, Martin Maechler wrote:
>>>>>>> Schuhmacher, Dominic
>>>>>>> on Wed, 17 May 2023 12:05:49 +0000 writes:
>> > Dear list, I have a package
>> > https://github.com/dschuhmacher/kanjistat whose very
>> > purpose depends on working with Japanese kanji characters
>> > (in UTF-8 encoding). Such characters appear vitally in the
>> > data sets, examples, tests, the vignette and the .Rd
>> > files.
>> > My package checks fine with devtools::check() on my system
>> > and via GitHub Actions set up with
>> > usethis::use_github_action_check_standard(). However, I
>> > would like to release the package on CRAN, and running R
>> > CMD check --as-cran gives me a number of headaches, mainly
>> > related to the production of PDF documents via LaTeX, as it
>> > seems to be not so easy to convince LaTeX to typeset
>> > Japanese, see
>> > https://www.overleaf.com/learn/latex/Japanese
>> > For the vignette, I can set in the Rmarkdown file
>> >   pdf_document:
>> >     latex_engine: lualatex
>> >     includes:
>> >       in_header: preamble.tex
>> > and in the file preamble.tex
>> >   \usepackage{luatexja}
>> >   \usepackage{microtype}
>> > This gives me
>> > a pdf-vignette that looks and checks fine (except that the
>> > abovementioned GitHub Actions don't seem to find lualatex,
>> > which is why the pdf output is commented out in the main
>> > branch on GitHub).
>> > Unfortunately, I fail to find a similar solution for the
>> > pdf manual. R CMD check yields
>> > --------------
>> > checking PDF version of manual ... WARNING LaTeX errors
>> > when creating PDF version. This typically indicates Rd
>> > problems. LaTeX errors found: ! Package inputenc Error:
>> > Unicode character 冷 (U+51B7) (inputenc) not set up for
>
>
> Can you send me a minimal example package with these characters in an Rd file?
>
> Best,
> Uwe Ligges
>
>
>> > use with LaTeX. [and many more of the same] * checking
>> > PDF version of manual without index ... ERROR
>> > --------------
>> > It seems that the pdf manual is generated by first
>> > producing a texinfo file and then running texi2dvi. From
>> > https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Inserting-Unicode.html
>> > I take the message that texinfo does not do Japanese... Is
>> > there any way to work around the use of texinfo and use
>> > lualatex (with a preamble) instead? If not, is there a way
>> > to keep the UTF-8 encoded characters in the html help (I
>> > think this is very useful for the user!) and still produce
>> > a pdf that passes the check, e.g. by replacing the kanji
>> > characters automatically by their codepoints (or even a
>> > generic placeholder symbol) when generating the pdf
>> > manual?
>> I cannot help much more,
>> but be assured that texinfo is *not* used in the process.
>> It's just a "historical coincidence" that texi2dvi, a "simple"
>> shell script, typically comes from the texinfo "software
>> package", i.e., in Linux distributions the texi2dvi command
>> (shell script, see above) is provided by the 'texinfo'
>> (Debian/Ubuntu/..) package.
>> 'man texi2dvi' tells you about a slew of environment variables,
>> notably PDFLATEX, TEX, etc., and I guess you can just set one of
>> these to 'lualatex' ... and of course lualatex must be
>> findable on the CRAN servers, but I'd bet that to be the case.
>> Best,
>> Martin
>> > Any thoughts and suggestions on this would be greatly
>> > appreciated! I think/hope then that the remaining problems
>> > in R CMD check are acceptable to the CRAN team given the
>> > nature of my package. They are:
>> > 1. Examples and tests fail if the check is not run in a
>> > UTF-8 locale.
>> > 2. checking data for non-ASCII characters ... NOTE Note:
>> > found 111752 marked UTF-8 strings
>> > Many thanks, Dominic Schuhmacher
>> > ______________________________________________
>> > R-package-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-package-devel