[R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

Sat May 20 18:01:47 CEST 2023

In case more people want to check this out, the minimal example package can be downloaded from https://owncloud.gwdg.de/index.php/s/ejhN3vd51572e27

Running (in a UTF-8 locale)

R CMD check minimal_1.0.tar.gz

reproduces the described problem, whereas

PDFLATEX=lualatex \
RD2PDF_INPUTENC='inputenc}\usepackage{luatexja' \
R CMD check minimal_1.0.tar.gz

seems to run without problems and generates a reasonable pdf manual.
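
My understanding of why the second variant works, as a rough sketch: I assume that Rd2pdf builds one preamble line of the form \usepackage[<encoding>]{<value of RD2PDF_INPUTENC>}, so a value containing a closing brace followed by a second \usepackage smuggles luatexja into the preamble. The template and the utf8 option below are my assumption for illustration, not copied from the R sources; the snippet only mimics the string substitution and does not call R or LaTeX.

```shell
#!/bin/sh
# Assumption (illustrative only): Rd2pdf writes a preamble line of the form
#   \usepackage[<encoding>]{<value of RD2PDF_INPUTENC>}
RD2PDF_INPUTENC='inputenc}\usepackage{luatexja'

# %s inserts the value verbatim; backslashes inside it are not interpreted.
preamble_line=$(printf '\\usepackage[utf8]{%s}' "$RD2PDF_INPUTENC")

# Show the line that would end up in the generated .tex file under this assumption.
printf '%s\n' "$preamble_line"
```

Under that assumption the resulting line loads inputenc and then luatexja, which would explain why lualatex can typeset the kanji in the manual.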

> On 18. May 2023, at 12:05, Uwe Ligges <ligges using statistik.tu-dortmund.de> wrote:
> 
> 
> 
> On 18.05.2023 10:03, Martin Maechler wrote:
>>>>>>> Schuhmacher, Dominic
>>>>>>>     on Wed, 17 May 2023 12:05:49 +0000 writes:
>>     > Dear list, I have a package
>>     > https://github.com/dschuhmacher/kanjistat whose very
>>     > purpose depends on working with Japanese kanji characters
>>     > (in UTF-8 encoding). Such characters appear vitally in the
>>     > data sets, examples, tests, the vignette and the .Rd
>>     > files.
>>     > My package checks fine with devtools::check on my system
>>     > and via Github Actions produced with
>>     > usethis::use_github_action_check_standard().  However, I
>>     > would like to release the package on CRAN, and running R
>>     > CMD check --as-cran gives me a number of headaches, mainly
>>     > related to the production of pdf documents via latex as it
>>     > seems to be not so easy to convince latex to typeset
>>     > Japanese, see
>>     > https://www.overleaf.com/learn/latex/Japanese
>>     > For the vignette, I can set in the Rmarkdown file
>>     >   pdf_document:
>>     >     latex_engine: lualatex
>>     >     includes:
>>     >       in_header: preamble.tex
>>     > and in the file preamble.tex
>>     >   \usepackage{luatexja}
>>     >   \usepackage{microtype}
>>     > This gives me a pdf-vignette that looks and checks fine
>>     > (except that the abovementioned GitHub Actions don't seem
>>     > to find lualatex, which is why the pdf output is commented
>>     > out in the main branch on GitHub).
>>     > Unfortunately, I fail to find a similar solution for the
>>     > pdf manual. R CMD check yields
>>     > --------------
>>     > checking PDF version of manual ... WARNING LaTeX errors
>>     > when creating PDF version.  This typically indicates Rd
>>     > problems.  LaTeX errors found: ! Package inputenc Error:
>>     > Unicode character 冷 (U+51B7) (inputenc) not set up for
> 
> 
> Can you send me a minimal example package with these characters in an Rd file?
> 
> Best,
> Uwe Ligges
> 
> 
>>     > use with LaTeX.  [and many more of the same] * checking
>>     > PDF version of manual without index ... ERROR
>>     > --------------
>>     > It seems that the pdf manual is generated by first
>>     > producing a texinfo file and then running texi2dvi. From
>>     > https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Inserting-Unicode.html
>>     > I take the message that texinfo does not do Japanese... Is
>>     > there any way to work around the use of texinfo and use
>>     > lualatex (with a preamble) instead? If not, is there a way
>>     > to keep the UTF-8 encoded characters in the html help (I
>>     > think this is very useful for the user!) and still produce
>>     > a pdf that passes the check, e.g. by replacing the kanji
>>     > characters automatically by their codepoints (or even a
>>     > generic placeholder symbol) when generating the pdf
>>     > manual?
>> I cannot help much more,
>> but be assured that texinfo is *not* used in the process.
>> It's just a "historical coincidence" that texi2dvi, a "simple"
>> shell script, typically comes from the texinfo "software
>> package", i.e., in Linux distributions the texi2dvi command
>> (shell script, see above) is provided by the 'texinfo'
>> (Debian/Ubuntu/..) package.
>> man texi2dvi tells you about a slew of environment variables,
>> notably PDFLATEX, TEX etc., and I guess you can just set one of
>> these to 'lualatex' ... and of course lualatex must be
>> findable on the CRAN servers, but I'd bet that to be the case.
>> Best,
>> Martin
>>     > Any thoughts and suggestions on this would be greatly
>>     > appreciated! I think/hope then that the remaining problems
>>     > in R CMD check are acceptable to the CRAN team given the
>>     > nature of my package. They are:
>>     > 1. Examples and tests fail if the check is not run in an
>>     > UTF-8 locale.
>>     > 2. checking data for non-ASCII characters ... NOTE Note:
>>     > found 111752 marked UTF-8 strings
>>     > Many thanks, Dominic Schuhmacher
>>     > ______________________________________________
>>     > R-package-devel using r-project.org mailing list
>>     > https://stat.ethz.ch/mailman/listinfo/r-package-devel