[Rd] Suggestion / patch to support more Unicode characters in R CMD Rd2pdf

Mikko Korpela mikko.korpela at aalto.fi
Wed Jul 4 23:01:42 CEST 2012


Hi list,

When using R CMD Rd2pdf, you can set the environment variable
RD2PDF_INPUTENC to "inputenx" to get better support for UTF-8
characters (see ?Rd2pdf). This makes Rd2pdf load the LaTeX package
"inputenx" instead of "inputenc".
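A small R sketch of the mechanism. Rd2pdf reads the variable with a default of "inputenc", as in the patched Rd2pdf.R, so setting it in an R session also affects any R CMD Rd2pdf started from that session (for example via system2()), because child processes inherit the environment. The helper name current_inputenc() is hypothetical and used only for illustration.

```r
## Hypothetical helper: read the package choice the same way Rd2pdf.R does.
## Sys.getenv() returns the second argument ("inputenc") when the variable
## is unset.
current_inputenc <- function() Sys.getenv("RD2PDF_INPUTENC", "inputenc")

## Opt in to inputenx for this R session; this is equivalent to setting
## RD2PDF_INPUTENC=inputenx in the shell before running R CMD Rd2pdf.
Sys.setenv(RD2PDF_INPUTENC = "inputenx")
print(current_inputenc())
```
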

UTF-8 support can be improved further by making better use of
inputenx: R CMD Rd2pdf should also insert the line
"\input{ix-utf8enc.dfu}" into its temporary .tex file, as described in
section 1.2, "Unicode", of the inputenx manual:
http://mirror.ctan.org/macros/latex/contrib/oberdiek/inputenx.pdf

I suggest that R CMD Rd2pdf automatically insert
"\input{ix-utf8enc.dfu}" into its temporary .tex file whenever inputenx
is used together with UTF-8. The attached small patch does this.
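The detection rule in the first hunk of the patch can be sketched as a standalone function. rd2pdf_preamble() is a hypothetical name, and it assumes the encoding argument has already been canonicalised (e.g. "utf8"), as latex_canonical_encoding() does in Rd2pdf.R. It returns the preamble lines that would precede \begin{document}.

```r
## Hypothetical helper mirroring the first hunk of the patch: build the
## encoding-related preamble lines for a given LaTeX package and a
## canonical LaTeX encoding name such as "utf8" or "latin1".
rd2pdf_preamble <- function(inputenc, latex_encoding) {
    setEncoding <- paste0("\\usepackage[", latex_encoding, "]{",
                          inputenc, "} % @SET ENCODING@")
    ## The extra \input line is emitted only for inputenx + UTF-8;
    ## otherwise the `if` yields NULL, which c() drops.
    c(setEncoding,
      if (inputenc == "inputenx" && latex_encoding == "utf8")
          "\\input{ix-utf8enc.dfu}")
}

writeLines(rd2pdf_preamble("inputenx", "utf8"))
## prints:
## \usepackage[utf8]{inputenx} % @SET ENCODING@
## \input{ix-utf8enc.dfu}
```

With the default "inputenc", the function returns only the \usepackage line, so existing behaviour is unchanged.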

A demo package is also attached (the tarball was built manually, not
with R CMD build). It uses some UTF-8 characters that are not supported
without the patch: R CMD Rd2pdf fails with an error propagated from
LaTeX. With the patch installed, R CMD Rd2pdf works when
RD2PDF_INPUTENC=inputenx is set. To test, unpack the tarball and run
R CMD Rd2pdf on the resulting directory. Tested on R development
version r59731 running on 64-bit Ubuntu 10.10.

-- 
Mikko Korpela
Aalto University School of Science
Department of Information and Computer Science


-------------- next part --------------
A non-text attachment was scrubbed...
Name: encTest3.tar.gz
Type: application/x-gzip
Size: 2429 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20120705/5e61ce90/attachment.gz>
-------------- next part --------------
Index: src/library/tools/R/Rd2pdf.R
===================================================================
--- src/library/tools/R/Rd2pdf.R	(revision 59731)
+++ src/library/tools/R/Rd2pdf.R	(working copy)
@@ -466,12 +466,17 @@
     inputenc <- Sys.getenv("RD2PDF_INPUTENC", "inputenc")
     ## this needs to be canonical, e.g. 'utf8'
     ## trailer is for detection if we want to edit it later.
+    latex_outputEncoding <- latex_canonical_encoding(outputEncoding)
     setEncoding <-
         paste("\\usepackage[",
-              latex_canonical_encoding(outputEncoding), "]{",
+              latex_outputEncoding, "]{",
               inputenc, "} % @SET ENCODING@", sep="")
     useGraphicx <- "% \\usepackage{graphicx} % @USE GRAPHICX@"
     writeLines(c(setEncoding,
+                 if (inputenc == "inputenx" &&
+                     latex_outputEncoding == "utf8") {
+                     "\\input{ix-utf8enc.dfu}"
+                 },
     		 useGraphicx,
                  if (index) "\\makeindex{}",
                  "\\begin{document}"), out)
@@ -545,21 +550,28 @@
     latexEncodings <- unique(latexEncodings)
     latexEncodings <- latexEncodings[!is.na(latexEncodings)]
     cyrillic <- if (nzchar(Sys.getenv("_R_CYRILLIC_TEX_"))) "utf8" %in% latexEncodings else FALSE
-    latex_outputEncoding <- latex_canonical_encoding(outputEncoding)
     encs <- latexEncodings[latexEncodings != latex_outputEncoding]
     if (length(encs) || hasFigures || cyrillic) {
         lines <- readLines(outfile)
+        moreUnicode <- inputenc == "inputenx" && "utf8" %in% encs
 	encs <- paste(encs, latex_outputEncoding, collapse=",", sep=",")
 
 	if (!cyrillic) {
-	    lines[lines == setEncoding] <-
+	    setEncoding2 <-
 		paste0("\\usepackage[", encs, "]{", inputenc, "}")
 	} else {
-	    lines[lines == setEncoding] <-
+	    setEncoding2 <-
 		paste(
 "\\usepackage[", encs, "]{", inputenc, "}
 \\IfFileExists{t2aenc.def}{\\usepackage[T2A]{fontenc}}{}", sep = "")
 	}
+	if (moreUnicode) {
+	    setEncoding2 <-
+		paste0(
+setEncoding2, "
+\\input{ix-utf8enc.dfu}")
+        }
+        lines[lines == setEncoding] <- setEncoding2
 	if (hasFigures)
 	    lines[lines == useGraphicx] <- "\\usepackage{graphicx}\\setkeys{Gin}{width=0.7\\textwidth}"
 	writeLines(lines, outfile)
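The second hunk handles the case where some Rd files use encodings other than the output encoding, so the encoding line is rewritten after the fact. A sketch of that logic, under the assumption that encs already excludes the output encoding as in the patch. rewrite_encoding() is a hypothetical name, and the Cyrillic (T2A) branch is left out for brevity.

```r
## Hypothetical helper mirroring the second hunk (non-Cyrillic branch):
## build the replacement for the "@SET ENCODING@" line.
## `encs` holds the extra encodings found in the Rd files, excluding
## `latex_outputEncoding`.
rewrite_encoding <- function(inputenc, encs, latex_outputEncoding) {
    ## Only an extra UTF-8 encoding triggers the additional \input line;
    ## if the output encoding itself is utf8, the first hunk already
    ## wrote that line separately.
    moreUnicode <- inputenc == "inputenx" && "utf8" %in% encs
    opts <- paste(encs, latex_outputEncoding, collapse = ",", sep = ",")
    line <- paste0("\\usepackage[", opts, "]{", inputenc, "}")
    ## The embedded newline makes writeLines() emit two physical lines.
    if (moreUnicode) line <- paste0(line, "\n\\input{ix-utf8enc.dfu}")
    line
}

writeLines(rewrite_encoding("inputenx", "utf8", "latin1"))
```
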


More information about the R-devel mailing list