[Rd] proposal: 'dev.capabilities()' can also query Unicode capabilities of current graphics device

Paul Murrell paul using stat.auckland.ac.nz
Tue Sep 26 01:39:57 CEST 2023


Hi

Yes, you can set up your own font, and TeX installations are a good 
source of Type 1 fonts.  Here is an example (paths obviously specific to 
my [Ubuntu 20.04] OS and TeX installation) ...


cmlgc <- Type1Font("cmlgc",
                   rep("/usr/share/texlive/texmf-dist/fonts/afm/public/cm-lgc/fcmr6z.afm", 4),
                   encoding="Cyrillic")
pdfFonts(cmlgc=cmlgc)

x <- '\u410\u411\u412'
pdf("cmlgc.pdf", family="cmlgc", encoding="Cyrillic")
plot(1:10, main = x)
dev.off()

embedFonts("cmlgc.pdf", out="cmlgc-embed.pdf",
           fontpaths="/usr/share/texlive/texmf-dist/fonts/type1/public/cm-lgc/")


Final result attached.

Thanks for the patch for the unrelated memory problem;  I will take a 
look at that.

Paul

On 24/09/23 09:43, Ivan Krylov wrote:
> On Wed, 20 Sep 2023 12:39:50 +0200
> Martin Maechler <maechler using stat.math.ethz.ch> wrote:
> 
>  > The problem is that some pdf *viewers*,
>  > notably `evince` on Fedora Linux, for several years now,
>  > do *not* show *some* of the UTF-8 glyphs because they do not use
>  > the correct fonts
> 
> One more problem that makes it nontrivial to use Unicode with pdf() is
> the graphics device not knowing some of the font metrics:
> 
> x <- '\u410\u411\u412'
> pdf()
> plot(1:10, main = x)
> # Warning messages:
> # 1: In title(...) : font width unknown for character 0xb0
> # 2: In title(...) : font width unknown for character 0xe4
> # 3: In title(...) : font width unknown for character 0xfc
> # 4: In title(...) : font width unknown for character 0x7f
> dev.off()
> 
> In the resulting PDF file, the three letters are visible, at least in
> Evince 3.38.2, but they are all positioned in the same space.
> 
> I understand that this is strictly speaking not pdf()'s fault
> (grDevices contains the font metrics for all standard Adobe fonts and a
> few more), but I'm not sure what to do as a user. Should I call
> pdfFonts(...), declaring a font with all symbols I need? Where does one
> even get Type-1 Cyrillic Helvetica (or any other font) with separate
> font metrics files for use with pdf()?
> 
> Actually, the wrong number of sometimes random character codes reminds
> me of stack garbage. In src/library/grDevices/src/devPS.c, function
> static double PostScriptStringWidth, there's this bit of code:
> 
> if(!strIsASCII((char *) str) &&
>    /*
>     * Every fifth font is a symbol font:
>     *           see postscriptFonts()
>     */
>    (face % 5) != 0) {
>     R_CheckStack2(strlen((char *)str)+1);
>     char buff[strlen((char *)str)+1];
>     /* Output string cannot be longer */
>     mbcsToSbcs((char *)str, buff, encoding, enc);
>     str1 = (unsigned char *)buff;
> }
> 
> Later the characters in str1 are iterated over in order to calculate
> the total width of the string. I didn't notice this myself until I saw
> in the debugger that after a few iterations of the loop, the contents
> of str1 are completely different from the result of mbcsToSbcs((char
> *)str, buff, encoding, enc), and went to investigate. Only after the
> debugger told me that there's no variable called "buff" I realised that
> the VLA pointed to by str1 no longer exists.
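
The lifetime problem described above can be reproduced in miniature. The
sketch below is not the grDevices code: to_single_byte() and string_width()
are hypothetical stand-ins for mbcsToSbcs() and PostScriptStringWidth(), the
width weights are made up, and malloc()/free() take the place of R_alloc()
with vmaxget()/vmaxset() so that it compiles outside R. It only illustrates
keeping the converted buffer alive until the width loop has finished.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mbcsToSbcs(): copies the input and replaces
 * every non-ASCII byte with '?'.  The output is never longer than the input. */
static void to_single_byte(const char *in, char *out)
{
    size_t i;
    for (i = 0; in[i] != '\0'; i++)
        out[i] = ((unsigned char) in[i] < 0x80) ? in[i] : '?';
    out[i] = '\0';
}

/* Hypothetical stand-in for the width calculation.  Made-up metrics:
 * '?' counts as 2 units, every other byte as 1.  Returns -1 if the
 * allocation fails. */
static int string_width(const char *str, int convert)
{
    const unsigned char *str1 = (const unsigned char *) str;
    char *buff = NULL;  /* function scope: outlives the if-block below */
    int sum = 0;

    assert(str != NULL);

    if (convert) {
        /* Buggy pattern, shown only as a comment:
         *     char vla[strlen(str) + 1];
         *     to_single_byte(str, vla);
         *     str1 = (unsigned char *) vla;
         * 'vla' stops existing at the closing brace of this block, so
         * str1 would dangle by the time the loop below reads it. */
        buff = malloc(strlen(str) + 1);
        if (buff == NULL)
            return -1;
        to_single_byte(str, buff);
        str1 = (const unsigned char *) buff;
    }

    /* str1 is still valid here because buff has not been released yet. */
    for (const unsigned char *p = str1; *p != '\0'; p++)
        sum += (*p == '?') ? 2 : 1;

    free(buff);  /* free(NULL) is a no-op when no conversion happened */
    return sum;
}
```

With these made-up weights, string_width("A\xD0\x90", 1) converts the two
non-ASCII bytes to '?' and returns 5, while string_width("A\xD0\x90", 0)
reads the raw bytes and returns 3.
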
> 
> --- src/library/grDevices/src/devPS.c (revision 85214)
> +++ src/library/grDevices/src/devPS.c (working copy)
> @@ -721,6 +721,8 @@
> unsigned char p1, p2;
> 
> int status;
> + /* May be about to allocate */
> + void *alloc = vmaxget();
> if(!metrics && (face % 5) != 0) {
> /* This is the CID font case, and should only happen for
> non-symbol fonts. So we assume monospaced with multipliers.
> @@ -755,9 +757,8 @@
> * Every fifth font is a symbol font:
> * see postscriptFonts()
> */
> - (face % 5) != 0) {
> - R_CheckStack2(strlen((char *)str)+1);
> - char buff[strlen((char *)str)+1];
> + (face % 5) != 0 && metrics) {
> + char *buff = R_alloc(strlen((char *)str)+1, 1);
> /* Output string cannot be longer */
> mbcsToSbcs((char *)str, buff, encoding, enc);
> str1 = (unsigned char *)buff;
> @@ -792,6 +793,7 @@
> }
> }
> }
> + vmaxset(alloc);
> return 0.001 * sum;
> }
> 
> 
> 
> After this patch, I'm consistently getting the right character codes in
> the warnings, but I still don't know how to set up the font metrics.
> 
> -- 
> Best regards,
> Ivan
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel 

-- 
Dr Paul Murrell
Te Kura Tatauranga | Department of Statistics
Waipapa Taumata Rau | The University of Auckland
Private Bag 92019, Auckland 1142, New Zealand
64 9 3737599 x85392
paul using stat.auckland.ac.nz
www.stat.auckland.ac.nz/~paul/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmlgc-embed.pdf
Type: application/pdf
Size: 10746 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20230926/c4365ddc/attachment.pdf>

