[Rd] proposal: 'dev.capabilities()' can also query Unicode capabilities of current graphics device

Paul Murrell paul using stat.auckland.ac.nz
Thu Sep 21 04:23:22 CEST 2023


Hi

The problem is what "supports UNICODE" means.
Graphics devices have a 'hasTextUTF8' boolean to indicate that ...

     /* Some devices can plot UTF-8 text directly without converting
        to the native encoding, e.g. windows(), quartz() ....

        If this flag is true, all text *not in the symbol font* is sent
        in UTF8 to the textUTF8/strWidthUTF8 entry points.

... and this is TRUE for the pdf() device for example.
It is also TRUE for Cairo devices, but the support is quite different 
(as your examples demonstrate).
The Cairo devices do not alter UTF8 text at all, but the pdf() device 
attempts to convert to a single-byte representation, which of course 
will not always work.
The situation is only made more complex with the recent dev->glyph() 
support because that offers another possible route to producing generic 
UNICODE characters, including on pdf() devices.
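
To make the single-byte conversion problem concrete, here is a rough 
sketch. It only approximates the idea and is not the code path the pdf() 
device actually uses. The helper name fits_single_byte() is made up for 
illustration, and using "latin1" as the target is an assumption: the real 
target depends on the 'encoding' argument to pdf() and on the locale.

```r
# Hypothetical helper (not part of grDevices): TRUE for each string in 'x'
# that survives conversion from UTF-8 to the given single-byte encoding.
# Assumption: "latin1" stands in for whatever encoding pdf() would target.
fits_single_byte <- function(x, to = "latin1") {
    # With the default sub = NA, iconv() returns NA for any element
    # that it cannot convert to the target encoding.
    !is.na(iconv(enc2utf8(x), from = "UTF-8", to = to))
}

fits_single_byte("caf\u00e9")  # e-acute has a Latin-1 code point
fits_single_byte("\u2665")     # BLACK HEART SUIT has no Latin-1 code point
```

A check like this says nothing about whether the font has the glyph, 
nor about the dev->glyph() route, so it is at best a partial answer to 
"supports UNICODE".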

Paul

On 21/09/23 04:12, Trevor Davis wrote:
>  > However, pdf() *does* support Unicode.
> 
> When I run a simple Unicode example like:
> 
> ```
> f <- tempfile(fileext = ".pdf")
> pdf(f)
> # U+2665 ♥ is found in most (all?) "sans" fonts like Arial, DejaVu Sans,
> # Arimo, etc.
> # However, it is not in the Latin-1 encoding
> grid::grid.text("\u2665")
> dev.off()
> ```
> 
> I observe the following output:
> 
> ```
> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
>   conversion failure on '♥' in 'mbcsToSbcs': dot substituted for <e2>
> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
>   conversion failure on '♥' in 'mbcsToSbcs': dot substituted for <99>
> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
>   conversion failure on '♥' in 'mbcsToSbcs': dot substituted for <a5>
> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
>   conversion failure on '♥' in 'mbcsToSbcs': dot substituted for <e2>
> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
>   conversion failure on '♥' in 'mbcsToSbcs': dot substituted for <99>
> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
>   conversion failure on '♥' in 'mbcsToSbcs': dot substituted for <a5>
> ```
> 
> When I open up the pdf file I just see three dots and not a heart as I
> expected even if I open it up with `xpdf`.
> 
> In contrast the pdf generated by `cairo_pdf()` has a heart without
> generating any warnings.
> 
> Avoiding such WARNINGs on certain CRAN check machines when I have a Unicode
> graphics example that is worth including in a package's examples (if
> protected by an appropriate if statement) is my main use case for such a
> new feature. However, a new feature like `dev.capabilities()$unicode`
> could certainly return something more sophisticated than a crude `TRUE` and
> `FALSE` to distinguish between levels of Unicode support provided by
> different graphics devices.
> 
> Thanks,
> 
> Trevor
> 
> On Wed, Sep 20, 2023 at 3:39 AM Martin Maechler <maechler using stat.math.ethz.ch>
> wrote:
> 
>  > >>>>> Trevor Davis
>  > >>>>> on Thu, 31 Aug 2023 13:49:03 -0700 writes:
>  >
>  > > Hi,
>  >
>  > > It would be nice if `grDevices::dev.capabilities()` could also be used to
>  > > query whether the current graphics device supports Unicode. In such a case
>  > > I'd expect it to return `FALSE` if `pdf()` is the current graphics device
>  > > and something else for the Cairo or Quartz devices.
>  >
>  > > Thanks,
>  > > Trevor
>  >
>  > I agree in principle that this would be a useful new feature for
>  > dev.capabilities().
>  >
>  > However, pdf() *does* support Unicode.
>  >
>  > The problem is that some pdf *viewers*,
>  > notably `evince` on Fedora Linux, for several years now,
>  > do *not* show *some* of the UTF-8 glyphs because they do not use
>  > the correct fonts {which *are* on the machine; good old `xpdf`
>  > does in that case show the glyphs}.
>  >
>  > Martin
>  >
> 

-- 
Dr Paul Murrell
Te Kura Tatauranga | Department of Statistics
Waipapa Taumata Rau | The University of Auckland
Private Bag 92019, Auckland 1142, New Zealand
64 9 3737599 x85392
paul using stat.auckland.ac.nz
www.stat.auckland.ac.nz/~paul/


