[Rd] RFC: Kerning, postscript() and pdf()

Mon Oct 13 09:00:13 CEST 2008

Thanks for the feedback.  Two comments:

- we have experimental code for all the options, so the work in 
implementing them does not differ greatly.  The long-term maintenance 
costs of having different options is a real consideration, though.
R-devel currently has options B (the default) and C, but this is work in
progress.

- I estimate that letter-by-letter placement increases the file size for 
text strings by a factor of 6, so this would be significant in a plot with 
a lot of annotations, but not one in which points were labelled by single 
letters.
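The size difference between the two kerning approaches in the quoted proposal (C: kerning corrections inside a string; D: placing each letter individually) comes from the PDF text operators each one needs. The sketch below is in R, but it only builds content-stream strings and is not what pdf() actually emits. It makes several assumptions:

- `widths`, `kern`, `pdf_escape()`, `as_TJ()` and `as_per_glyph()` are hypothetical names.
- The metrics are illustrative values in 1/1000 em (the AFM convention), not read from a real font file.
- Text is 12 point with an identity text matrix.

```r
## Illustrative AFM-style metrics (1/1000 em); NOT taken from a real AFM file.
widths <- c(Y = 667, o = 556, u = 556, " " = 278)
kern   <- c(Yo = -140)   # negative value: bring the pair closer together

## Escape the characters that are special inside PDF literal strings.
pdf_escape <- function(x) {
  ch <- strsplit(x, "")[[1]]
  special <- ch %in% c("(", ")", "\\")
  if (any(special)) ch[special] <- paste0("\\", ch[special])
  paste(ch, collapse = "")
}

## Option C: a single TJ operator.  A kerned pair splits the string, and the
## correction is written between the pieces; in a TJ array a positive number
## moves the next glyph left, so an AFM kern of -140 becomes 140.
as_TJ <- function(s, kern) {
  ch <- strsplit(s, "")[[1]]
  parts <- character(0)
  run <- ch[1]
  for (i in seq_along(ch)[-1]) {
    k <- kern[paste0(ch[i - 1], ch[i])]   # NA if the pair is not kerned
    if (!is.na(k) && k != 0) {
      parts <- c(parts, sprintf("(%s)", pdf_escape(run)),
                 sprintf("%d", as.integer(-k)))
      run <- ch[i]
    } else {
      run <- paste0(run, ch[i])
    }
  }
  parts <- c(parts, sprintf("(%s)", pdf_escape(run)))
  sprintf("[%s] TJ", paste(parts, collapse = " "))
}

## Option D: every glyph gets its own Td (a move relative to the previous
## glyph's origin) and Tj.  The advance includes any kern correction and is
## converted to text space units as width * size / 1000.
as_per_glyph <- function(s, widths, kern, size = 12) {
  ch <- strsplit(s, "")[[1]]
  ops <- character(length(ch))
  dx <- 0
  for (i in seq_along(ch)) {
    ops[i] <- sprintf("%.2f 0 Td (%s) Tj", dx, pdf_escape(ch[i]))
    adv <- widths[[ch[i]]]
    if (i < length(ch)) {
      k <- kern[paste0(ch[i], ch[i + 1])]
      if (!is.na(k)) adv <- adv + k
    }
    dx <- adv * size / 1000
  }
  paste(ops, collapse = " ")
}

s <- "You You You You You You You You"
as_TJ("You", kern)                      # "[(Y) 140 (ou)] TJ"
c(C = nchar(as_TJ(s, kern)), D = nchar(as_per_glyph(s, widths, kern)))
```

Option D repeats a positioning operator and its coordinates for every character. Its output therefore grows with the number of letters rather than with the number of kerned pairs. That is why it matters for heavily annotated plots but much less for single-letter point labels.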

On Sun, 12 Oct 2008, Duncan Murdoch wrote:

> On 12/10/2008 11:36 AM, Prof Brian Ripley wrote:
>> Ei-ji Nakama has pointed out (from another Japanese user, I believe) that 
>> postscript() and pdf() have not been handling kerning correctly, and this 
>> is a request for opinions about how we should correct it.
>> 
>> Kerning is the adjustment of the spacing between letters from their natural 
>> width, so that for example 'Yo' is usually typeset with the o closer to the 
>> Y than 'Yl' would be.  Kerning is not very well standardized, so that for 
>> example R's default Helvetica and its URW clone (Nimbus Sans) have quite 
>> different ideas of the amount of kerning corrections for 'Yo'. This 
>> matters, because not many people actually see Helvetica when viewing R's 
>> PostScript or PDF output, but rather a similar face like Nimbus Sans or 
>> Arial, or in the case of Acrobat Reader, a not very similar face.  Kerning 
>> is only a feature of some proportionally spaced fonts and so not of Courier 
>> nor CJK fonts.
>> 
>> The current position (R <= 2.8.0) is that string widths have been computed
>> using kerning from the Adobe Font Metric files for the nominal font, but 
>> the strings have been displayed without using kerning (at least in the 
>> viewers we are aware of, and the PostScript and PDF reference manuals 
>> mandate that behaviour, if rather obscurely).  This means that in strings 
>> such as 'You', the width used in the string placement differs from that 
>> actually displayed.
>> 
>> For postscript(), this doesn't have much impact, as centring or right 
>> justification ('hadj' in text()) is done by PostScript code and computes 
>> the width from the actual font used (and so copes well with font 
>> substitution).  It might affect the fine layout in plotmath, but using 
>> strings which would be kerned in annotations is not common.
>> 
>> For pdf() the effect is more commonly seen, as all text is set 
>> left-justified, and the computed width is used to centre/right-justify.
>> 
>> There are several things we could do:
>> 
>> A.  Do nothing, for back compatibility.  After all, this has been going on 
>> for years and no one has complained until last month.
>> 
>> B.  Ignore kerning, and hence change the string width computations to match 
>> the current display.  This is more attractive than it appears at first 
>> sight -- as far as I know all other devices ignore kerning, and we are 
>> increasingly used to seeing 'typeset' output without kerning.  It would be 
>> desirable when copying graphs by e.g. dev.copy2eps from devices that do not 
>> kern.
>> 
>> C.  Insert kerning corrections by splitting up strings, so e.g. 'You' is 
>> set as (Y)-140 kc(ou): this is what TeX engines do.
>> 
>> D.  Compute the position of each letter in the string and place them 
>> individually.
>> 
>> C and D would give visually identical output when the font used is exactly 
>> as specified, and hopefully also when a substitute font with the same
>> glyph widths is used (as when substituting Nimbus Sans for Helvetica, at least for
>> some versions of each), but where the substitute is a poor match, C ought 
>> to look more elegant but line up less well.  D would produce much larger 
>> files than C.
>> 
>> We do have the option of not changing the output when there is no kerning. 
>> That would be by far the most common case except that some fonts (including 
>> Helvetica but not Nimbus Sans) kern between punctuation and a space, e.g. 
>> ', '.  I'm inclined to believe that most uses of ',' in R graphical output 
>> are not punctuation (certainly true of R's own examples), and also that we 
>> nowadays do not expect to see kerning involving spaces.
>> 
>> Ei-ji Nakama provided an implementation of C for pdf() and D for 
>> postscript() (thanks Ei-ji, and apologies that we did not have a chance to 
>> discuss the principles first).  I'm inclined to suggest that we should go 
>> forwards with at most two of these alternatives, and those two should be 
>> the same for postscript() and pdf() -- my own inclination is to B and C.
>> 
>> So questions:
>> 
>> 1) Do people feel strongly that we should preserve graphical output from 
>> past versions of R, even when there are known bugs?  I can see the need to 
>> reproduce published figures, but normally this would also require using
>> the same version of R.
>
>
> I think we can make this sort of change in 2.9.0.
>
>> 2) Is kerning worth pursuing?
>
>
> I think that is up to you and other people who might do the work; I don't 
> think I'll contribute to it.
>
>> 3) If so, is elegant looking output more important than exact layout?
>
>
> I suppose it matters how bad the exact layout looks, but I think your comment 
> above that exact layout will produce much larger files is of more concern. 
> We're sure to get complaints if "much larger" is noticeable.  Other concerns 
> are whether text searches in .pdf or .ps files get confused by the 
> difference.
>
>> 4) If we allow kerning, should it be the default (or only) option?
>
>
> If we do it, I think we should make it the default.  Whether it is optional 
> depends on how much work that would be (so it is mainly up to the 
> implementor).
>
>
>> 
>> To see that sometimes there can be a large effect, try in postscript() or 
>> pdf()
>> 
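>> xx <- 'You You You You You You You You'
>> plot(0,0,xlim=c(0,1),ylim=c(0,1),type='n')
>> abline(v=0)                  # left edge of the string
>> text(0, 0.5, xx, adj=0)      # draw it left-justified at x = 0
>> abline(v=strwidth(xx))       # computed width (includes kerning)
>> x2 <- strsplit(xx, "")
>> w <- sapply(x2, strwidth)    # widths of the individual letters
>> abline(v=sum(w))             # their sum: the unkerned, displayed width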
>> 
>> The leftmost of the right pair of lines is the computed width, the 
>> rightmost the (normal) displayed width.
>> 
>> Unless there are cogent reasons to bring this forward to 2.8.1, any changes 
>> would be as from 2.9.0.
>> 
>> Brian Ripley
>> 
>
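The width mismatch described in the quoted proposal, and why option B removes it, can be sketched from AFM-style metrics. This sketch rests on a few assumptions:

- The tables `afm_widths` and `afm_kern` and the function `afm_string_width()` are hypothetical.
- The metric values are illustrative (1/1000 em, as in AFM files) rather than read from a real Helvetica AFM file.
- Text is 12 point.

```r
## Illustrative AFM-style metrics (1/1000 em); NOT from a real font file.
afm_widths <- c(Y = 667, o = 556, u = 556, " " = 278)
afm_kern   <- c(Yo = -140)   # negative value: the pair is set closer together

## Width of 's' in points: the sum of glyph advances, plus (optionally) the
## kern correction for every adjacent pair found in the kern table.
afm_string_width <- function(s, widths = afm_widths, kern = afm_kern,
                             size = 12, kerning = TRUE) {
  ch <- strsplit(s, "")[[1]]
  w <- sum(widths[ch])
  if (kerning && length(ch) > 1) {
    pairs <- paste0(ch[-length(ch)], ch[-1])
    w <- w + sum(kern[pairs], na.rm = TRUE)   # unkerned pairs give NA
  }
  w * size / 1000
}

xx <- "You You You You You You You You"
afm_string_width(xx, kerning = TRUE)    # width used for placement in R <= 2.8.0
afm_string_width(xx, kerning = FALSE)   # width viewers display; option B uses this
```

With kerning, the computed width falls short of the displayed one by the sum of the kern corrections (here, eight 'Yo' pairs). This is analogous to the gap between the two right-hand lines in the quoted plot. Option B places text using the `kerning = FALSE` figure, so placement and display agree.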

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595