[Rd] RFC: Kerning, postscript() and pdf()
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Oct 13 09:00:13 CEST 2008
Thanks for the feedback. Two comments:
- we have experimental code for all the options, so the work in
implementing them does not differ greatly. The long-term maintenance
cost of having different options is a real consideration, though.
R-devel currently has the options of B (default) and C, but this is work in
progress.
- I estimate that letter-by-letter placement increases the file size for
text strings by a factor of 6, so this would be significant in a plot with
a lot of annotations, but not one in which points were labelled by single
letters.
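
As a rough illustration of where the size difference comes from, here is a
minimal sketch in R. It is not the code in R-devel: the kerning table `kern`,
the helper names, the single nominal letter width and the emitted strings are
all assumptions made for illustration (the output is only modelled on the PDF
text operators `TJ`, `Td` and `Tj`). It splits a string only at kerned pairs,
as in option C, and places every letter separately, as in option D.

```r
# Hypothetical pair-kerning table in 1/1000 em; a real device would take
# such values from the font's AFM file.
kern <- c("Yo" = -140)

# Option C: split the string only where a kerned pair occurs.
split_kerned <- function(s, kern) {
  ch <- strsplit(s, "")[[1]]
  n <- length(ch)
  if (n < 2) return(list(segments = s, adjust = numeric(0)))
  pairs <- paste0(ch[-n], ch[-1])        # adjacent letter pairs
  k <- unname(kern[pairs])               # NA where the pair is not kerned
  cut <- which(!is.na(k) & k != 0)       # split after these positions
  starts <- c(1, cut + 1)
  ends <- c(cut, n)
  segments <- mapply(function(a, b) paste(ch[a:b], collapse = ""),
                     starts, ends)
  list(segments = unname(segments), adjust = k[cut])
}

# Option C rendered as one array: TJ subtracts its numbers from the
# current position, so the sign of the kerning value is flipped.
render_c <- function(s, kern) {
  p <- split_kerned(s, kern)
  parts <- paste0("(", p$segments, ")")
  gaps <- c(sprintf(" %g ", -p$adjust), "")
  paste0("[", paste0(parts, gaps, collapse = ""), "] TJ")
}

# Option D: one positioning command per letter. 'width' is a single
# made-up advance (in em) used for every letter, purely for illustration.
render_d <- function(s, kern, width = 0.556) {
  ch <- strsplit(s, "")[[1]]
  n <- length(ch)
  k <- if (n > 1) unname(kern[paste0(ch[-n], ch[-1])]) else numeric(0)
  k[is.na(k)] <- 0
  step <- c(0, width + k / 1000)         # advance from the previous letter
  paste(sprintf("%.3f 0 Td (%s) Tj", step, ch), collapse = " ")
}

xx <- "You You You You You You You You"
nchar(render_c(xx, kern))
nchar(render_d(xx, kern))
```

The ratio of the two counts depends entirely on the invented output syntax, so
it should not be expected to reproduce the factor quoted above; the point is
only that C grows with the number of kerned pairs while D grows with the
number of letters.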
On Sun, 12 Oct 2008, Duncan Murdoch wrote:
> On 12/10/2008 11:36 AM, Prof Brian Ripley wrote:
>> Ei-ji Nakama has pointed out (from another Japanese user, I believe) that
>> postscript() and pdf() have not been handling kerning correctly, and this
>> is a request for opinions about how we should correct it.
>>
>> Kerning is the adjustment of the spacing between letters from their natural
>> width, so that for example 'Yo' is usually typeset with the o closer to the
>> Y than 'Yl' would be. Kerning is not very well standardized, so that for
>> example R's default Helvetica and its URW clone (Nimbus Sans) have quite
>> different ideas of the amount of kerning corrections for 'Yo'. This
>> matters, because not many people actually see Helvetica when viewing R's
>> PostScript or PDF output, but rather a similar face like Nimbus Sans or
>> Arial, or in the case of Acrobat Reader, a not very similar face. Kerning
>> is only a feature of some proportionally spaced fonts and so not of Courier
>> nor CJK fonts.
>>
>> The current position (R <= 2.8.0) is that string widths have been computed
>> using kerning from the Adobe Font Metric files for the nominal font, but
>> the strings have been displayed without using kerning (at least in the
>> viewers we are aware of, and the PostScript and PDF reference manuals
>> mandate that behaviour, if rather obscurely). This means that in strings
>> such as 'You', the width used in the string placement differs from that
>> actually displayed.
>>
>> For postscript(), this doesn't have much impact, as centring or right
>> justification ('hadj' in text()) is done by PostScript code and computes
>> the width from the actual font used (and so copes well with font
>> substitution). It might affect the fine layout in plotmath, but using
>> strings which would be kerned in annotations is not common.
>>
>> For pdf() the effect is more commonly seen, as all text is set
>> left-justified, and the computed width is used to centre/right-justify.
>>
>> There are several things we could do:
>>
>> A. Do nothing, for back compatibility. After all, this has been going on
>> for years and no one has complained until last month.
>>
>> B. Ignore kerning, and hence change the string width computations to match
>> the current display. This is more attractive than it appears at first
>> sight -- as far as I know all other devices ignore kerning, and we are
>> increasingly used to seeing 'typeset' output without kerning. It would be
>> desirable when copying graphs by e.g. dev.copy2eps from devices that do not
>> kern.
>>
>> C. Insert kerning corrections by splitting up strings, so e.g. 'You' is
>> set as (Y)-140 kc(ou): this is what TeX engines do.
>>
>> D. Compute the position of each letter in the string and place them
>> individually.
>>
>> C and D would give visually identical output when the font used is exactly
>> as specified, and hopefully also when a substitute font is used with the
>> same glyph widths (as substituting Nimbus Sans for Helvetica, at least for
>> some versions of each), but where the substitute is a poor match, C ought
>> to look more elegant but line up less well. D would produce much larger
>> files than C.
>>
>> We do have the option of not changing the output when there is no kerning.
>> That would be by far the most common case except that some fonts (including
>> Helvetica but not Nimbus Sans) kern between punctuation and a space, e.g.
>> ', '. I'm inclined to believe that most uses of ',' in R graphical output
>> are not punctuation (certainly true of R's own examples), and also that we
>> nowadays do not expect to see kerning involving spaces.
>>
>> Ei-ji Nakama provided an implementation of C for pdf() and D for
>> postscript() (thanks Ei-ji, and apologies that we did not have a chance to
>> discuss the principles first). I'm inclined to suggest that we should go
>> forwards with at most two of these alternatives, and those two should be
>> the same for postscript() and pdf() -- my own inclination is to B and C.
>>
>> So questions:
>>
>> 1) Do people feel strongly that we should preserve graphical output from
>> past versions of R, even when there are known bugs? I can see the need to
>> reproduce published figures, but normally this would also need using the
>> same version of R.
>
>
> I think we can make this sort of change in 2.9.0.
>
>> 2) Is kerning worth pursuing?
>
>
> I think that is up to you and other people who might do the work; I don't
> think I'll contribute to it.
>
>> 3) If so, is elegant looking output more important than exact layout?
>
>
> I suppose it matters how bad the exact layout looks, but I think your comment
> above that exact layout will produce much larger files is of more concern.
> We're sure to get complaints if "much larger" is noticeable. Other concerns
> are whether text searches in .pdf or .ps files get confused by the
> difference.
>
>> 4) If we allow kerning, should it be the default (or only) option?
>
>
> If we do it, I think we should make it the default. Whether it is optional
> depends on how much work that would be (so it is mainly up to the
> implementor).
>
>
>>
>> To see that sometimes there can be a large effect, try in postscript() or
>> pdf()
>>
>> xx <- 'You You You You You You You You'
>> plot(0,0,xlim=c(0,1),ylim=c(0,1),type='n')
>> abline(v=0)
>> text(0, 0.5, xx, adj=0)
>> abline(v=strwidth(xx))
>> x2 <- strsplit(xx, "")
>> w <- sapply(x2, strwidth)
>> abline(v=sum(w))
>>
>> The leftmost of the right pair of lines is the computed width, the
>> rightmost the (normal) displayed width.
>>
>> Unless there are cogent reasons to bring this forward to 2.8.1, any changes
>> would be as from 2.9.0.
>>
>> Brian Ripley
>>
>
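
The effect on pdf() described in the quoted message can be put as simple
arithmetic. A minimal sketch, assuming invented widths (the numbers below are
not measured from any font, and `left_edge` is a hypothetical helper): since
the text is always set from its left edge, that edge is placed at
x - hadj * width, so an error in the width moves the text by hadj times the
error.

```r
# Left edge at which a left-justified string must start so that it appears
# justified at x; hadj = 0 (left), 0.5 (centred), 1 (right).
left_edge <- function(x, width, hadj) x - hadj * width

computed  <- 0.90   # hypothetical width including kerning
displayed <- 1.00   # hypothetical width as rendered, without kerning

# Horizontal misplacement for each justification: the text lands to the
# right of where it should be by hadj * (displayed - computed).
hadj <- c(left = 0, centre = 0.5, right = 1)
shift <- left_edge(0, computed, hadj) - left_edge(0, displayed, hadj)
shift
```

Left-justified text is unaffected, which is consistent with the demonstration
above using adj=0 and showing the discrepancy only at the right-hand end.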
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595