[Rd] Encoding errors in Rd files

Tue Jul 24 22:46:54 CEST 2012

On 24/07/2012 21:08, steven mosher wrote:
> Well, I'm working on a project to bring an old package, last
> published for R 1.9, back to life.
> I'm almost there, but I am getting killed by an encoding error in the Rd
> files.
>
> After reading the manual, I decided to try UTF-8.  Mostly because I could
> spell it. ha.
>
> That got me a bit closer, but I still have these warnings:
>
> * checking data for non-ASCII characters ... WARNING
>    Warning: found non-ASCII string(s)
>    'Tourbière de la Rivière-aux-Feu' in object 'modpoll'
>    'Lac à la Fourche' in object 'modpoll'
>    'Lac à la Loutre' in object 'modpoll'
>    'Lac Kénogami' in object 'modpoll'

How to handle those is covered in 'Writing R Extensions': basically, 
convert the strings to UTF-8 and mark them as UTF-8.
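
As a rough sketch of what "convert and mark" can look like in R (this assumes the strings are currently Latin-1, which is only a guess, and to_utf8 is a made-up helper name, not something from the manual):

```r
# Sketch: re-encode character data to UTF-8 and mark it as such.
# ASSUMPTION: the strings are currently Latin-1; change 'from' if not.
to_utf8 <- function(x, from = "latin1") {
  if (is.factor(x)) {
    levels(x) <- to_utf8(levels(x), from)
    return(x)
  }
  if (!is.character(x)) return(x)           # leave non-character data alone
  y <- iconv(x, from = from, to = "UTF-8")  # NA where conversion fails
  Encoding(y) <- "UTF-8"                    # no effect on pure-ASCII strings
  y
}

# Illustrative Latin-1 byte string (0xe0 is a-grave), not the real data set.
x <- rawToChar(c(charToRaw("Lac "), as.raw(0xe0), charToRaw(" la Loutre")))
y <- to_utf8(x)
Encoding(y)

# For a data frame such as 'modpoll' one might then do (untested here,
# and the file path is hypothetical):
#   modpoll[] <- lapply(modpoll, to_utf8)
#   save(modpoll, file = "data/modpoll.rda")
```

Check the converted strings by eye before re-saving: iconv() will happily convert from the wrong source encoding and give you different wrong characters.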

> * checking data for ASCII and uncompressed saves ... OK
> * checking examples ... OK
> * checking PDF version of manual ... WARNING
> LaTeX errors when creating PDF version.
> This typically indicates Rd problems.
> LaTeX errors found:
>   ! Package inputenc Error: Keyboard character used is undefined
> (inputenc)                in inputencoding `utf8'.
>
> I'll keep searching the help list archives for a clue, but if somebody
> could point me at educational material, it's really time
> that I learned this aspect.

Without the actual file we can do little.  The message means that 
something in the manual inputs (and it could be the DESCRIPTION file or 
an Rd file) contains a character not known to LaTeX.  Most likely it is 
simply not a UTF-8 character, but it could also be outside LaTeX's gamut.
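
One way to hunt for the first case (bytes that are not valid UTF-8) is to scan the input files line by line. The following is a sketch only: find_invalid_utf8 is a made-up name, the file list in the comment assumes you are in the package's top-level directory, and validUTF8() needs R 3.3.0 or later. tools::showNonASCIIfile() is a related helper that shows non-ASCII lines.

```r
# Sketch: report the lines of each file that are not valid UTF-8.
find_invalid_utf8 <- function(paths) {
  do.call(rbind, lapply(paths, function(p) {
    lines <- readLines(p, warn = FALSE)   # bytes are read as-is
    bad <- which(!validUTF8(lines))
    data.frame(file = rep(p, length(bad)), line = bad,
               stringsAsFactors = FALSE)
  }))
}

# Possible use from the package root (hypothetical layout):
#   files <- c("DESCRIPTION",
#              list.files("man", pattern = "\\.Rd$", full.names = TRUE))
#   find_invalid_utf8(files)
```

This will not catch the second case, a valid UTF-8 character that LaTeX simply does not know about.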

Normally the LaTeX log (which is in the check output) is more revealing. 
You can also try this part alone with R CMD Rd2pdf; if the error goes 
away with R CMD Rd2pdf --no-description, that points the finger at the 
DESCRIPTION file.
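
If the log is long, LaTeX error messages start with "!" at the beginning of a line, so they can be pulled out mechanically. A small sketch (latex_errors is a made-up helper, and the sample lines are just the two from the check output above; where the log file sits in your check directory is something you would need to look up):

```r
# Sketch: extract LaTeX error lines (those starting with "!") plus a
# little following context from the lines of a LaTeX log.
latex_errors <- function(log_lines, context = 1) {
  hits <- grep("^!", log_lines, useBytes = TRUE)  # bytes: log may be mis-encoded
  lapply(hits, function(i)
    log_lines[i:min(i + context, length(log_lines))])
}

# Illustrative input; with a real log use readLines() on the log file.
log_lines <- c(
  "! Package inputenc Error: Keyboard character used is undefined",
  "(inputenc)                in inputencoding `utf8'."
)
errs <- latex_errors(log_lines)
```

In a real log the lines shortly after the "!" line normally show the source line LaTeX was processing, which is what identifies the offending character.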

>
> I've read    http://developer.r-project.org/Encodings_and_R.html
>
> How do I figure out which encoding to use with the error seen above?

Assuming this is not something esoteric, UTF-8 is the most comprehensive 
choice, but LaTeX's UTF-8 coverage (and that of the fonts used) is 
heavily biased towards Western European scripts.  So, for example, for 
Lithuanian you may want to choose something else (Latin-7?).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595