[Rd] R-devel news: non-ASCII character strings in packages

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Feb 16 08:26:32 CET 2007


R-devel (pre-2.5.0) now has enough facilities to allow packages with 
non-ASCII character strings to work reasonably well in locales where the 
fonts in use support the characters used.  For example, names in Western 
European languages can be used on both Latin-1 (and hence Windows 1252) 
and UTF-8 systems.  It should also be possible to make use of non-ASCII 
object names.

To enable this, two things need to be done.

1) The package encoding needs to be declared in the DESCRIPTION file.
2) Any character strings stored in .rda files need to be marked as Latin-1 
or UTF-8 (see 'Writing R Extensions' for how to do so).
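
As a minimal sketch of the two steps (not taken from 'Writing R Extensions'), 
assume a package whose strings really are Latin-1 bytes.  The DESCRIPTION 
file would gain a line such as

    Encoding: latin1

and strings destined for an .rda file can be marked with Encoding() before 
saving.  The object name 'authors', the sample names and the file name are 
hypothetical.

```r
# Hypothetical data: two names built from Latin-1 bytes
# (0xE7 is c-cedilla, 0xF6 is o-umlaut in Latin-1).
authors <- c("Fran\xE7ois", "Sch\xF6n")

# Inspect the current encoding marks.
Encoding(authors)

# Declare the strings to be Latin-1.
Encoding(authors) <- "latin1"

# Save to a file (a temporary one here; a package would use its data directory).
rda <- file.path(tempdir(), "authors.rda")
save(authors, file = rda)
```

Marking does not change the bytes; it only records how they are to be 
interpreted.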

R CMD check will give NOTE or WARNING messages when it detects non-ASCII 
characters.
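
A rough way to look for such characters yourself is to test for bytes above 
127.  This is only an illustrative sketch, not the test that R CMD check 
performs; the helper name 'has_non_ascii' and the sample strings are 
hypothetical.

```r
# Hypothetical helper: TRUE for each string that contains a byte > 127.
has_non_ascii <- function(x) {
  vapply(x, function(s) any(as.integer(charToRaw(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# "plain" is pure ASCII; the second string is "caf" followed by byte 0xE9
# (e-acute in Latin-1).
samples <- c("plain", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))))
flags <- has_non_ascii(samples)
```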

Please do bear in mind the caveat in the first paragraph: it is very 
unlikely that using French in a Chinese locale or vice versa will work 
correctly (even on a UTF-8 system).

The changes needed are backwards compatible: if you make them to your 
package, it will work equally well (or badly) in R < 2.5.0, and better in 
2.5.0 when released.

Currently one CRAN package has non-ASCII object names and fifteen have 
non-ASCII data (as detected by R CMD check).

Note that non-ASCII data need not be from non-English languages: Windows 
1252 in particular has a variety of signs that are far from portable, most 
notably its misnamed 'smart quotes' (but also the Euro).
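
To illustrate why these signs are not portable: Windows 1252 puts its curly 
double quotes at bytes 0x93 and 0x94, which Latin-1 reserves for control 
characters, so such bytes have to be converted explicitly.  A small sketch, 
assuming the platform's iconv recognises the encoding name "CP1252" (names 
vary between implementations):

```r
# Bytes 0x93 and 0x94 are the curly double quotes in Windows 1252;
# 0x61 is the ASCII letter "a" between them.
x <- rawToChar(as.raw(c(0x93, 0x61, 0x94)))

# Convert to UTF-8; the quotes become U+201C and U+201D.
y <- iconv(x, from = "CP1252", to = "UTF-8")

# Look at the resulting bytes rather than printing, so the check does
# not depend on the current locale.
bytes <- charToRaw(y)
```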

Finally, please do not add Encoding: to the DESCRIPTION of ASCII-only 
packages.  It just slows things down and (unless latin1 is specified) 
restricts the package to systems that support iconv.  (Yes, there are 
examples of this.)


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


