[Rd] R-devel news: non-ASCII character strings in packages
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Feb 16 08:26:32 CET 2007
R-devel (pre-2.5.0) now has enough facilities to allow packages with
non-ASCII character strings to work reasonably well in locales where the
fonts in use support the characters used. For example, names in Western
European languages can be used on both Latin-1 (and hence Windows 1252)
and UTF-8 systems. It should also be possible to make use of non-ASCII
object names.
To enable this, two things need to be done.
1) The package encoding needs to be declared in the DESCRIPTION file.
2) Any character strings stored in .rda files need to be marked as Latin-1
or UTF-8 (see 'Writing R Extensions' for how to do so).
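A minimal sketch of the two steps, assuming a package whose strings are in Latin-1. The object name `x`, the sample string and the temporary file are illustrative only; 'Writing R Extensions' remains the reference for the details.

```r
# Step 1 (DESCRIPTION file, not R code): declare the package encoding, e.g.
#   Encoding: latin1

# Step 2: mark the strings before saving them to an .rda file.
# Illustrative string: "Francois" with c-cedilla as the Latin-1 byte 0xE7.
x <- rawToChar(as.raw(c(0x46, 0x72, 0x61, 0x6e, 0xe7, 0x6f, 0x69, 0x73)))
Encoding(x) <- "latin1"             # declare how the bytes are to be read

rda <- tempfile(fileext = ".rda")   # stand-in for the package's own .rda file
save(x, file = rda)

# The mark is stored with the string and survives a save/load round trip.
e <- new.env()
load(rda, envir = e)
marked <- Encoding(e$x)
```

Here `marked` should come back as "latin1", so a session in a UTF-8 locale knows how to translate the string for display.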
R CMD check will give NOTE or WARNING messages when it detects non-ASCII
characters.
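For a rough idea of what such a check looks for, the sketch below flags strings that contain bytes outside the 7-bit ASCII range. The helper name `has_non_ascii` is hypothetical, and this is not the code that R CMD check itself runs.

```r
# Hypothetical helper: TRUE for each string with any byte above 0x7F.
has_non_ascii <- function(strings) {
  vapply(strings,
         function(s) any(as.integer(charToRaw(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

samples <- c("plain ASCII",
             rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))))  # "caf" + Latin-1 e-acute
flags <- has_non_ascii(samples)
```

Applied to the character columns of a data frame before saving, a check of this kind shows which strings need an encoding mark.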
Please do bear in mind the caveat in the first paragraph: it is very
unlikely that using French in a Chinese locale or vice versa will work correctly
(even on a UTF-8 system).
The changes needed are backwards compatible: if you make them to your
package, it will work equally well (or badly) in R < 2.5.0, and better in
2.5.0 when released.
Currently one CRAN package has non-ASCII object names and fifteen have
non-ASCII data (as detected by R CMD check).
Note that non-ASCII data need not be from non-English languages: Windows
1252 in particular has a variety of signs that are far from portable, most
notably its misnamed 'smart quotes' (but also the Euro).
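To illustrate why these signs are not portable, the sketch below takes the Windows 1252 bytes for the two curly double quotes and the Euro and converts them to UTF-8. It assumes the platform's iconv recognises the encoding name "CP1252"; the variable names are illustrative.

```r
# Bytes 0x93, 0x94 and 0x80 are the curly double quotes and the Euro sign
# in Windows 1252; in ISO Latin-1 the same byte values are control codes.
cp1252 <- c(rawToChar(as.raw(0x93)),   # left double quote
            rawToChar(as.raw(0x94)),   # right double quote
            rawToChar(as.raw(0x80)))   # Euro

# Assumption: this iconv build knows the name "CP1252".
utf8 <- iconv(cp1252, from = "CP1252", to = "UTF-8")

# Each single byte becomes a three-byte UTF-8 sequence.
utf8_bytes <- lapply(utf8, charToRaw)
```

A file containing such bytes therefore reads differently depending on whether it is treated as Windows 1252, Latin-1 or UTF-8, which is why R CMD check reports them.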
Finally, please do not add Encoding: to the DESCRIPTION of ASCII-only
packages. It just slows things down and (unless latin1 is specified)
restricts the package to systems that support iconv. (Yes, there are
examples of this.)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595