[Rd] Correct usage of nchar(): precautionary change for R 2.6.0
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue May 29 11:39:11 CEST 2007
Remember that nchar() returns by default the number of *bytes* and not the
number of characters. I've recently spotted many cases in which nchar()
has been used with substr() which works in characters; this can lead to
incorrect results. (This seems the commonest use of nchar() in
packages.)
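A minimal sketch of the mismatch, assuming a string stored in UTF-8 (it is built from raw bytes here so that the result does not depend on the locale). The 'type' argument is given explicitly, so the default in use does not matter:

```r
# Build "cafe" with an e-acute from its UTF-8 bytes (locale-independent).
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))
Encoding(x) <- "UTF-8"

n_bytes <- nchar(x, type = "bytes")  # 5: the e-acute occupies two bytes
n_chars <- nchar(x, type = "chars")  # 4

# Extracting the last character: substr() counts characters, not bytes.
last_wrong <- substr(x, n_bytes, n_bytes)  # "" because position 5 is past the end
last_right <- substr(x, n_chars, n_chars)  # the e-acute
```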
There were two reasons why nchar() was left defaulting to bytes when we
allowed MBCSs in R:
1) Many of the uses are of the form if(nchar(x)) or if(nchar(x)==0) or
even if(nchar(x) != 0). Computing the length of a string is an inefficient
way to find out if it is non-empty, especially if it has to be converted
to wchars to do so.
2) Once you allow multibyte characters, not all character strings are
valid and for those nchar(x, "c") is NA. Not much code has been written
to take into account the possibility that nchar() might return an NA.
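As an illustration of the NA case, here is a sketch using a deliberately invalid string (the byte 0xff cannot occur in valid UTF-8). It assumes a version of R in which nchar() has the 'allowNA' argument; as far as I know, current versions raise an error for invalid strings unless allowNA = TRUE is given:

```r
# Hypothetical invalid input: "a" followed by a byte that is never valid UTF-8.
bad <- rawToChar(as.raw(c(0x61, 0xff)))
Encoding(bad) <- "UTF-8"

n_bytes <- nchar(bad, type = "bytes")                  # bytes can always be counted
n_chars <- nchar(bad, type = "chars", allowNA = TRUE)  # NA for an invalid string

# Code that uses the count has to handle the NA case explicitly.
label <- if (is.na(n_chars)) "<invalid string>" else substr(bad, 1, n_chars)
```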
Despite these reasons, it seems that the dangers of incorrect use outweigh
them. So for 2.6.0:
- There is a new function nzchar() which provides a quick test for a
non-zero number of characters.
- The default becomes nchar(type="chars").
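A short sketch of how the emptiness tests mentioned in 1) might be written with nzchar(); the variable names are only for illustration:

```r
x <- c("", "abc", " ")

# nzchar() answers "is the string non-empty?" without counting characters.
non_empty <- nzchar(x)   # FALSE TRUE TRUE (a single space is not empty)

# Replacement for the common if(nchar(s)) idiom on a single string:
s <- "abc"
msg <- if (nzchar(s)) "non-empty" else "empty"
```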
It seems that nchar() is used quite often to lay out 'printed' or
graphical output. For that, normally nchar(type="width") is what is
needed.
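For layout, something along the following lines may serve as a starting point. pad_right() is a hypothetical helper, not part of base R, and the sketch assumes strrep() is available (R >= 3.3.0, as far as I recall):

```r
# pad_right() is a hypothetical helper: pad strings with spaces on the right
# so that each one occupies 'width' display columns.
pad_right <- function(x, width) {
  w <- nchar(x, type = "width")   # display columns, not characters or bytes
  paste0(x, strrep(" ", pmax(width - w, 0)))
}

labels <- c("abc", "\u65e5\u672c")   # the second entry is two CJK characters
padded <- pad_right(labels, 8)
cat(paste0(padded, "|"), sep = "\n") # the bars should line up in a fixed-width font
```

Padding by nchar(type="chars") instead would leave the second label too wide, since CJK characters normally occupy two columns each.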
None of this is an issue in single-byte locales or for ASCII text in
UTF-8 or Windows' CJK locales, but please bear in mind that you cannot
assume such for a public package. (The assumption that ASCII code is
represented in single bytes is pretty widespread, but at some point we may
want to support Windows' native UCS-2 encoding for which it is not true.)
The best advice is to use the 'type' argument for all uses of nchar() in
public code unless perhaps you are sure only ASCII data will ever be
encountered.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595