[Rd] nchar reporting wrong width when zero-space character is present?

Mark van der Loo mark.vanderloo at gmail.com
Wed Nov 19 10:58:41 CET 2014


Dear list,

If I include the zero-width non-breaking space (\ufeff) in a string,
nchar seems to compute the wrong number of columns used by 'cat'.

> x <- "f\ufeffoo"
> x
[1] "foo"
> nchar(x,type="width")
[1] 2

I would expect "3" here. Going through the documentation of 'Encoding'
and 'encodeString', I don't think this is expected behavior. Am I
missing something? If it is a bug I will file a report.

Secondly, the documentation of 'nchar' states that with type='chars'
(the default) it returns "the number of human-readable characters". I
get:

> nchar(x,type='chars')
[1] 4

I would hardly call the zero-width space human-readable. Also, since for example

> nchar("foo\r")
[1] 4

it is probably more accurate to say that the number of symbols
(abstract characters) is counted, noting that some of the symbols in
an alphabet represented by an encoding may be invisible (or hardly
visible).
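
To make the distinction concrete, here is a minimal sketch, assuming a
UTF-8 locale. It lists the code points in the string and compares the
character and byte counts with and without U+FEFF. The helper
'strip_zwnbsp' is a hypothetical name of my own, not part of base R, and
I deliberately leave type="width" out since that is the result in question.

```r
# Sketch only: assumes a UTF-8 locale; x is the string from the example above.
x <- "f\ufeffoo"

# Code points in the string: U+FEFF (65279) sits between "f" and the first "o".
cps <- utf8ToInt(x)
print(cps)

# type = "chars" counts code points, type = "bytes" counts UTF-8 bytes
# (U+FEFF is encoded as the three bytes EF BB BF).
n_chars <- nchar(x, type = "chars")
n_bytes <- nchar(x, type = "bytes")

# Hypothetical helper (not in base R): drop U+FEFF before counting.
strip_zwnbsp <- function(s) {
  cp <- utf8ToInt(s)
  intToUtf8(cp[cp != 0xFEFF])
}
n_visible <- nchar(strip_zwnbsp(x), type = "chars")

cat("chars:", n_chars, " bytes:", n_bytes, " without U+FEFF:", n_visible, "\n")
```

Counting after removing U+FEFF is only one possible reading of
"human-readable"; other invisible characters, such as "\r", would need
the same treatment.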


Many thanks in advance,
Best, Mark


> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.2


