[Rd] nchar reporting wrong width when zero-width character is present?

Wed Nov 19 10:58:41 CET 2014

Dear list,

If I include the zero-width non-breaking space (U+FEFF, written "\ufeff")
in a string, nchar with type="width" seems to report the wrong number of
columns that 'cat' will use to print it.

> x <- "f\ufeffoo"
> x
[1] "foo"
> nchar(x,type="width")
[1] 2

I would expect "3" here. Going through the documentation of 'Encoding'
and 'encodeString', I don't think this is expected behavior. Am I
missing something? If it is a bug I will file a report.
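
To make explicit what the string contains, here is a minimal sketch (my
own illustration, assuming a UTF-8 session) that lists the code points
of x with the base functions utf8ToInt and sprintf. The variable names
cp and labels are just for this example.

```r
# Build the string from the report: 'f', U+FEFF, 'o', 'o'.
x <- "f\ufeffoo"

# utf8ToInt() returns the Unicode code points as an integer vector.
cp <- utf8ToInt(x)

# Format each code point in the usual U+XXXX notation.
labels <- sprintf("U+%04X", cp)
print(labels)
# [1] "U+0066" "U+FEFF" "U+006F" "U+006F"
```

So the string holds four code points, of which only three occupy a
column when printed.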

Secondly, the documentation of 'nchar' states that with type='chars'
(the default) it returns "the number of human-readable characters". I
get:

> nchar(x,type='chars')
[1] 4

I would hardly call the zero-width space human-readable. Also, since for example

> nchar("foo\r")
[1] 4

it is probably more accurate to say that the number of symbols
(abstract characters) is counted, noting that some of the symbols in
an alphabet represented by an encoding may be invisible (or hardly
visible).
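
To compare the three counting modes side by side, something like the
following sketch could be used (again my own illustration; the names
types and counts are hypothetical). I deliberately do not show the
"width" column, since that value is the one in question and may differ
between R versions and platforms.

```r
# The two example strings from this message.
x <- "f\ufeffoo"   # contains the zero-width no-break space U+FEFF
y <- "foo\r"       # contains a carriage return

# Apply nchar() with each of its documented 'type' values.
types  <- c("bytes", "chars", "width")
counts <- sapply(types, function(t) nchar(c(x, y), type = t))
rownames(counts) <- c("f<U+FEFF>oo", "foo\\r")

print(counts)
# "chars" is 4 for both strings; "bytes" is 6 for x, because U+FEFF
# takes three bytes in UTF-8, and 4 for y.
```

Both invisible characters are counted under type="chars", which is the
point made above.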

Many thanks in advance,
Best, Mark

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.2