[Rd] \U with more than 4 digits returns the wrong character

Mark van der Loo mark.vanderloo at gmail.com
Thu Dec 4 20:24:58 CET 2014


Richie,

The R language definition [1] says, in section 10.3.1, that this form is
not supported on Windows:

\Unnnnnnnn \U{nnnnnnnn}
(where multibyte locales are supported and not on Windows, otherwise
an error). Unicode character with given hex code – sequences of up to
eight hex digits.

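The truncation you suspect can be checked with a little arithmetic. This
is only a sketch: it assumes the parser keeps just the low 16 bits of the
code point, which is your hypothesis and not something I have confirmed
in the R sources. The names `wanted` and `truncated` are made up for
illustration.

```r
# Code point requested in the report:
# MATHEMATICAL BOLD SCRIPT CAPITAL A.
wanted <- 0x1D4D0L

# Assumption: only the low 16 bits survive, which drops the leading
# hex digit.
truncated <- bitwAnd(wanted, 0xFFFFL)
sprintf("U+%X", truncated)   # "U+D4D0", the Hangul syllable reported

# Code point R actually stored for the literal. The result depends on
# platform and R version, so no expected value is given here.
utf8ToInt("\U1d4d0")
```

If the last call returns 54480 (0xD4D0) on your machine, that would be
consistent with a 16-bit overflow.
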

Best,
Mark

[1] http://cran.r-project.org/doc/manuals/r-release/R-lang.html
http://www.markvanderloo.eu
-------------------------------------------------------------------
If you cannot quantify it,
you don't know what you're talking about


On Thu, Dec 4, 2014 at 8:00 PM, Richard Cotton <richierocks at gmail.com> wrote:
> If I type a character using \U syntax that has more than 4 digits, I
> get the wrong character.  For example,
>
> "\U1d4d0"
>
> should print a mathematical bold script capital A.  See
> http://www.fileformat.info/info/unicode/char/1d4d0/index.htm
>
> On my machine, it prints the Hangul character corresponding to
>
> "\Ud4d0"
> http://www.fileformat.info/info/unicode/char/d4d0/index.htm
>
> It seems that the hex-digit part is overflowing at 16^4.
>
> I tested this on R 3.1.2 and devel (2014-12-03 r67101) x64 under
> Windows.  I played around with Sys.setlocale and options("encoding"),
> but couldn't get the expected value.
>
> Can others reproduce this?  It feels like a bug, but experience tells
> me I probably have something silly going on with my setup.
>
> --
> Regards,
> Richie


