[Rd] Native characterset is wrong for unicode builds for Windows
maillist at tlink.de
Thu Feb 26 21:09:03 CET 2015
When I send some outlandish characters through enc2native (or format) in
R 3.1.2 on Ubuntu trusty it works quite well:
> "®ØΔЊת"
[1] "®ØΔЊת"
> enc2native("®ØΔЊת")
[1] "®ØΔЊת"
> Encoding(enc2native("®ØΔЊת"))
[1] "UTF-8"
In Windows the result is different:
> "®ØΔЊת"
[1] "®ØΔЊת"
> enc2native("®ØΔЊת")
[1] "®Ø<U+0394><U+040A><U+05EA>"
> Encoding(enc2native("®ØΔЊת"))
[1] "latin1"
This is wrong. The native character set of a Unicode application
under Windows is *Unicode*, so enc2native should behave on Windows
as it does on Ubuntu. The "unknown" encoding should likewise be changed
to mean the same as "UTF-8", exactly as it does on Linux.
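
To reproduce the comparison without depending on how the script file or the console is encoded, something like the following sketch might help. It builds the same five characters from \u escapes and prints what the session reports as its native encoding next to the results of enc2native and enc2utf8. The variable names (x, native, utf8) are only illustrative, and the output of the enc2native part will differ by platform and locale.

```r
# Build the string from \u escapes so the script's own file encoding is irrelevant.
x <- "\u00AE\u00D8\u0394\u040A\u05EA"   # the five characters from the report

# What R considers the native encoding of this session.
print(l10n_info()[c("MBCS", "UTF-8", "Latin-1")])

# enc2native() re-encodes into the session's native encoding; in the report
# above, characters it could not represent came back as <U+XXXX> escapes.
native <- enc2native(x)
print(Encoding(native))
print(native)

# enc2utf8() leaves a string already marked as UTF-8 unchanged, so every
# code point survives whatever the locale is.
utf8 <- enc2utf8(x)
print(Encoding(utf8))
print(utf8ToInt(utf8))
```

Comparing the two halves of the output on Ubuntu and on Windows should show whether the loss happens in enc2native itself rather than in how the string was typed or displayed.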