[Rd] Native characterset is wrong for unicode builds for Windows

maillist at tlink.de maillist at tlink.de
Thu Feb 26 21:09:03 CET 2015


When I pass some outlandish characters through enc2native (or format) in 
R 3.1.2 on Ubuntu Trusty, it works quite well:

 > "®ØΔЊת"
[1] "®ØΔЊת"
 > enc2native("®ØΔЊת")
[1] "®ØΔЊת"
 > Encoding(enc2native("®ØΔЊת"))
[1] "UTF-8"

In Windows the result is different:

 > "®ØΔЊת"
[1] "®ØΔЊת"
 > enc2native("®ØΔЊת")
[1] "®Ø<U+0394><U+040A><U+05EA>"
 > Encoding(enc2native("®ØΔЊת"))
[1] "latin1"

This is wrong. The native character set of a Unicode application under 
Windows is *Unicode*, so enc2native should do the same under Windows as 
it does on Ubuntu. The "unknown" encoding should also be changed to mean 
the same as "UTF-8", exactly as it does on Linux.
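
For anyone who wants to check their own session, here is a minimal sketch 
that reports whether the native encoding is UTF-8 and whether the test 
string survives enc2native. The helper survives_native is a hypothetical 
name introduced only for this illustration; it is not part of R. The 
characters are written as \u escapes so the snippet does not depend on 
the encoding of the source file.

```r
# The five test characters as \u escapes, so this file stays pure ASCII.
x <- "\u00ae\u00d8\u0394\u040a\u05ea"

# l10n_info() reports whether the session's native encoding is UTF-8.
native_is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])

# Hypothetical helper (not part of R): TRUE if every character of the
# single string s survives a round trip through the native encoding.
survives_native <- function(s) {
  identical(utf8ToInt(enc2utf8(enc2native(s))), utf8ToInt(enc2utf8(s)))
}

# enc2utf8() keeps the string in UTF-8 regardless of the locale.
u <- enc2utf8(x)

cat("native encoding is UTF-8:", native_is_utf8, "\n")
cat("survives enc2native():   ", survives_native(x), "\n")
cat("Encoding(enc2utf8(x)):   ", Encoding(u), "\n")
```

Until enc2native behaves the same on both platforms, using enc2utf8 in 
place of enc2native is one way to avoid the <U+0394> style substitution 
shown above, assuming the downstream code accepts UTF-8 strings.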


