[Rd] Native characterset is wrong for unicode builds for Windows
Duncan Murdoch
murdoch.duncan at gmail.com
Thu Feb 26 23:22:56 CET 2015
On 26/02/2015 3:09 PM, maillist at tlink.de wrote:
>
> When I send some outlandish characters through enc2native (or format) in
> R 3.1.2 on Ubuntu trusty it works quite well:
>
> > "®ØΔЊת"
> [1] "®ØΔЊת"
> > enc2native("®ØΔЊת")
> [1] "®ØΔЊת"
> > Encoding(enc2native("®ØΔЊת"))
> [1] "UTF-8"
>
> In Windows the result is different:
>
> > "®ØΔЊת"
> [1] "®ØΔЊת"
> > enc2native("®ØΔЊת")
> [1] "®Ø<U+0394><U+040A><U+05EA>"
> > Encoding(enc2native("®ØΔЊת"))
> [1] "latin1"
>
> And this is wrong. The native character set of a unicode application
> under Windows is *Unicode*. enc2native should do the same under Windows
> as it does on Ubuntu. Also the "unknown" encoding should be changed to
> mean the same as "UTF-8" exactly as it is on Linux.
What is a "unicode application", and what makes you think R is one? R
is being told by Windows that your native encoding is latin1. Perhaps
Windows 8 supports UTF-8 as a native encoding (I've never used it), but
previous versions of Windows didn't.
Duncan Murdoch
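
To see what R has been told about the native encoding on a given machine, something along these lines may help. It is only a sketch: it uses the base R functions Sys.getlocale(), l10n_info(), enc2utf8(), enc2native() and Encoding(), and the variable names (info, x, y) are made up for illustration. The test string is written with \u escapes so that the script itself stays ASCII and does not depend on the encoding of the source file.

```r
# What locale and character set does R think it is running in?
Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether the locale is multibyte, UTF-8 or Latin-1
# (on Windows it also reports the code page in use).
info <- l10n_info()
str(info)

# The string from the report, entered via Unicode escapes and
# explicitly converted to (and marked as) UTF-8.
x <- enc2utf8("\u00AE\u00D8\u0394\u040A\u05EA")
Encoding(x)

# enc2native() converts to the locale's native encoding. In a UTF-8
# locale the string is unchanged; in a Latin-1 locale the characters
# that cannot be represented are the ones shown as <U+xxxx> above.
y <- enc2native(x)
Encoding(y)
print(y)
```

If info[["UTF-8"]] is FALSE and info[["Latin-1"]] is TRUE, the session is in the situation described above: the conversion done by enc2native() follows the locale reported by the operating system, not the encoding of the input string.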