[Rd] Native characterset is wrong for unicode builds for Windows

Duncan Murdoch murdoch.duncan at gmail.com
Thu Feb 26 23:22:56 CET 2015


On 26/02/2015 3:09 PM, maillist at tlink.de wrote:
> 
> When I send some outlandish characters through enc2native (or format) in 
> R 3.1.2 on Ubuntu trusty it works quite well:
> 
>  > "®ØΔЊת"
> [1] "®ØΔЊת"
>  > enc2native("®ØΔЊת")
> [1] "®ØΔЊת"
>  > Encoding(enc2native("®ØΔЊת"))
> [1] "UTF-8"
> 
> In Windows the result is different:
> 
>  > "®ØΔЊת"
> [1] "®ØΔЊת"
>  > enc2native("®ØΔЊת")
> [1] "®Ø<U+0394><U+040A><U+05EA>"
>  > Encoding(enc2native("®ØΔЊת"))
> [1] "latin1"
> 
> And this is wrong. The native character set of a unicode application 
> under Windows is *Unicode*. enc2native should do the same under Windows 
> as it does on Ubuntu. Also the "unknown" encoding should be changed to 
> mean the same as "UTF-8" exactly as it is on Linux.

What is a "unicode application", and what makes you think R is one?  R
is being told by Windows that your native encoding is latin1.  Perhaps
Windows 8 supports UTF-8 as a native encoding (I've never used it), but
previous versions of Windows didn't.
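
To see what R itself has been told about the native encoding, something like
the following can be run on both systems. This is only a rough sketch, not a
definitive diagnostic: the variable names are illustrative, and the test
string is written with \u escapes (an assumption made here so that the
encoding of the script file does not affect the result). It relies only on
base R functions (l10n_info, Sys.getlocale, Encoding, enc2utf8, utf8ToInt).

```r
# Ask R what it believes the native encoding is.
info <- l10n_info()               # list with MBCS, UTF-8, Latin-1 entries
                                  # (on Windows also a codepage entry)
print(Sys.getlocale("LC_CTYPE"))  # the locale R was started in
print(info[["UTF-8"]])            # TRUE only if the native encoding is UTF-8
print(info[["Latin-1"]])          # TRUE only if the native encoding is Latin-1

# The string from the original report, written with escapes so that the
# source file's own encoding is irrelevant (illustrative choice).
x <- "\u00AE\u00D8\u0394\u040A\u05EA"

# enc2utf8() does not depend on the locale, unlike enc2native().
y <- enc2utf8(x)
print(Encoding(y))

# The code points survive the conversion unchanged.
stopifnot(all(utf8ToInt(y) == c(0xAE, 0xD8, 0x394, 0x40A, 0x5EA)))
```

If info[["UTF-8"]] is FALSE, enc2native() has to convert to whatever the
locale's encoding is, and characters it cannot represent come out as <U+xxxx>
escapes, as in the Windows output quoted above. Keeping strings in UTF-8 with
enc2utf8() avoids that conversion.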

Duncan Murdoch