[Rd] Support writing UTF-8 output in Windows

Duncan Murdoch murdoch.duncan at gmail.com
Sat Nov 9 21:04:03 CET 2013


On 13-11-09 12:07 PM, Sverre Stausland wrote:
> As recently discussed on Stack Overflow, R for Mac OS and Ubuntu (so
> probably all Unix systems) can correctly write files with UTF-8
> encoding, but R for Windows cannot:

That's not an accurate description of the problem.  Some functions in R 
convert values to the native encoding, but not all do.
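
A minimal sketch of that difference, using only base R. The sample 
string and the temporary file are made up for illustration, and the 
result of enc2native() depends on the locale of the session: 
enc2native() converts to the native encoding, while writeLines() with 
useBytes = TRUE writes the bytes it is given without re-encoding them.

```r
# Illustrative string with one non-ASCII character, given as a \u escape.
x <- "\u0161koda"
Encoding(x)

# Convert to the session's native encoding. In a non-UTF-8 locale a
# character that cannot be represented there is substituted.
enc2native(x)

# Write the UTF-8 bytes unchanged: a binary-mode connection plus
# useBytes = TRUE avoids re-encoding on output.
path <- tempfile(fileext = ".txt")   # hypothetical output file
con <- file(path, open = "wb")
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Read the raw bytes back and compare with the UTF-8 bytes plus newline.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)
identical(bytes, c(charToRaw(enc2utf8(x)), as.raw(0x0a)))
```

Whether this covers a given case depends on which function does the 
writing; functions that convert to the native encoding before output 
are not helped by it.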

> http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
>
> I strongly suggest that R for Windows should support this feature in
> upcoming versions.

It's not trivial to do.  When R was written (and perhaps still on some 
obscure platforms), there was no way to do that: Windows didn't 
support UTF-8 then, only Microsoft's version of UCS-2 and a variety of 
other, more limited encodings.  Unix platforms didn't support UCS-2.  So 
internally R keeps many things in the native encoding.
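
To see what the native encoding is in a given session, and why passing 
through it can lose data, something like the following can be tried. It 
is only a sketch: the sample string is made up, and "latin1" stands in 
for a limited native encoding, which is an assumption and not 
necessarily what a particular Windows installation uses.

```r
# Report the character-type locale and whether it is UTF-8.
Sys.getlocale("LC_CTYPE")
info <- l10n_info()
info[["UTF-8"]]        # TRUE only when the native encoding is UTF-8

# Two CJK characters, which Latin-1 cannot represent.
zh <- "\u4e2d\u6587"

# With the default sub = NA, iconv() typically returns NA for input it
# cannot convert, so the text does not survive the round trip.
latin <- iconv(zh, from = "UTF-8", to = "latin1")
is.na(latin)
```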

If you decide to rewrite R from scratch now, I'd suggest that you handle 
things differently.  If you'd rather not rewrite it yourself, then I 
don't know how you will convince someone else to take on that job.

You might find it easier to convince Microsoft to add a UTF-8 locale, so 
then the native encoding would be UTF-8, and the problem would go away.

Duncan Murdoch


