[Rd] Support writing UTF-8 output in Windows

Ben Bolker bbolker at gmail.com
Sun Nov 10 00:58:50 CET 2013


Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:

> 
> On 13-11-09 12:07 PM, Sverre Stausland wrote:
> > As recently discussed on Stack Overflow, R for Mac OS and Ubuntu (so
> > probably all Unix systems) can correctly write files with UTF-8
> > encoding, but R for Windows cannot:
> 
> That's not an accurate description of the problem.  Some functions in R 
> convert values to the native encoding, but not all do.
> 
> > http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
> >
> > I strongly suggest that R for Windows should support this feature in
> > upcoming versions.
> 
> It's not trivial to do.  When R was written, and perhaps still on some 
> obscure platforms, there wasn't any way to do that--Windows didn't 
> support UTF-8 then, just Microsoft's version of UCS-2 and a variety of 
> other more limited encodings.  Unix platforms didn't support UCS-2.  So 
> internally R keeps many things in the native encoding.
> 
> If you decide to rewrite R from scratch now, I'd suggest that you handle 
> things differently.  If you'd rather not rewrite it yourself, then I 
> don't know how you will convince someone else to take on that job.
> 
> You might find it easier to convince Microsoft to add a UTF-8 locale, so 
> then the native encoding would be UTF-8, and the problem would go away.
> 
> Duncan Murdoch

  Would it be fairer, and more productive, to ask the following instead:

* it would be nice if write.table could write files in UTF-8 encoding
* is there any documentation already available about which R functions
_do_ handle UTF-8 output on Windows, and how they do it?  
* could they be used as models for adapting write.table to write files
in UTF-8 encoding on Windows?

  i.e., instead of "convert R to output UTF-8 universally on Windows",
ask "figure out how to make write.table output UTF-8 on Windows, or
suggest a workaround"?
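
  As a rough illustration of what such a workaround might look like
(a sketch only, not a tested replacement for write.table): the helper
name write_utf8_table below is hypothetical, and it assumes a plain
tab-separated layout with no quoting or escaping. It relies on
enc2utf8() to convert the strings and on writeLines(..., useBytes = TRUE)
to write their bytes without re-encoding to the native encoding.

```r
# Hypothetical helper, not part of base R: write a data frame as
# delimiter-separated UTF-8 text, bypassing the native encoding.
# Assumption: fields contain no separators or newlines (no quoting is done).
write_utf8_table <- function(df, path, sep = "\t") {
  # Convert every column to character, then to UTF-8.
  cols <- lapply(df, function(col) enc2utf8(as.character(col)))
  header <- paste(enc2utf8(names(df)), collapse = sep)
  # unname() so that column names are not mistaken for paste() arguments.
  rows <- do.call(paste, c(unname(cols), sep = sep))

  # Binary mode avoids line-ending translation on Windows.
  con <- file(path, open = "wb")
  on.exit(close(con))

  # useBytes = TRUE writes the strings' bytes as they are, so the UTF-8
  # bytes are not converted back to the native encoding on output.
  writeLines(c(header, rows), con, sep = "\n", useBytes = TRUE)
  invisible(path)
}

# Example use with a non-ASCII value:
df <- data.frame(name = c("\u00e5", "b"), value = c(1, 2),
                 stringsAsFactors = FALSE)
out <- tempfile(fileext = ".txt")
write_utf8_table(df, out)
```

  Whether something along these lines could be folded into write.table
itself is exactly the open question above.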

  Ben Bolker


