[Rd] Support writing UTF-8 output in Windows

Duncan Murdoch murdoch.duncan at gmail.com
Sun Nov 10 13:49:22 CET 2013


On 13-11-10 7:31 AM, Sverre Stausland wrote:
> My e-mail was intended as a typical "feature request", and I couldn't
> find any more suitable place for that than the r-devel mailing list. I
> am not a programmer, so I don't have the skills to write this into R's
> source code myself.
>
> The incentive is nevertheless clear enough. I believe a software
> program in 2013 which imports, manipulates, and exports text in
> various formats (text files, picture files, postscript files, etc.)
> would normally be expected to support UTF-8. It might not be trivial
> to implement as R is written now, but the expectation will still be
> there. So I still believe it would be a good idea if R soon would be
> able to support UTF-8.

R does support UTF-8.  It all works smoothly in a UTF-8 locale, but not
so smoothly if your computer is set up to use a different 8-bit encoding.
>
> I'm not quite able to piece together from the information you gave
> what the underlying issues are. What I read is:
> (1) Some R functions convert characters to the native encoding.
> (2) Windows did not support UTF-8 when R was first written.
> (3) Unix did not support UCS-2 when R was first written.
>
> I'm guessing here that the implications are:
> (1) R's write.table() converts characters to a native encoding.
> (2) The native encoding in Windows 7 is not UTF-8.
> (3) The native encoding in Unix systems is UTF-8.

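You got the first four right: all three points in your first list and
(1) in your second.  Regarding (2) in your second list, that's right, and
in fact Windows does not support UTF-8 as a native encoding at all.
Point (3) is optional: a Unix system can be set up with a different
native encoding, though UTF-8 is the dominant one nowadays.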
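
To make the consequence of converting to a native 8-bit encoding concrete,
here is a minimal sketch.  It is written in Python rather than R, and it
assumes Windows-1252 (cp1252) as the native code page; both choices are
illustrative assumptions, and this is not how R itself performs the
conversion.  It only shows why a character that the native encoding cannot
represent is lost on output, while UTF-8 keeps it.

```python
# Illustration only: Python stands in for the general idea, not R's internals.
# Assumption: the "native" 8-bit encoding is Windows-1252 (cp1252).

# 'ï' (U+00EF) exists in cp1252; 'ʘ' (U+0298) does not.
text = "na\u00efve \u0298"

# Writing as UTF-8 and reading back preserves every character.
utf8_bytes = text.encode("utf-8")
roundtrip_utf8 = utf8_bytes.decode("utf-8")

# Converting to the native code page first replaces anything it
# cannot represent (here with '?'), so the original is not recoverable.
native_bytes = text.encode("cp1252", errors="replace")
roundtrip_native = native_bytes.decode("cp1252")

print(roundtrip_utf8 == text)    # UTF-8 round trip is lossless
print(roundtrip_native == text)  # native round trip is not
```

The same kind of loss is what one would expect whenever an output function
passes text through the native encoding before it reaches the file.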

The easiest solution is for you to switch to a Unix variant and set it 
up to use UTF-8 as the native encoding.

Next easiest would be for Microsoft to add UTF-8 as a code page.

Most difficult would be for R to handle UTF-8 properly on systems with 
limited support for it.

We probably will add small changes that let you work around the Windows 
problems, but they won't be very satisfactory to anyone.  I don't think 
we will make the big changes that would make R look like "a software 
program in 2013", since it would be so much work, and there's such an 
easy workaround.

Duncan Murdoch

> But this is just guesswork.


>
> PS. A related issue:
> http://stackoverflow.com/questions/19881553/using-unicode-inside-rs-expression-command
>
> Sverre
>


