[R-SIG-Mac] font encoding issue

Simon Urbanek simon.urbanek at math.uni-augsburg.de
Wed Nov 24 21:10:38 CET 2004


On Nov 23, 2004, at 9:17 PM, Denis Chabot wrote:

> I suppose this is a font encoding error. Can it be fixed, or is there 
> something in R itself which prevents it from even displaying such 
> characters?

It's a bug and a feature of the R GUI ;).
Internally, the R GUI uses UTF-8 encoding for text handling, including the 
editor. The idea was to have a localized GUI with support for any 
language, and UTF-8 is the natively supported format in Cocoa. To make 
the mess even bigger, there was a bug in the GUI that converted the 
UTF-8 text to a plain C string at one point, resulting in the wrong 
behavior you spotted.

I have now fixed the latter bug, so your comments should appear 
undistorted:

 > # exemple à suivre

If this is all you need, get tonight's nightly build.
However, using UTF-8 strings in R is not that easy. Even if all you 
want is to retain the UTF-8 contents (i.e. tell R not to worry about 
the encoding and just print back what it gets), the actual problem is 
that R escapes certain characters, regardless of the locale:

 > Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/C"
 > "Müll"
[1] "M\303\274ll"

This means that the don't-worry approach doesn't work. The latest info 
on encodings and UTF-8 I could find was for 1.8.1, but I suspect that 
nothing has changed since: basically R has no UTF-8 support, and there 
will be none unless someone with enough time, energy and skill takes up 
the task.
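To see where the digits in "M\303\274ll" come from, here is a minimal sketch, assuming a recent R (the \u escape and enc2utf8() are used only to build the string independently of the source file's encoding). In UTF-8, "ü" is stored as the two bytes 0xC3 0xBC, which are 303 and 274 in octal, and R printed each such high byte as a backslash-octal escape. The escape_byte() helper is hypothetical and merely mimics that output format; it is not R's own printing code.

```r
# Build "Müll" as a UTF-8 string regardless of the script's own encoding
s <- enc2utf8("M\u00fcll")

# Raw bytes of the string: 4d c3 bc 6c 6c ("ü" takes two bytes)
bytes <- charToRaw(s)
print(bytes)

# Octal value of each byte; 0xC3 and 0xBC become 303 and 274
oct <- format(as.octmode(as.integer(bytes)))
print(oct)

# Hypothetical helper: keep ASCII bytes, write high bytes as \ooo
escape_byte <- function(b) {
  if (b < 128L) rawToChar(as.raw(b))
  else paste0("\\", format(as.octmode(b), width = 3))
}

escaped <- paste(vapply(as.integer(bytes), escape_byte, character(1)),
                 collapse = "")
cat(escaped, "\n")  # M\303\274ll
```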

The bottom line is that I'll try to fix the GUI so that it uses the 
locale-specific encoding in its internal representation and for all 
communication with R. The drawback will be that users on systems 
with different locales won't be able to use each other's files 
transparently. Still, this should fix things for users of simpler 
encodings (such as Latin1), but for more general support of UTF-8 or 
other multibyte encodings we will have to wait until there is a 
global solution in R.

Cheers,
Simon
