[R-SIG-Mac] font encoding issue

Denis Chabot chabotd at globetrotter.net
Wed Dec 1 09:07:36 CET 2004


Hi Simon,
On 25 Nov 2004, at 12:02, r-sig-mac-request at stat.math.ethz.ch wrote:

> From: Simon Urbanek <simon.urbanek at math.uni-augsburg.de>
> Subject: Re: [R-SIG-Mac] font encoding issue
> To: Denis Chabot <chabotd at globetrotter.net>
> Cc: r-sig-mac at stat.math.ethz.ch
>

> On Nov 23, 2004, at 9:17 PM, Denis Chabot wrote:
>
>> I suppose this is a font encoding error. Can it be fixed, or is there
>> something in R itself which prevents it from even displaying such
>> characters?
>
> It's a bug and a feature of the R GUI ;).
> Internally, the R GUI uses UTF-8 encoding for text handling, including
> the editor. The idea was to have a localized GUI with support for any
> language, and UTF-8 is the natively supported format in Cocoa. To make
> the mess even bigger, there was a bug in the GUI that converted the
> UTF-8 to a vanilla C string at one point, thus resulting in the wrong
> behavior you spotted.
>
> I have now fixed that latter bug, so your comments should appear
> undistorted:
>
>> # exemple à suivre
>
> If this is all you need, get tonight's nightly build.

I just installed this version: 1.02-pre build 786.
Accents still do not survive a trip to the R console.
Worse, I cannot open R program (script) files that are saved inside a
folder with accents in its name. For instance, I had a test program
inside a folder called "Étude des GAMs" and I was not able to open it.
Renaming the folder to "Etude des GAMs" allowed me to open the program
files inside it. A few tests suggest that an accent anywhere in the
complete path prevents R from opening the file. This is rather
disturbing for a francophone user who might have plenty of accents all
over the hard disk.

I admit I have not yet tried to manipulate the "locale" information,
as another user suggested on this list. After reading your message I
thought I did not need to.


> However, using UTF-8 in strings in R is not that easy. Even if all you
> want is to retain the UTF-8 contents (i.e. tell R to not worry about
> the encoding and just print back what it gets), the actual problem is
> that R escapes certain characters, regardless of the locale:
>
>> Sys.getlocale()
> [1] "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/C"

I get
 > Sys.getlocale()
[1] "C"

I'll read about this "locale" stuff...

>> "Müll"
> [1] "M\303\274ll"
>
> This means that the don't-worry concept doesn't work. The latest info
> on encodings and UTF-8 I could find was for 1.8.1, but I suspect that
> nothing has changed since: basically R has no UTF-8 support, and there
> will be none unless someone with enough time, energy and skill takes
> up the task.
>
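
The escaped output "M\303\274ll" quoted above is what you get when each
non-ASCII byte of the UTF-8 form of "Müll" is printed as a three-digit
octal escape. The following is only a rough sketch of that idea in
Python, not R's actual escaping code; the function name octal_escape is
hypothetical and was chosen just for this illustration.

```python
def octal_escape(text: str) -> str:
    """Render text the way a byte-oriented printer might show UTF-8.

    ASCII bytes are kept as they are; every byte >= 0x80 becomes a
    backslash followed by its three-digit octal value. This is an
    illustration only and does not claim to reproduce R's exact rules.
    """
    pieces = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            pieces.append(chr(byte))         # plain ASCII passes through
        else:
            pieces.append("\\%03o" % byte)   # e.g. 0xC3 -> \303
    return "".join(pieces)


# "ü" is U+00FC, which UTF-8 encodes as the two bytes 0xC3 0xBC.
print(octal_escape("Müll"))  # prints M\303\274ll
```

The two escapes \303 and \274 are therefore the two bytes of a single
character, which is why a printer that does not know the encoding
cannot show them as "ü".
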
> The bottom line is that I'll try to fix the GUI so that it uses the
> locale-specific encoding in its internal representation and for all
> communication with R. The drawback will be that users on systems
> with different locales won't be able to use each other's files
> transparently. Still, this should fix things for users of simpler
> encodings (such as Latin1), but for more general support of UTF-8 or
> other multi-byte encodings we will have to wait until there is a
> global solution in R.
>
> Cheers,
> Simon
>
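
The drawback mentioned above, that files written under one locale's
encoding are not read back transparently under another, can be sketched
with the sample comment from this thread. This is a minimal Python
illustration under the assumption that one user saves the file as
Latin1 and the other as UTF-8; the helper read_as is hypothetical, and
none of this is code from the R GUI.

```python
comment = "# exemple à suivre"

# The same text saved by two users with different locale encodings.
latin1_bytes = comment.encode("latin-1")   # "à" is the single byte 0xE0
utf8_bytes = comment.encode("utf-8")       # "à" is the two bytes 0xC3 0xA0


def read_as(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD for anything undecodable."""
    return data.decode(encoding, errors="replace")


# Each file reads back correctly only under the encoding it was saved in.
same_encoding = read_as(latin1_bytes, "latin-1")

# A UTF-8 file opened as Latin1: "à" turns into two unrelated characters.
utf8_read_as_latin1 = read_as(utf8_bytes, "latin-1")

# A Latin1 file opened as UTF-8: the lone 0xE0 byte is not valid UTF-8.
latin1_read_as_utf8 = read_as(latin1_bytes, "utf-8")

print(repr(same_encoding))
print(repr(utf8_read_as_latin1))
print(repr(latin1_read_as_utf8))
```

The ASCII part of the line survives in every case; only the accented
character is damaged, which matches the kind of distortion reported at
the start of this thread.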

Sincerely,

Denis Chabot
