[R] UTF-8 or Unicode on Windows PC

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Apr 21 19:09:06 CEST 2008

On Mon, 21 Apr 2008, Hans-Joerg Bibiko wrote:

> On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:
>>> Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?
>> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
>> And it is due for release tomorrow.
> Many thanks! I can see the progress :)
> But please forgive my incompetence. I'm not so familiar with Windows.
> If I start Rgui with, e.g., Rgui.exe LC_CTYPE=ja, I can type Japanese,
> Russian, and German, and strsplit works perfectly! ;)
> But if I type, for instance, a German umlaut 'ü', it comes out as 'u'. OK, that
> is because I didn't set up Rgui in UTF-8 mode.

Entering text at the keyboard in more than one language is close to impossible
(not quite, as 'Japanese' covers a few, but you need a Japanese keyboard to
do it).  You can't change the language of Windows just by setting locales.

> But how can I do this? My data are written in many different languages, and I 
> want to do some statistics.

You can read in files in known encodings, though.
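As a minimal sketch of reading a file in a known encoding (not from the original thread: the file and its contents are made up here, and the code assumes a current R version rather than 2.7.0), one can write a small UTF-8 file and read it back while declaring its encoding, so the strings are marked as UTF-8 instead of being interpreted in the native Windows code page:

```r
# Hypothetical sample data: German, Russian and Japanese words written with
# \u escapes so that this script itself stays pure ASCII.
txt <- c("M\u00fcller", "\u041c\u043e\u0441\u043a\u0432\u0430", "\u6771\u4eac")

# Write the raw UTF-8 bytes to a temporary file; useBytes = TRUE avoids any
# translation to the native code page on the way out.
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(txt), path, useBytes = TRUE)

# Read the file back, declaring its known encoding: the strings are marked
# as UTF-8, so characters outside the current locale are not lost.
x <- readLines(path, encoding = "UTF-8")

# Character-level functions such as nchar() and strsplit() now count
# characters, not bytes.
nchar(x)                  # 6 6 2
lengths(strsplit(x, ""))  # 6 6 2

unlink(path)
```

Note that encoding = "UTF-8" in readLines() only marks the strings. To have R re-encode the input on the fly instead, open the connection with file(path, encoding = "UTF-8"); read.table() and friends accept a fileEncoding argument for the same purpose.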

> R version 2.7.0 RC (2008-04-19 r45391)
> i386-pc-mingw32
> locales:
> all to German_Germany.1252
> LC_CTYPE=Japanese_Japan.932
> ###
> There are some minor issues.
> I set Rgui's font to "Arial Unicode". This works, but I have some trouble
> placing my cursor, because Arial Unicode is not a monospaced font.

Right, and you are warned not to do that.  You must use a fixed-width 
font, and for CJK characters, one in the standard single/double spacing.

(See, for example, the comments in Rconsole and rw-FAQ 3.5.  The GUI
preferences dialog offers only fixed-width fonts, so you have to work
quite hard to do anything else.)

> If I start up Rgui in German, I can see the localized menu items, but every
> non-ASCII character is shown as garbage. It seems to me that the
> localized strings are written in UTF-8, while Rgui expects ANSI characters.

Argh, yes, that was an error by the translator in marking the file;
thanks, I just have time to fix it.  (RGui does not expect ANSI, but all
of R does expect translations to be in the encoding they are declared to
be in, and this one was declared as ISO-8859-1.)

> ###
> Nevertheless, thanks a lot!
> --Hans

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
