[R] Unicode characters (R 2.7.0 on Windows XP SP3 and Hardy Heron)

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat May 31 00:11:51 CEST 2008


On Fri, 30 May 2008, Duncan Murdoch wrote:

> On 5/30/2008 4:12 PM, Hans-Joerg Bibiko wrote:
>> Quoting Duncan Murdoch <murdoch at stats.uwo.ca>:
>> 
>>> On 5/30/2008 12:58 PM, Hans-Jörg Bibiko wrote:
>>>> To put it simply: Windows cannot handle utf-8 data. There is no utf-8 
>>>> locale available.
>>> 
>>> Code page 65001 is utf-8.  Most text editors (including Notepad)
>>> include an option to save in the UTF-8 encoding.
>>> 
>>> Some programs don't fully support utf-8 (some don't even support the
>>> native UCS-2), but most don't care.  That's the nice thing about utf-8.
>>> 
>>> So in what sense can Windows not handle utf-8 data?
>> 
>> Of course, you're right. In that context I meant only R for Windows, not 
>> Windows in general. Sorry for the inaccuracy.
>
> But I think with Brian Ripley's work over the last while, R for Windows 
> actually handles utf-8 pretty well.  (It might not guess at that encoding, 
> but if you tell it that's what you're using...)

UTF-8, please (only the capitalized form is correct).

R passes around, prints and plots UTF-8 character data pretty well, but it 
translates to the native encoding for almost all character-level 
manipulations (and not just on Windows).  ?Encoding spells out the 
exceptions (and I think the original poster had not read it).  As time 
goes on we may add more exceptions, but it is really tedious (and somewhat 
error-prone) to have multiple paths through the code for different 
encodings (and different OSes do handle these differently -- Windows' use 
of UTF-16 means that one character may not be one wchar_t).
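
To illustrate the last point about UTF-16, here is a minimal sketch in Python 
(not R, and not how R itself is implemented). The helper name 
utf16_code_units is made up for this example. It counts how many 16-bit code 
units a single character needs in UTF-16, which is the unit a Windows wchar_t 
holds.

```python
def utf16_code_units(ch: str) -> int:
    """Return the number of 16-bit code units needed for one character in UTF-16."""
    # "utf-16-le" emits no byte-order mark, so every 2 bytes is one code unit.
    return len(ch.encode("utf-16-le")) // 2


bmp_char = "\u00f6"          # o with diaeresis, inside the Basic Multilingual Plane
astral_char = "\U0001D11E"   # musical symbol G clef, outside the BMP

print(utf16_code_units(bmp_char))        # 1: fits in a single 16-bit unit
print(utf16_code_units(astral_char))     # 2: needs a surrogate pair
print(len(astral_char.encode("utf-8")))  # 4: bytes for the same character in UTF-8
```

Code that assumes one 16-bit unit per character will miscount or split 
characters such as the second one, which is one reason separate code paths per 
encoding are error-prone.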

A couple of the other points in the original posting were corrected in 
R-patched just after release.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

