[R] Unicode characters (R 2.7.0 on Windows XP SP3 and Hardy Heron)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat May 31 00:11:51 CEST 2008
On Fri, 30 May 2008, Duncan Murdoch wrote:
> On 5/30/2008 4:12 PM, Hans-Joerg Bibiko wrote:
>> Quoting Duncan Murdoch <murdoch at stats.uwo.ca>:
>>
>>> On 5/30/2008 12:58 PM, Hans-Jörg Bibiko wrote:
>>>> to put it simply. Windows cannot handle utf-8 data. There is no utf-8
>>>> locale available.
>>>
>>> Code page 65001 is utf-8. Most text editors (including Notepad)
>>> include an option to save in the UTF-8 encoding.
>>>
>>> Some programs don't fully support utf-8 (some don't even support the
>>> native UCS-2), but most don't care. That's the nice thing about utf-8.
>>>
>>> So in what sense can Windows not handle utf-8 data?
>>
>> Of course, you're right. I only meant in that context R for Windows, not
>> Windows at all. Sorry for my incorrectness.
>
> But I think with Brian Ripley's work over the last while, R for Windows
> actually handles utf-8 pretty well. (It might not guess at that encoding,
> but if you tell it that's what you're using...)

UTF-8, please (only the capitalized form is correct).

R passes around, prints and plots UTF-8 character data pretty well, but it
translates to the native encoding for almost all character-level
manipulations (and not just on Windows). ?Encoding spells out the
exceptions (and I think the original poster had not read it). As time
goes on we may add more, but it is really tedious (and somewhat
error-prone) to have multiple paths through the code for different
encodings (and different OSes do handle these differently -- Windows' use
of UTF-16 means that one character may not be one wchar_t).
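
To illustrate that last point (an added sketch, not part of the original
message): the short Python snippet below counts how many 16-bit UTF-16 code
units a character needs. It assumes a 16-bit wchar_t, as on Windows; the
helper name utf16_code_units is hypothetical and the two sample characters
were chosen only for illustration.

```python
def utf16_code_units(ch: str) -> int:
    """Return the number of 16-bit code units needed to encode ch in UTF-16."""
    # "utf-16-le" emits no byte-order mark, so every 2 bytes is one code unit.
    return len(ch.encode("utf-16-le")) // 2


bmp_char = "\u00e7"         # c with cedilla, inside the Basic Multilingual Plane
astral_char = "\U0001D11E"  # musical G clef, outside the BMP

print(utf16_code_units(bmp_char))      # one code unit
print(utf16_code_units(astral_char))   # two code units (a surrogate pair)
print(len(astral_char.encode("utf-8")))  # the same character is 4 bytes in UTF-8
```

A character outside the BMP therefore occupies two 16-bit units, so code that
treats each wchar_t as one character would miscount it on such a platform.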

A couple of the other points in the original posting were corrected in
R-patched just after release.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595