[Rd] Printing chinese characters (UTF-8) on R 3.5.2 -windows 10
Ray Donnelly
rdonne||y @end|ng |rom @n@cond@@com
Fri Sep 13 14:01:03 CEST 2019
On Fri, Sep 13, 2019 at 1:46 PM Tomas Kalibera <tomas.kalibera using gmail.com>
wrote:
> On 9/13/19 1:33 PM, Ray Donnelly wrote:
>
> On Fri, Sep 13, 2019 at 11:53 AM Tomas Kalibera <tomas.kalibera using gmail.com>
> wrote:
>
>> On 9/13/19 11:37 AM, IAGO GINÉ VÁZQUEZ wrote:
>> > But if I type
>> > >"會"
>> > the output is
>> > [1] "會"
>> > so seemingly it can be represented. Or, am I wrong?
>>
>> In RGui you can print the string because RGui is a Windows Unicode
>> application (it uses UTF-16LE and bypasses the C runtime for strings).
>> But that is just the GUI: R itself (and hence also packages) uses the
>> current native encoding as defined by the C runtime. RGui will make
>> sure R gets the string in UTF-8, but as soon as you do anything even
>> slightly non-trivial, which includes formatting, the string will be
>> converted to the current native encoding. Some R functions allow you
>> to do certain things in UTF-8 without conversion to the native
>> encoding, but you would have to read the documentation for each
>> function very carefully. For practical use, you either need to live
>> with the misinterpretation of some characters, or use Windows in a
>> locale where your characters can be represented (e.g. a Chinese locale
>> when working with Chinese strings), or use Linux/macOS. On Linux/macOS
>> the current native encoding can be UTF-8, so there is no problem. On
>> Windows, with the current toolchain based on mingw, this is not
>> possible.
>>
>
> mingw-w64 is capable of processing utf-8 (it can process bytes after all).
> Can you explain what you mean here? Would any other compiler on Windows not
> suffer from this problem?
>
> The problem is using UTF-8 as the current locale as understood by the C
> runtime/C library. By default mingw uses msvcrt, which does not allow UTF-8
> as the current locale (via setlocale()). mingw has recently also gained
> the ability to build against UCRT, and I hope one day we will be able to
> use it, but it is not yet the default, msys2 does not yet use it for its
> mingw_ packages, and we also need the external packages. Note that R
> (CRAN, and also BIOC) provides binary versions of all packages for
> Windows; they need to build them and they need all library dependencies.
> All of those would have to be rebuilt with UCRT, which will be a huge
> task. Fixing R on its own to support UTF-8 natively on Windows when the C
> runtime allows it won't be hard, because R can already do it on Unix, but
> the problem is all the dependencies.
>
Thanks. We build R for the Anaconda Distribution and are considering our
options around our Windows compilers, including the UCRT (and clang,
possibly from MSYS2, possibly from conda-forge, or a hybrid of some sort if
necessary).
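
For anyone checking whether a given session is affected by the behaviour
discussed in this thread, here is a minimal sketch using only base R
functions (Encoding(), utf8ToInt(), charToRaw(), enc2utf8(), l10n_info(),
format()). The variable name x is just for illustration, and the results of
the last two calls depend on the locale R is running in, so no output is
shown for them.

```r
# The character from the original post, written as an escape so the
# script itself stays ASCII and does not depend on the file encoding.
x <- "\u6703"

# The string is marked as UTF-8, and its code point is U+6703 (26371).
Encoding(x)
utf8ToInt(x) == 0x6703

# Its UTF-8 representation is three bytes: e6 9c 83.
charToRaw(enc2utf8(x))

# TRUE only if the session's native encoding is UTF-8 (possible on
# Linux/macOS, not on Windows with the msvcrt-based toolchain).
l10n_info()[["UTF-8"]]

# Formatting goes through the native encoding; where U+6703 cannot be
# represented there, this is the kind of call reported in this thread
# to give "<U+6703>".
format(x)
```

If l10n_info() reports a non-UTF-8 native encoding and format(x) shows the
escape rather than the character, the session is in the situation Tomas
describes.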
> Tomas
>
>>
>>
>> Best
>> Tomas
>>
>> >
>> > Best
>> > Iago
>> > ------------------------------------------------------------------------
>> > *From:* Tomas Kalibera <tomas.kalibera using gmail.com>
>> > *Sent:* Friday, 13 September 2019 11:24
>> > *To:* IAGO GINÉ VÁZQUEZ <i.gine using pssjd.org>; r-devel using r-project.org
>> > <r-devel using r-project.org>
>> > *Subject:* Re: [Rd] Printing chinese characters (UTF-8) on R 3.5.2
>> > -windows 10
>> > On 9/13/19 11:01 AM, IAGO GINÉ VÁZQUEZ wrote:
>> > > I have a Chinese character in a data frame, but printing it
>> > > outputs its Unicode code point escape instead. Concretely, the
>> > > character is 會 and the code point is U+6703. Tracing through the
>> > > code, I arrive at the call
>> > >
>> > >> base::format.default("會")
>> > > which prints
>> > >
>> > > [1] "<U+6703>"
>> > >
>> > > I do not know the extent of this behaviour, nor whether it
>> > > persists in more recent versions of R.
>> > >
>> > > Is it expected?
>> >
>> > If you are running this on Windows in an encoding where the character
>> > cannot be represented (e.g. non-Chinese locale), then yes, this is
>> > expected behavior.
>> >
>> > On Unix systems where R can run in UTF-8 encoding (Linux, macOS), the
>> > character will be formatted/displayed properly.
>> >
>> > Best
>> > Tomas
>> >
>> > >
>> > > Thank you!
>> > >
>> > > Iago
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-devel using r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> >
>>
>>
>>
>
>