[Rd] Printing Chinese characters (UTF-8) on R 3.5.2 - Windows 10

Tomas Kalibera tomas.kalibera at gmail.com
Fri Sep 13 11:53:14 CEST 2019


On 9/13/19 11:37 AM, IAGO GINÉ VÁZQUEZ wrote:
> But if I type
> >"會"
> the output is
> [1] "會"
> so seemingly it can be represented. Or am I wrong?

In RGui you can print the string because RGui is a Windows Unicode 
application: it uses UTF-16LE and bypasses the C runtime for strings. But 
that is only the GUI; R itself (and hence also packages) uses the current 
native encoding as defined by the C runtime. RGui makes sure R receives 
the string in UTF-8, but as soon as you do anything even slightly 
non-trivial, including formatting, the string is converted to the 
current native encoding. Some R functions allow you to do certain 
things in UTF-8 without conversion to the native encoding, but you would 
have to read the documentation of each function very carefully. For 
practical use, you either need to live with the misinterpretation of 
some characters, or run Windows in a locale where your characters can 
be represented (e.g. a Chinese locale when working with Chinese strings), 
or use Linux/macOS. On Linux/macOS the current native encoding can be 
UTF-8, so there is no problem. On Windows, with the current toolchain 
based on mingw, this is not possible.
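A minimal sketch of how one might check this from an R session, assuming the string is marked as UTF-8 (as strings written with `\u` escapes are). The helper name `representable_in_native` is hypothetical, not a base R function; it relies on `iconv()` with `to = ""` meaning the native encoding and on `sub = NA` turning non-convertible elements into `NA`.

```r
# Hypothetical helper (not part of base R): does each element of a
# UTF-8 string survive conversion to the current native encoding,
# i.e. the encoding R uses when formatting?
representable_in_native <- function(x) {
  # If the native encoding is UTF-8 (typical on Linux/macOS),
  # every character can be represented.
  if (isTRUE(l10n_info()[["UTF-8"]])) {
    return(rep(TRUE, length(x)))
  }
  # to = "" means the native encoding; with sub = NA, elements that
  # cannot be converted become NA.
  !is.na(iconv(x, from = "UTF-8", to = "", sub = NA))
}

x <- "\u6703"                 # U+6703, the character in question
Encoding(x)                   # how R has marked the string
Sys.getlocale("LC_CTYPE")     # current locale (code page on Windows)
representable_in_native(x)

# With sub = "Unicode", a character that has no native equivalent is
# replaced by an escape of the form "<U+6703>", the same form that
# format() printed in the original report.
iconv(x, from = "UTF-8", to = "", sub = "Unicode")
```

In a non-Chinese Windows locale one would expect `representable_in_native(x)` to be `FALSE`; in a UTF-8 or Chinese locale, `TRUE`.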


Best
Tomas

>
> Best
> Iago
> ------------------------------------------------------------------------
> *From:* Tomas Kalibera <tomas.kalibera using gmail.com>
> *Sent:* Friday, 13 September 2019 11:24
> *To:* IAGO GINÉ VÁZQUEZ <i.gine using pssjd.org>; r-devel using r-project.org 
> <r-devel using r-project.org>
> *Subject:* Re: [Rd] Printing chinese characters (UTF-8) on R 3.5.2 
> -windows 10
> On 9/13/19 11:01 AM, IAGO GINÉ VÁZQUEZ wrote:
> > I have a Chinese character in a data frame, but printing it shows its 
> Unicode code point instead. Concretely, the character is 會 and the 
> code point is U+6703. Tracing through the code, I arrived at the call
> >
> >> base::format.default("會")
> > which prints
> >
> > [1] "<U+6703>"
> >
> > I do not know how far this behaviour extends, or whether it also 
> occurs in the most recent versions of R.
> >
> > Is it expected?
>
> If you are running this on Windows in an encoding where the character
> cannot be represented (e.g. non-Chinese locale), then yes, this is
> expected behavior.
>
> On Unix systems where R can run in UTF-8 encoding (Linux, macOS), the
> character will be formatted/displayed properly.
>
> Best
> Tomas
>
> >
> > Thank you!
> >
> > Iago
> >
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>




