[Rd] How to get utf8 string using R externals

xiaoyan yu x|@oy@n@yu @end|ng |rom gm@||@com
Thu Jun 3 17:39:36 CEST 2021


Thanks! I tried my C++ program, based on R externals, with the same R script
and found that the results show the desired glyphs.
Hence this is an R-on-Windows-specific problem.


On Wed, Jun 2, 2021 at 9:08 PM brodie gaslam <brodie.gaslam using yahoo.com> wrote:

>
> > On Wednesday, June 2, 2021, 7:58:54 PM EDT, xiaoyan yu <xiaoyan.yu using gmail.com> wrote:
> >
> > I am using Gmail and am not sure how to configure it for plain text.
> > The memory pointed to by the char * returned by Rf_translateChar() is
> > actually the string "<U+BD80><U+C2E4>".
>
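If the escaped form is all that comes back, one fallback is to rewrite each `<U+XXXX>` escape into its UTF-8 bytes on the C side. The sketch below assumes the escapes always have the shape shown above (`<U+`, 4 to 8 hex digits, `>`); `unescape_u_plus()` is a hypothetical helper, not part of R's API. R's C API also declares `Rf_translateCharUTF8()` in Rinternals.h, which converts a CHARSXP to UTF-8 directly and is probably the better first thing to try; this is only a workaround for when the escapes are already in the string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 if cp is not a valid
   Unicode scalar value (a surrogate or above U+10FFFF). */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates cannot be encoded */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper (not part of R's API): copy `in`, replacing each
   "<U+XXXX>" escape (4 to 8 hex digits) with the UTF-8 bytes of that
   code point. Anything else, including malformed escapes, is copied
   unchanged. Returns a malloc'd string the caller must free(), or NULL. */
char *unescape_u_plus(const char *in)
{
    size_t len = strlen(in);
    /* An escape is at least 8 bytes and its UTF-8 form at most 4,
       so the output never exceeds the input length. */
    char *out = malloc(len + 1);
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;
    while (i < len) {
        if (in[i] == '<' && in[i + 1] == 'U' && in[i + 2] == '+') {
            size_t j = i + 3, ndigits = 0;
            unsigned long cp = 0;
            /* Accumulate up to 8 hex digits after "<U+". */
            while (ndigits < 8 && isxdigit((unsigned char)in[j])) {
                int c = tolower((unsigned char)in[j]);
                cp = cp * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
                j++;
                ndigits++;
            }
            if (ndigits >= 4 && in[j] == '>') {
                unsigned char buf[4];
                size_t n = utf8_encode(cp, buf);
                if (n > 0) {
                    memcpy(out + o, buf, n);
                    o += n;
                    i = j + 1; /* skip past the closing '>' */
                    continue;
                }
            }
        }
        out[o++] = in[i++]; /* not an escape: copy the byte as-is */
    }
    out[o] = '\0';
    return out;
}
```

Calling `unescape_u_plus("<U+BD80><U+C2E4>")` returns the six bytes `EB B6 80 EC 8B A4`, i.e. 부실 in UTF-8. A label that legitimately contains text such as `<U+0041>` would also be rewritten, so this is only safe when such text cannot occur in the data.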
> Hi Xiaoyan,
>
> Unfortunately I'm not super familiar with R on Windows, but I think
> I can provide a simpler reproducible example.  In Rgui, if I type "\UBD80"
> at the prompt and hit enter, I see the desired glyph.  In Rterm I see the
> Unicode escape.
>
> IIRC the capabilities of Rterm and Rgui are different, and UTF-8 support
> in Windows is limited.  Tomas Kalibera discusses this in some detail:
>
>
> https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
>
> In terms of `Rf_translateChar()`, presumably the `Riconv` call is failing
> on Rterm, but not on Rgui:
>
> https://github.com/r-devel/r-svn/blob/master/src/main/sysutils.c#L924
>
> I'm guessing, but that would explain why the C-level string is in that
> format.  I don't know why the string would translate in Rgui, though.  My
> guess is that it did not, as even in Rgui the following:
>
>     enc2native("\uBD80")
>
> produces the escaped version of the string.
>
> As others have suggested you could try the experimental UCRT Windows
> release:
>
>
> https://developer.r-project.org/Blog/public/2021/03/12/windows/utf-8-toolchain-and-cran-package-checks/index.html
>
> Install instructions (focus on Binary installer):
>
>
> https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html
>
> If I try UCRT on my system this no longer produces the escape:
>
>     enc2native("\uBD80")
>
> All I see, though, is a question mark.  My guess is that my code page or
> something similar is not set correctly.  Examining the result with
> `charToRaw` reveals that the string remains UTF-8 encoded.
>
> Aside: it's not clear to me that you need to translate the string if your
> intent is for it to remain UTF-8.  You just don't seem to be set-up to
> interpret UTF-8 strings currently.
>
> Best,
>
> B
>
> > On Wed, Jun 2, 2021 at 6:09 PM David Winsemius <dwinsemius using comcast.net> wrote:
> >
> >> First, you should configure your mail client to send plain text.
> >>
> >> Can you explain what is meant by:
> >>
> >> the characters are unicodes (<U+BD80><U+C2E4>) instead of
> >> utf8 encoding of the korean characters 부실.
> >>
> >> As far as I can tell those two unicodes _are_ the utf8 encodings of 부실.
> >>
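Strictly speaking, `<U+BD80>` and `<U+C2E4>` name Unicode code points; the UTF-8 encoding of 부실 is the byte sequence derived from them, which is what `charToRaw()` displays for a UTF-8 string. A minimal C sketch of that mapping follows; `utf8_bytes()` and `utf8_hex()` are illustrative helpers, not R functions, and the code assumes a valid code point as input.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the UTF-8 bytes of code point cp into out and return how many
   were written. Assumes cp is a valid Unicode scalar value
   (U+0000..U+10FFFF, not a surrogate); no validation is done here. */
static int utf8_bytes(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Format the UTF-8 bytes of cp as space-separated lowercase hex
   (e.g. "eb b6 80") into buf. Returns the length written, or -1 if
   buf is too small to hold the whole result. */
int utf8_hex(unsigned long cp, char *buf, size_t size)
{
    unsigned char b[4];
    int n = utf8_bytes(cp, b);
    size_t pos = 0;

    for (int k = 0; k < n; k++) {
        int w = snprintf(buf + pos, size - pos, k == 0 ? "%02x" : " %02x",
                         (unsigned)b[k]);
        if (w < 0 || (size_t)w >= size - pos)
            return -1; /* output would be truncated */
        pos += (size_t)w;
    }
    return (int)pos;
}
```

With this, U+BD80 (부) formats as `eb b6 80` and U+C2E4 (실) as `ec 8b a4`, so a UTF-8 string holding 부실 is six bytes long.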
> >> You may need to consult a couple of R help pages. I suggest:
> >>
> >> ?Quotes
> >> ?points  # has examples of changing fonts used for display on console.
> >>
> >> Sorry if I've misunderstood. I'm not on a Windows device, so posting the
> >> C++ program won't be helpful, but maybe it would be for other prospective
> >> respondents.
> >>
> >> --
> >> David.
> >>
> >> On 6/2/21 1:33 PM, xiaoyan yu wrote:
> >> > I have an R script, Predict.R:
> >> >      set.seed(42)
> >> >      C <- 1:1000
> >> >      A <- rep(1:200, 5)
> >> >      # 50 random factors, recycled across the 1000 rows
> >> >      E <- (1:1000) * (0.8 + (0.4 * runif(50, 0, 1)))
> >> >      L <- ifelse(runif(1000) > .5, 1, 0)
> >> >      df <- data.frame(cbind(C, A, E, L))
> >> >      load("C:/Temp/tree.RData")  # load the model for scoring
> >> >
> >> >      P <- as.character(predict(tree_model_1, df, type = 'class'))
> >> >
> >> > Then, in a C++ program, I call eval to evaluate the script and then
> >> > findVar to get the P variable. After getting each class label from P
> >> > using STRING_ELT and then Rf_translateChar, the characters come back as
> >> > Unicode escapes (<U+BD80><U+C2E4>) instead of the UTF-8 encoding of the
> >> > Korean characters 부실.
> >> > How can I get UTF-8 using R externals?
> >> >
> >> > I also found that the same script gives UTF-8 characters in RGui but
> >> > Unicode escapes in Rterm.
> >> > I tried to attach a screenshot but got the message "The message's
> >> > content type was not explicitly allowed".
> >> > In RGui, I saw the output 부실, while in Rterm, <U+BD80><U+C2E4>.
> >> >
> >> > Please help.
> >> >
> >> >
> >> > ______________________________________________
> >> > R-devel using r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >>
> >
> >
> >
>
