[R-pkg-devel] Good practice for database with utf-8 string in package

Maëlle SALMON
Fri Sep 17 14:21:45 CEST 2021


You could also try to submit the package to CRAN with a comment about the NOTE. There is interesting information in https://discuss.ropensci.org/t/note-on-utf-8-strings-by-goodpractice-gp/2165/

Good luck!

Maëlle






On Friday, 17 September 2021 13:01:25 CEST, Enrico Schumann <es using enricoschumann.net> wrote:





On Fri, 17 Sep 2021, Marc Girondot via R-package-devel writes:

> I first posted this question to r-help using r-project.org, and Bert Gunter told me that it is better suited to this discussion list, which I did not know about.
>
> Hello everyone,
>
> I am a little stuck on the problem of including a database with
> UTF-8 strings in a package. When I submit it to CRAN, the checks report
> NOTEs on several Unix systems, and I am trying to find a solution (if one
> exists) to avoid these NOTEs.
>
> The database contains references, and some names have non-ASCII characters.
>
> * First, I do not agree at all with the advice given here:
>
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues
>
> "First, consider carefully if you really need non-ASCII text."
>
> If a language has non-ASCII characters, it is not just to make the
> writing nicer or more complex; it is because they change the pronunciation.
>
> * Then I tried to find a solution to avoid these NOTEs.
>
> For example, here is a reference with UTF-8 characters:
>
>> DatabaseTSD$Reference[211]
>
> [1] Hernández-Montoya, V., Páez, V.P. & Ceballos, C.P. (2017) Effects of
> temperature on sex determination and embryonic development in the
> red-footed tortoise, Chelonoidis carbonarius. Chelonian Conservation and
> Biology 16, 164-171.
>
> When I convert the non-ASCII characters to Unicode code points, I do
> indeed get only ASCII characters. Perfect.
>
>>  iconv(DatabaseTSD$Reference[211], "UTF-8", "ASCII", "Unicode")
>
> [1] "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P. & Ceballos, C.P.
> (2017) Effects of temperature on sex determination and embryonic
> development in the red-footed tortoise, Chelonoidis carbonarius.
> Chelonian Conservation and Biology 16, 164-171."
>
> With this, I get no NOTEs when I check the package with the database on
> Unix... but how can I print the reference back with the original characters?
>
> Thanks a lot for pointing me to best practices for including databases with
> non-ASCII characters without getting NOTEs when submitting a package to CRAN.
>
> Marc
>
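
One possible answer to the question above, sketched in base R: the <U+XXXX> markers written by iconv(..., sub = "Unicode") can be turned back into characters by parsing the hexadecimal code point. The helper name restore_unicode is hypothetical (it is not in the original message or in any package), and the sketch assumes the stored text contains no literal "<U+....>" sequences of its own.

```r
# Hypothetical helper: replace every <U+XXXX> marker with the character it
# stands for. Assumes the markers follow the iconv(sub = "Unicode") format.
restore_unicode <- function(x) {
  pattern <- "<U\\+[0-9A-Fa-f]{4,8}>"
  hits <- gregexpr(pattern, x)
  regmatches(x, hits) <- lapply(regmatches(x, hits), function(markers) {
    # Drop the "<U+" prefix and the ">" suffix, keeping only the hex digits
    hex <- substr(markers, 4L, nchar(markers) - 1L)
    # Turn each hex code point into a one-character UTF-8 string
    vapply(hex, function(h) intToUtf8(strtoi(h, base = 16L)),
           character(1), USE.NAMES = FALSE)
  })
  x
}

# ASCII-only text as it might be stored in the package data
ascii_ref <- "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P."
restored <- restore_unicode(ascii_ref)
print(restored)
```

The data stay ASCII-only inside the package, and the conversion happens only when a reference is displayed.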

Writing R Extensions (WRE), in section 1.1.5, says:

  "Any byte will be allowed in a quoted character string but ‘\uxxxx’
    escapes should be used for non-ASCII characters. However, non-ASCII
    character strings may not be usable in some locales and may display
    incorrectly in others."

So you could try to use such escapes, e.g.

    stringi::stri_escape_unicode("Hernández-Montoya")
    ## [1] "Hern\\u00e1ndez-Montoya"
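
To get the original characters back for printing, stringi also provides the inverse function, stri_unescape_unicode. The following is a minimal sketch of the round trip, assuming stringi is installed; the variable names are illustrative only.

```r
# The string is written with a \u escape so this source stays ASCII-only
original <- "Hern\u00e1ndez-Montoya"

# Escaped form: pure ASCII, suitable for storing in the package data
escaped <- stringi::stri_escape_unicode(original)
is_ascii <- stringi::stri_enc_isascii(escaped)

# Inverse operation: recover the original characters for display
restored <- stringi::stri_unescape_unicode(escaped)
print(restored)
```

Because the escaping step also escapes literal backslashes, the round trip should be lossless for ordinary text.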


-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net


______________________________________________
R-package-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


