[Rd] Encoding issues
Tomas Kalibera
Mon Feb 18 17:45:14 CET 2019
On 2/18/19 4:36 PM, Iñaki Ucar wrote:
> Hi,
>
> We found a (to our eyes) strange behaviour that might be a bug. First
> a little bit of context. The 'units' package allows us to set the unit
> using either standard evaluation (SE) or non-standard evaluation (NSE).
> E.g., both of these work in the same way:
>
> units::set_units(1:10, "μm")
> #> Units: [μm]
> #> [1] 1 2 3 4 5 6 7 8 9 10
>
> units::set_units(1:10, μm)
> #> Units: [μm]
> #> [1] 1 2 3 4 5 6 7 8 9 10
>
> That's micrometers, and works fine if the session charset is UTF-8.
> Now the funny part comes with Windows. The first version, with quotes,
> works fine, but the second one fails. This is easy to demonstrate from
> Linux:
>
> LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, "μm")'
> #> Units: [μm]
> #> [1] 1 2 3 4 5 6 7 8 9 10
>
> LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, μm)'
> #> Error: unexpected input in "units::set_units(1:10, μ"
> #> Execution halted
>
> However, if you use the first version, with quotes, in an example, and
> the package is checked on Windows, it fails too (see
> https://ci.appveyor.com/project/edzer/units/builds/22440023#L747). The
> package declares UTF-8 encoding, so none of these errors should, in
> principle, happen. Am I wrong?
Hi Iñaki,
if you want to report a bug against R, please try to provide a minimal
reproducible example that uses only base packages (not units). Please
also see sections 1.3 and 1.6.3 of Writing R Extensions (WRE), including:
"There is a portable way to have arbitrary text in character strings
(only) in your R code, which is to supply them in Unicode as ‘\uxxxx’
escapes."
"If your package specifies an encoding in its DESCRIPTION file, you
should run these tools in a locale which makes use of that encoding"
(this includes R CMD check)
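As a minimal sketch of the first quote, using only base R: the string
below is written with a \u escape, so the source stays pure ASCII. The
variable name micro is just an illustration; U+03BC is the Greek small
letter mu used in the examples above.

```r
# "\u03bc" is GREEK SMALL LETTER MU; no non-ASCII byte appears in the source.
micro <- "\u03bcm"

# The parser marks a non-ASCII string built from \u escapes as UTF-8.
Encoding(micro)

# Two characters, but three bytes: mu takes two bytes in UTF-8.
nchar(micro, type = "chars")
nchar(micro, type = "bytes")

# The Unicode code points, independent of the session locale.
utf8ToInt(micro)
```

This only covers string constants; it says nothing yet about what
happens once the string is passed on to other functions.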
There are portable ways (e.g. \u escapes) to write a UTF-8 string
constant literal in source code even when the string is not
representable in the current native encoding. That does not mean such a
string can be used freely in R. Many operations convert their input to
the current native encoding, and for such a string the conversion causes
an error or an unexpected result. These conversions can happen at any
time, except where they are documented not to happen.
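To illustrate what such a conversion does, here is a rough sketch that
simulates a Latin-1 native encoding with iconv(), so it can be run from
a UTF-8 session. It is an approximation of the conversions R performs
internally, not a trace of them, and the substituted output can differ
between iconv implementations.

```r
micro <- "\u03bcm"   # Greek small letter mu + "m", stored as UTF-8

# U+03BC has no Latin-1 code point, so this conversion cannot succeed.
as_latin1 <- iconv(micro, from = "UTF-8", to = "ISO-8859-1")
is.na(as_latin1)     # iconv() signals a failed conversion with NA

# With 'sub', iconv() replaces the non-convertible bytes instead of
# failing, so the result is no longer the original text.
iconv(micro, from = "UTF-8", to = "ISO-8859-1", sub = "byte")

# The real native conversion depends on the session locale:
enc2native(micro)
l10n_info()
```

Either outcome, a failure or a silently altered string, matches the
"error or unexpected result" described above.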
Implementing an API in a package that works with such strings would be
hard to get right, but not impossible. NSE will not work: strings that
are not representable in the native encoding are supported only as
string constant literals, not in other forms such as symbol names. One
can save a lot of headaches by using only ASCII in function APIs.
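One possible shape for an ASCII-only API, as a sketch: the table
unit_symbols and the function ascii_unit() are hypothetical names made
up for this illustration and are not part of the units package. The
non-ASCII symbols live only in \u-escaped string constants, and callers
type ASCII whether they use SE or NSE.

```r
# Hypothetical lookup table: ASCII keys are the public API, the Unicode
# symbols are kept internal and written with \u escapes.
unit_symbols <- c(um = "\u03bcm", ohm = "\u03a9", degC = "\u00b0C")

# Accepts either a bare ASCII name (NSE) or an ASCII string (SE).
ascii_unit <- function(name) {
  key <- as.character(substitute(name))
  stopifnot(length(key) == 1L, key %in% names(unit_symbols))
  unit_symbols[[key]]
}

ascii_unit(um)     # NSE: the symbol `um` parses in any locale
ascii_unit("um")   # SE: same result
```

Displaying the returned symbol can still trigger a conversion to the
native encoding, so this only keeps the parsing step portable.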
Best
Tomas
>
> Thanks in advance, regards,
> Iñaki
>