[R-pkg-devel] invalid multibyte string on solaris?
Tomas Kalibera
tom@@@k@||ber@ @end|ng |rom gm@||@com
Fri Nov 1 09:36:50 CET 2019
On 10/31/19 12:58 AM, Toby Hocking wrote:
> Hi all, I am getting an "invalid multibyte string" error from one of my
> examples when it is run on solaris, which results in check FAILURE:
> https://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/nc-00check.html
>
> To fix this I guess I could just delete this example, but is there any
> easy/known fix? I searched the r-devel and r-package-devel lists and I did
> not find any relevant threads.
>
> I also see that the same package on r-hub solaris is a check PASS:
> https://builder.r-hub.io/status/nc_2019.10.19.tar.gz-8b46d2a02a6340bcb313eeec96e404f3
>
> I expected the CRAN and r-hub Solaris builds to report the same
> results. What could be the difference? Is this a bug in CRAN or in r-hub?
The configuration of the CRAN check machine is given at
https://cran.r-project.org/web/checks/check_flavors.html#r-patched-solaris-x86
(see the Details section). I cannot reproduce the problem on a Solaris
machine I have access to (but it is yet a different configuration, so I
am not surprised). The problem is that during substring(), the C library
function mbrtowc() fails to convert a multibyte-encoded string to a wide
character, a conversion that is needed to know how many bytes each
character uses. Without being able to reproduce it, I am not sure why it
fails. Maybe the runtime library does not support Emoji, but of course
there can be a bug in R, too. From the previous issue you ran into with
Emoji, we know that the machine (compiler runtime) does not declare that
wchar_t is Unicode.
Clearly, by using Emoji you are stress-testing R, packages, external
libraries and the OS libraries: these characters need surrogate pairs in
UTF-16, but a lot of old code was written before surrogate pairs even
existed, with all the problems of wchar_t.
Pragmatically, for these reasons I would avoid using Emoji in production
systems. If you instead wanted to stress-test R or libraries to find
out where surrogate pairs are still not handled properly, it would be
better to look for reproducible examples on systems you have access to
and can debug on your end. Some of these problems could also be found
simply by code inspection. We could then fix them where that is easy,
or at least document them in the code.
Best
Tomas