[R-pkg-devel] invalid multibyte string on solaris?

Tomas Kalibera tom@@@k@||ber@ @end|ng |rom gm@||@com
Fri Nov 1 09:36:50 CET 2019


On 10/31/19 12:58 AM, Toby Hocking wrote:
> Hi all, I am getting an "invalid multibyte string" error from one of my
> examples when it is run on solaris, which results in check FAILURE:
> https://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/nc-00check.html
>
> To fix this I guess I could just delete this example, but is there any
> easy/known fix? I searched the r-devel and r-package-devel lists and I did
> not find any relevant threads.
>
> I also see that the same package on r-hub solaris is a check PASS:
> https://builder.r-hub.io/status/nc_2019.10.19.tar.gz-8b46d2a02a6340bcb313eeec96e404f3
>
> I was expecting that CRAN and r-hub solaris builds would report the same
> results. What could be the difference? Is this a bug in CRAN or in r-hub?

The configuration of the CRAN check machine is given at 
https://cran.r-project.org/web/checks/check_flavors.html#r-patched-solaris-x86 
(see the Details section). I cannot reproduce the problem on a Solaris 
machine I have access to (but that machine has yet another configuration, 
so I am not surprised). The problem is that during substring(), the C 
library function mbrtowc() fails to convert a multibyte-encoded character 
to a wide character; that conversion is needed to know how many bytes each 
character uses. Without being able to reproduce it, I am not sure why it 
fails: maybe the runtime library does not support Emoji, but of course 
there can be a bug in R, too. From the previous issue you ran into with 
Emoji, we know that the machine (compiler runtime) does not declare that 
wchar_t is Unicode.
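To illustrate the mechanism, here is a minimal C sketch; it is not R's actual substring() code. The hypothetical helper count_mb_chars() walks a string with mbrtowc(), advancing by the number of bytes each character uses, and gives up when mbrtowc() returns (size_t)-1 (invalid sequence) or (size_t)-2 (incomplete sequence). The locale names tried by the hypothetical use_utf8_locale() are assumptions; which ones exist depends on the system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names (assumed,
   availability varies by OS). Returns 1 if LC_CTYPE was set. */
int use_utf8_locale(void)
{
    const char *names[] = { "C.UTF-8", "en_US.UTF-8", "C.utf8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: count the characters of a multibyte string in the
   current LC_CTYPE locale, the way a substring()-like function must before
   it can map character positions to byte offsets. Returns -1 if mbrtowc()
   rejects a byte sequence ("invalid multibyte string"). */
long count_mb_chars(const char *s)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    size_t left = strlen(s);
    long n = 0;
    while (left > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, left, &st);
        if (used == (size_t)-1 || used == (size_t)-2)
            return -1;              /* invalid or truncated sequence */
        if (used == 0)
            break;                  /* decoded a NUL character */
        s += used;                  /* skip the bytes of this character */
        left -= used;
        n++;
    }
    return n;
}
```

In a UTF-8 locale whose runtime handles 4-byte sequences, count_mb_chars("a\xF0\x9F\x98\x80") (the letter a followed by the Emoji U+1F600) should give 2; a runtime that rejects that 4-byte sequence would make it return -1, which is the kind of failure reported on the CRAN machine.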

By using Emoji you are clearly stress-testing R, packages, external 
libraries, and the OS libraries: these characters need surrogate pairs 
in UTF-16, but a lot of old code was written before they even existed, 
and it carries all the problems of wchar_t.
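For reference, the surrogate-pair arithmetic can be sketched in C as follows (the helper name to_utf16() is hypothetical). A code point above U+FFFF, such as the Emoji U+1F600, does not fit into one 16-bit unit, so it is split into a high and a low surrogate; code that assumes one 16-bit wchar_t per character breaks on such input.

```c
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written to out (1 or 2),
   or 0 if the value cannot be represented in UTF-16. */
int to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                   /* surrogate range is not a character */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;      /* Basic Multilingual Plane: one unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                   /* beyond the Unicode range */
    cp -= 0x10000;                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}
```

For U+1F600 this yields the pair 0xD83D 0xDE00; code points up to U+FFFF (outside the surrogate range) yield a single unit.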

For these reasons I would, pragmatically, avoid using Emoji in production 
systems. If you instead wanted to stress-test R or libraries to find out 
where surrogate pairs are still not handled properly, it would be better 
to look for reproducible examples on systems you have access to and can 
debug on your end. Some of these problems could also be found simply by 
code inspection, though. We could then fix them where it is easy, or at 
least document them in the code.

Best
Tomas
