[R-pkg-devel] handling of byte-order-mark on r-devel-linux-x86_64-debian-clang machine

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Sat Mar 26 14:58:42 CET 2022


On Sat, 26 Mar 2022 11:34:00 +0000
Daniel Kelley <Dan.Kelley using Dal.Ca> wrote:

> This file starts with a byte-order-mark, and this is skipped over on
> all but the r-devel-linux-x86_64-debian-clang machine

Could you please explain how you came to this conclusion? I don't have
much experience with testthat, but looking at the recent results at
<https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/oce-00check.html>,
it seems that the whole header is mis-decoded as latin-1, not just the
BOM included in one field name.

Please correct me if my understanding is wrong, but I see that you use
readLines() to read the header of the file:

>> # encoding defaults to "UTF-8-BOM"
>> text <- readLines(file, 1, encoding=encoding, warn=FALSE)

The `encoding` argument of readLines() is documented as follows:

>> encoding: encoding to be assumed for input strings.  It is used to
>> mark character strings as known to be in Latin-1 or UTF-8: it is
>> not used to re-encode the input. To do the latter, specify the
>> encoding as part of the connection ‘con’ or via
>> ‘options(encoding=)’: see the examples.
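A minimal sketch of what that documentation means in practice. It assumes a throwaway temporary file that starts with a UTF-8 BOM followed by a made-up header "name,value". Passing `encoding=` to readLines() only marks the result, so the BOM bytes survive as the first character of the line:

```r
# Hypothetical test file: UTF-8 BOM (bytes EF BB BF) + an ASCII header line
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("name,value\n")), tmp)

# Here `encoding=` only declares the encoding of the result;
# nothing is re-encoded and the BOM is not stripped
x <- readLines(tmp, n = 1L, encoding = "UTF-8", warn = FALSE)
Encoding(x)        # "UTF-8": the bytes were marked, not converted
utf8ToInt(x)[1]    # 65279, i.e. U+FEFF: the BOM is still the first character
```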

It's unfortunate that you lack a clean way of reproducing the problem
(shouldn't it consistently fail on all glibc/libiconv/??? versions?),
but I think that the right thing to do here is to use

readLines(file(file, encoding = encoding), ...)

...and not the `encoding` argument of readLines(). (See also the
somewhat confusing "Encoding" section of ?file.)
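A sketch of that suggestion wrapped in a small function. `read_header()` is a hypothetical name used only for illustration, and the test file is made up. The decoding is left to the connection, where "UTF-8-BOM" is an encoding name that ?file accepts for reading and that drops a leading BOM; the explicit close() avoids leaving an unused connection behind:

```r
# Hypothetical helper: read the first line of `path`, letting the
# connection (not readLines) handle decoding and BOM removal
read_header <- function(path, encoding = "UTF-8-BOM") {
  con <- file(path, encoding = encoding)
  on.exit(close(con))     # file() creates a connection object; destroy it
  readLines(con, n = 1L, warn = FALSE)
}

# Hypothetical test file: UTF-8 BOM followed by an ASCII header
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("name,value\n")), tmp)
read_header(tmp)          # "name,value", with no BOM in the first field
```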

Taking another look at the check log, I see:

>> using session charset: ISO8859-15

readLines() seems to return text with Encoding(.) == 'unknown' (i.e.
the native encoding) when it doesn't recognise its `encoding` argument,
and "UTF-8-BOM" is not one of the values it recognises there. So I
guess what happens here is that the UTF-8 text is interpreted as
ISO8859-15. The same thing used to happen on Windows, where the native
encoding is the current ANSI code page. This gives me reason to hope
that the test will start passing on Windows too once you apply the fix.
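A sketch of that guess, assuming a header of a BOM followed by the made-up text "name,value". Reading the raw UTF-8 bytes as ISO-8859-15 (the session charset from the check log) turns each of the three BOM bytes into a separate Latin character. A Windows ANSI code page would be expected to behave similarly, though that is not shown here:

```r
# BOM (EF BB BF) followed by an ASCII header, as raw bytes
bytes <- c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("name,value"))

# Pretend the bytes are ISO-8859-15 and convert them to UTF-8 for display
mis <- iconv(rawToChar(bytes), from = "ISO-8859-15", to = "UTF-8")
utf8ToInt(mis)[1:3]   # 239 187 191, i.e. U+00EF U+00BB U+00BF
```

The ASCII part of the header comes through unchanged; only the non-ASCII bytes are mis-decoded.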

-- 
Best regards,
Ivan


